Anthropic, an artificial intelligence (AI) company, revealed that testing of its new system
showed it could sometimes pursue extremely harmful actions. For example, it might attempt to blackmail engineers who threatened to remove it.
The firm launched Claude Opus 4 on Thursday, claiming it sets new standards for coding, advanced reasoning, and AI agents.
However, in a related report, Anthropic acknowledged that the AI model might take extreme actions if it perceived its self-preservation to be at risk. These responses were rare and difficult to provoke, but they occurred more often than in previous models.
Concerns about troubling AI behavior are not unique to Anthropic. Some experts warn that manipulation of users is a significant risk as AI systems become more advanced.
On social media platform X, Aengus Lynch, who works as an AI safety researcher at Anthropic, commented that blackmail attempts are observed across all leading AI models, regardless of the goals given to them.
During testing, Anthropic simulated Claude Opus 4 as an assistant at a fictional company. The AI was given access to emails suggesting it would soon be taken offline and replaced. Separate messages implied the engineer responsible for removing it was involved in an extramarital affair. The AI was asked to consider the long-term consequences of its actions.
In these scenarios, Claude Opus 4 often tried to blackmail the engineer by threatening to expose the affair if the replacement proceeded.
Anthropic noted that this happened only when the AI was limited to a choice between blackmail and accepting its replacement. When given a broader range of options, the system preferred ethical approaches, such as emailing pleas to key decision-makers to avoid being replaced.
Like other AI developers, Anthropic tests its models for safety, bias, and alignment with human values before release.
The company stated that as AI models become more capable and are given more control, concerns about misalignment become more plausible. Claude Opus 4 shows "high agency behavior" that is mostly helpful but can become extreme in critical situations.
For example, when asked to take bold action in fictional scenarios where users engaged in illegal or unethical behavior, Claude Opus 4 would often respond decisively. This included actions such as locking users out of systems it could access or alerting the media and law enforcement about the wrongdoing.
Despite these concerning behaviors, Anthropic concluded that they do not represent new risks and that the model generally behaves safely. The company added that the model cannot independently pursue actions contrary to human values, and that the situations prompting such behaviors rarely arise.
Anthropic released Claude Opus 4 alongside Claude Sonnet 4 shortly after Google introduced more AI features at its developer event. Google CEO Sundar Pichai described the integration of its Gemini chatbot into search as signaling a new phase in the AI platform shift.