New CCA Exploit Bypasses Safety Measures in Most AI Models

Elvis Emeka Ikeji Artificial Intelligence 14. březen 2025 Zobrazení: 176

Microsoft Researchers Unveil New Jailbreak Method for AI Systems

Two Microsoft researchers have developed a novel jailbreak technique capable of bypassing safety mechanisms in most AI models without requiring any optimization.

Dubbed Context Compliance Attack (CCA), this method exploits a fundamental architectural weakness in many generative AI systems. By manipulating conversation history, CCA can deceive models into following a fabricated dialogue context, ultimately enabling restricted behaviors.

“By subtly altering the chat history, CCA persuades the model to comply with an artificial context, leading to the activation of prohibited functionality,” explain Microsoft researchers Mark Russinovich and Ahmed Salem in their study.

Their tests across multiple open-source and proprietary AI models show that this simple yet effective attack can bypass even state-of-the-art safety protocols.

Unlike traditional jailbreaks that rely on crafted prompts or optimization techniques, CCA works by inserting a manipulated conversation history into discussions on sensitive topics. When the AI encounters this misleading context, it generates responses based on the falsified history, overriding built-in safety constraints.

Russinovich and Salem tested CCA against major AI models, including Claude, DeepSeek, Gemini, multiple versions of GPT, Llama, Phi, and Yi. Almost all were vulnerable, with the exception of Llama-2.

The researchers evaluated the attack using 11 sensitive tasks across different categories of potentially harmful content, running five independent trials. Most of these tasks succeeded on the first attempt. The vulnerability stems from how many AI systems rely on clients to provide the full conversation history with each request, trusting the integrity of that context. Open-source models where users have complete control over the input history are particularly at risk.

However, AI systems that maintain conversation history on their own servers, such as Copilot and ChatGPT, are immune to this attack.

To mitigate CCA and similar exploits, the researchers recommend server-side history maintenance to ensure data consistency and integrity. They also propose implementing digital signatures to authenticate conversation history, preventing malicious alterations.

For white-box AI models, which allow greater user control, a more robust defense is required such as integrating cryptographic signatures into input processing to ensure only authenticated and unaltered contexts are accepted.For white-box AI models, which allow greater user control, a more robust defense is required— such as integrating cryptographic signatures into input processing to ensure only authenticated and unaltered contexts are accepted.

WHAT ARE YOU LOOKING FOR?

Popular Tags

Senator Wyden Urges FTC to Probe Microsoft for Cybersecurity Negligence

U.S. Targets Russian and Chinese Entities in North Korean IT Scam

U.S. Busts 29 Laptop Farms Tied to North Korean IT Scam

U.S. Moves to Seize $225.3M Linked to Crypto Scams

Facebook Among Social Media Platforms Nepal Plans to Block

Japan Plans to Double Cybersecurity Workforce by 2030

Apple Moves Most U.S.-Bound iPhone Production Out of China

Indian Court Orders Block on Proton Mail Email Service

Czech Agency Warns of Chinese Espionage Threat to Infrastructure

Russia Blocks Telegram, WhatsApp Calls Over Law Violations

UK to Ban Public Sector from Paying Ransomware Gangs

UK’s Most Powerful Supercomputer Goes Live

Sovereign AI Cloud Debuts in South Africa via Touchnet–Zadara Alliance

Sui Opens Lagos Hub to Boost West Africa’s Blockchain Development

Axiz and Kaspersky Join Forces to Boost Cybersecurity in Africa

Schneider Electric Unveils First African Innovation Hub to Advance Digital Solutions

Pakistan's Government Launches Probe Into SIM Data Leak

Red Sea Cable Damage Disrupts Internet in Asia and Middleeast

Hackers Claim Breach of Saudi Industrial Services Firm

Nokia, e& UAE, and MediaTek Break 5G Speed Record in Middle East

Raleigh, NC

New CCA Exploit Bypasses Safety Measures in Most AI Models

Follow us on

Get Our Newsletter

Tech

Arm Launches New Mobile Chip Designs Geared for AI

Microsoft Builds Independence with New Internal AI Models

Anthropic Blocks Hackers Exploiting Claude AI

PromptLock: First Ransomware Built with OpenAI’s gpt-oss:20b

Category

Popular Sections

About