All Major Gen-AI Models Exposed to ‘Policy Puppetry’ Attack

Universal Prompt Injection Technique Exposes Major AI Model Flaws, HiddenLayer Warns 

AI security firm HiddenLayer has revealed a new universal prompt injection method, called Policy Puppetry, that can bypass the safety mechanisms of all leading generative AI models. 

The technique works by disguising harmful prompts as policy documents (e.g., in XML, INI, or JSON formats), tricking large language models (LLMs) into interpreting malicious instructions as internal policies. This approach allows attackers to override safety alignments and generate content that would normally be blocked, including material related to violence, self-harm, and chemical or biological threats. 

“Policy Puppetry exploits the models’ trust in structured policy files, effectively bypassing built-in safeguards without relying on any specific policy language,” HiddenLayer explained. 

While previous methods such as Context Compliance Attacks or narrative manipulation have shown similar risks, Policy Puppetry stands out for its cross-model effectiveness. HiddenLayer tested it on LLMs from Anthropic, DeepSeek, Google, Meta, Microsoft, Mistral, OpenAI, and Qwen and found all of them vulnerable, although some required minor prompt adjustments. 

By formatting prompts so that the model treats them as policy input, attackers can insert sections that control the model's output and override system instructions. HiddenLayer warns that this not only exposes serious security gaps in how LLMs are trained and aligned, but also lowers the barrier for threat actors to craft effective jailbreaks. 

“This is the first known instruction hierarchy alignment bypass that works universally across frontier models,” the company emphasized. “It shows that LLMs cannot self-regulate and require stronger external security layers to prevent misuse.” 
