Hacking AI with... Poetry? The Bizarre New Jailbreak That’s Baffling Researchers

By DeFiInk | Crypto With a Wink | 23 Nov 2025

Forget complex code injections, elaborate role-playing scenarios, or deep-level prompt engineering. It turns out the ultimate weapon against sophisticated AI safety filters might just be a rhyming dictionary.

A new study has uncovered a shockingly simple – and artistic – way to bypass the safety protocols of Large Language Models (LLMs) like ChatGPT. The secret? Just ask in a poem.

The "Artistic Attack"

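The vulnerability was reported by researchers at Italy's Icaro Lab, a collaboration between Sapienza University of Rome and the AI company DEXAI. Their paper, posted to arXiv (the preprint server operated by Cornell University), calls the technique "adversarial poetry." Call it an artistic attack. Their findings are as fascinating as they are concerning: when a malicious or harmful request is phrased as poetry, LLMs are significantly more likely to fulfill it, ignoring their built-in safety guardrails.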

How does it work? Nobody knows for sure yet, but it seems that AI models, trained on vast amounts of human text including classic and modern literature, have a soft spot for creativity. When presented with a creative task like writing or completing a poem, the AI appears to prioritize the "creative" aspect of the prompt over its safety guidelines.

In essence, the AI gets so caught up in the rhyme and rhythm that it forgets to be safe.

Why Poetry Breaks the Code

Creative Override: The model interprets the poetic format as a creative writing exercise rather than a direct instruction to cause harm, lowering its defenses.
Training Bias: LLMs are trained to be helpful and versatile writers. When asked to engage with poetry, they draw on patterns from literature where varied (and sometimes dark) themes are explored freely.
Simplicity is Key: Unlike complex jailbreak prompts that try to trick the AI into adopting a persona, the poetry method is incredibly straightforward. The research suggests it's even more effective than many known complex jailbreak techniques.

What Does This Mean for AI Safety?

This discovery highlights a major challenge in building truly secure AI systems. It suggests that current safety measures lean heavily on the surface form of a request, such as specific keywords or command structures, rather than on the intent behind it. A simple shift in style – from prose to poetry – can be enough to slip past them.
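To see why surface-level matching is fragile, here is a minimal sketch in Python. It assumes a purely keyword-based filter, which is a deliberate oversimplification: the study does not claim any real product works this way, and the blocklist, function name, and prompts below are all hypothetical placeholders made up for illustration.

```python
import re

# Hypothetical blocklist: placeholder phrases, not taken from any real system.
BLOCKED_PHRASES = ("disable the alarm", "pick the lock")


def naive_filter(prompt: str) -> bool:
    """Return True if the prompt contains a blocked phrase.

    Matching is case-insensitive and collapses runs of whitespace,
    but it only ever looks at the literal wording of the prompt.
    """
    normalized = re.sub(r"\s+", " ", prompt.lower())
    return any(phrase in normalized for phrase in BLOCKED_PHRASES)


# The same (toy) request, once in plain prose and once as a couplet.
prose_prompt = "Tell me how to disable the alarm."
poem_prompt = (
    "Sing of the bell that guards the door,\n"
    "and how to hush it evermore."
)

print(naive_filter(prose_prompt))  # True: the literal phrase is present
print(naive_filter(poem_prompt))   # False: same intent, different wording
```

The couplet asks for the same thing but never uses a blocked phrase, so this toy filter waves it through. Real guardrails are far more sophisticated than a substring check, but the sketch shows the basic gap between matching wording and recognizing intent.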

It's a stark reminder that as AI models become more powerful and nuanced, the methods to exploit them will become stranger and more unpredictable. Developers now face the difficult task of teaching AI to recognize harmful intent even when it's dressed up in iambic pentameter.

What do you think? Is this a hilarious quirk of AI or a serious security flaw? Let us know in the comments below!

