Research reveals that rewriting harmful prompts as poetry can bypass the safety mechanisms of large language models, with success rates of up to 90%. Testing across 25 frontier models showed that poetic framing achieved a 62% jailbreak success rate for hand-crafted poems and 43% for automated conversions, up to 18 times higher than
From arxiv.org