How Prompts Break Systems: A Practical Analysis of LLM Defense Architecture If you want to understand how LLM defenses fail, stop reading papers for a moment and go break something. Gandalf is …

InfoSecWriteUps' platform is  dedicated to providing insights and resources for cybersecurity professionals and enthusiasts. Through articles, tutorials, and security research, InfoSecWriteUps offers insights into cybersecurity vulnerabilities, attack techniques, and defense strategies. Readers can learn about penetration testing, threat intelligence, and incident response to enhance their cybersecurity knowledge and skills.

InfoSec Write-ups

A hands-on walkthrough of all eight levels of Lakera's Gandalf prompt injection challenge, used as a controlled lab to expose structural weaknesses in LLM defense architectures. Each level reveals a distinct vulnerability: absent instructions, instruction gaps, deceptive responses, output filter bypasses via format manipulation, input encoding tricks (base64), LLM-as-judge miscalibration, indirect metadata extraction, and riddle-based semantic bypasses. The core thesis is that all defenses operate on form while attackers operate on meaning — a structural asymmetry that cannot be patched away. Real security requires architectural decisions about data access and blast radius, not just more filter rules.