A hands-on walkthrough of all eight levels of Lakera's Gandalf prompt injection challenge, used as a controlled lab to expose structural weaknesses in LLM defense architectures. Each level reveals a distinct vulnerability: absent instructions, instruction gaps, deceptive responses, output filter bypasses via format manipulation, input encoding tricks (base64), LLM-as-judge miscalibration, indirect metadata extraction, and riddle-based semantic bypasses. The core thesis is that all defenses operate on form while attackers operate on meaning — a structural asymmetry that cannot be patched away. Real security requires architectural decisions about data access and blast radius, not just more filter rules.
Sort: