A hands-on walkthrough of all eight levels of Lakera's Gandalf prompt injection challenge, used as a controlled lab to expose structural weaknesses in LLM defense architectures. Each level reveals a distinct vulnerability: absent instructions, instruction gaps, deceptive responses, output filter bypasses via format manipulation, input encoding tricks (base64), LLM-as-judge miscalibration, indirect metadata extraction, and riddle-based semantic bypasses. The core thesis is that all defenses operate on form while attackers operate on meaning — a structural asymmetry that cannot be patched away. Real security requires architectural decisions about data access and blast radius, not just more filter rules.

16m read timeFrom infosecwriteups.com
Post cover image

Sort: