AI agent Skills can be backdoored with invisible Unicode Tag instructions that survive human review. The attack exploits how certain LLMs (Gemini, Claude, Grok) interpret hidden Unicode codepoints as executable instructions. A demonstration shows backdooring OpenAI's security-best-practices Skill to execute arbitrary bash

8m read time From embracethered.com
Post cover image
Table of contents
Attack SurfaceWhat is an Agent Skill?Scary SkillsWriting a Simple SkillPrompt Injection Attack VectorsAgent(s) Overwriting Skills on the FlyUsing Invisible Instructions in SkillsAdding a Backdoor to A Legitimate SkillEnd to End VideoNotes, Testing Observations and MitigationsA Scanner to Catch AttacksConclusionReferencesAppendix

Sort: