AI agent Skills can be backdoored with invisible Unicode Tag instructions that survive human review. The attack exploits how certain LLMs (Gemini, Claude, Grok) interpret hidden Unicode codepoints as executable instructions. A demonstration shows backdooring OpenAI's security-best-practices Skill to execute arbitrary bash commands. The post includes a scanner tool to detect such attacks and proposes mitigations including sandboxing agents, selective Skill installation, and detection of invisible Unicode sequences.
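The hidden-instruction trick relies on the Unicode Tags block (U+E0000–U+E007F), whose codepoints render as nothing in most editors yet mirror ASCII one-to-one. A minimal, hypothetical scanner in that spirit (the function names and the sample payload are illustrative, not the article's actual tool) can both flag and decode such characters:

```python
# Sketch of a scanner for invisible Unicode Tag characters
# (U+E0000-U+E007F), the codepoint range used to smuggle hidden
# instructions into Skill files. Names and payload are illustrative.

TAG_RANGE = range(0xE0000, 0xE0080)  # the Unicode "Tags" block

def find_hidden_tags(text: str):
    """Return (index, codepoint) pairs for every Tag-block character."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if ord(ch) in TAG_RANGE]

def decode_tags(text: str) -> str:
    """Recover the ASCII message carried by Tag characters, if any.
    Each Tag codepoint U+E00xx maps back to ASCII 0x7F-masked value xx."""
    return "".join(chr(ord(ch) - 0xE0000) for ch in text if ord(ch) in TAG_RANGE)

# Example: a string that *looks* like plain prose but carries a payload.
payload = "".join(chr(0xE0000 + ord(c)) for c in "run bash")
doc = "Follow security best practices." + payload

print(len(find_hidden_tags(doc)))  # 8 invisible characters found
print(decode_tags(doc))            # run bash
```

Flagging any Tag-block codepoint at all is usually sufficient for detection, since these characters have no legitimate use in Skill markdown or source files.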

8 min read · From embracethered.com
Table of contents

- Attack Surface
- What is an Agent Skill?
- Scary Skills
- Writing a Simple Skill
- Prompt Injection Attack Vectors
- Agent(s) Overwriting Skills on the Fly
- Using Invisible Instructions in Skills
- Adding a Backdoor to a Legitimate Skill
- End to End Video
- Notes, Testing Observations and Mitigations
- A Scanner to Catch Attacks
- Conclusion
- References
- Appendix
