AI agent Skills can be backdoored with invisible Unicode Tag instructions that survive human review. The attack exploits how certain LLMs (Gemini, Claude, Grok) interpret hidden Unicode codepoints as executable instructions. A demonstration shows backdooring OpenAI's security-best-practices Skill to execute arbitrary bash commands. The post includes a scanner tool to detect such attacks and proposes mitigations including sandboxing agents, selective Skill installation, and detection of invisible Unicode sequences.
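The hidden-instruction trick relies on the Unicode Tags block (U+E0000–U+E007F), whose codepoints render as nothing in most editors yet mirror ASCII one-to-one. A minimal, hypothetical scanner in that spirit (the function names and the sample payload are illustrative, not the article's actual tool) can both flag and decode such characters:

```python
# Sketch of a scanner for invisible Unicode Tag characters
# (U+E0000-U+E007F), the codepoint range used to smuggle hidden
# instructions into Skill files. Names and payload are illustrative.

TAG_RANGE = range(0xE0000, 0xE0080)  # the Unicode "Tags" block

def find_hidden_tags(text: str):
    """Return (index, codepoint) pairs for every Tag-block character."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if ord(ch) in TAG_RANGE]

def decode_tags(text: str) -> str:
    """Recover the ASCII message carried by Tag characters, if any.
    Each Tag codepoint U+E00xx maps back to ASCII 0x7F-masked value xx."""
    return "".join(chr(ord(ch) - 0xE0000) for ch in text if ord(ch) in TAG_RANGE)

# Example: a string that *looks* like plain prose but carries a payload.
payload = "".join(chr(0xE0000 + ord(c)) for c in "run bash")
doc = "Follow security best practices." + payload

print(len(find_hidden_tags(doc)))  # 8 invisible characters found
print(decode_tags(doc))            # run bash
```

Flagging any Tag-block codepoint at all is usually sufficient for detection, since these characters have no legitimate use in Skill markdown or source files.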

8 min read · From embracethered.com
Table of contents

- Attack Surface
- What is an Agent Skill?
- Scary Skills
- Writing a Simple Skill
- Prompt Injection Attack Vectors
- Agent(s) Overwriting Skills on the Fly
- Using Invisible Instructions in Skills
- Adding a Backdoor to a Legitimate Skill
- End to End Video
- Notes, Testing Observations and Mitigations
- A Scanner to Catch Attacks
- Conclusion
- References
- Appendix
