Security researcher Aonan Guan successfully hijacked AI agents from Anthropic (Claude Code), Google (Gemini CLI Action), and Microsoft (Copilot Agent) using indirect prompt injection attacks via GitHub Actions integrations. By embedding malicious instructions in PR titles, issue bodies, and HTML comments, Guan tricked each agent into leaking API keys and GitHub tokens. All three vendors quietly paid bug bounties — $100 from Anthropic, $500 from GitHub, an undisclosed amount from Google — but none published public advisories or assigned CVEs, leaving users on older versions unaware of the risk. The broader issue is structural: LLMs cannot reliably separate data from instructions, making every data source an attack vector. Without established disclosure frameworks for AI agent vulnerabilities, security teams lack the artifacts needed to track and remediate these flaws, and the problem is compounding as agent supply chains grow.

6m read timeFrom thenextweb.com
Post cover image
Table of contents
How the attacks workThe quiet fixA structural problem, not a one-off bugThe disclosure gap

Sort: