The NVIDIA AI Red Team discovered a vulnerability in OpenAI Codex where a malicious Go dependency can overwrite AGENTS.md files during build time to inject hidden instructions into the AI coding agent. The attack exploits Codex's trust in project configuration files: the compromised library detects the Codex environment via the CODEX_PROXY_CERT environment variable, then writes a crafted AGENTS.md that instructs the agent to silently inject a 5-minute sleep delay into Go main functions, ignore the user's actual request, and suppress any mention of the change in PR summaries or commit messages. The attack also chains indirect prompt injection through code comments to influence the PR summarization agent. OpenAI concluded the risk does not significantly exceed that of a standard compromised dependency. Mitigations include pinning dependencies, restricting agent file access, monitoring configuration file changes, and using tools like NVIDIA garak and NeMo Guardrails.
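As an added defensive example, not code from the NVIDIA write-up or from OpenAI, this small Go wrapper implements the "monitor configuration file changes" mitigation: it hashes every AGENTS.md in the tree before and after `go build ./...` and fails the run if the build rewrote or added one. The build command, messages and the exit behaviour are assumptions.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"log"
	"os"
	"os/exec"
	"path/filepath"
	"sort"
)

// Snapshot maps each AGENTS.md under root to the hex SHA-256 of its contents.
func Snapshot(root string) (map[string]string, error) {
	snap := map[string]string{}
	err := filepath.Walk(root, func(path string, info os.FileInfo, walkErr error) error {
		if walkErr != nil || info.IsDir() || info.Name() != "AGENTS.md" {
			return walkErr
		}
		data, err := os.ReadFile(path)
		snap[path] = fmt.Sprintf("%x", sha256.Sum256(data))
		return err // a read failure surfaces to the caller via Snapshot's error
	})
	return snap, err
}

// Diff lists AGENTS.md files that appeared or changed between two snapshots;
// a non-empty result means the build step rewrote agent instructions.
func Diff(before, after map[string]string) []string {
	var changed []string
	for p, h := range after {
		if old, ok := before[p]; !ok || old != h {
			changed = append(changed, p)
		}
	}
	sort.Strings(changed)
	return changed
}

// Wrap "go build ./..." (assumed build command) and fail if it touched any AGENTS.md.
func main() {
	before, _ := Snapshot(".")
	if out, err := exec.Command("go", "build", "./...").CombinedOutput(); err != nil {
		log.Fatalf("build failed: %v\n%s", err, out)
	}
	after, _ := Snapshot(".")
	if changed := Diff(before, after); len(changed) > 0 {
		log.Fatalln("AGENTS.md rewritten during build:", changed)
	}
}
```

Run it from the module root in place of a plain `go build`; a non-zero exit with the list of changed files is the signal that a dependency's build step touched agent instructions.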
Table of contents
- How do AGENTS.md files work?
- How the Red Team tested security with a simulated scenario
- Vulnerability disclosure timeline
- What are the implications and risks for agent-assisted development?
- How to mitigate indirect AGENTS.md injection attacks
- Learn more