GitHub built a layered security architecture for AI agents running inside GitHub Actions, designed around the assumption that the agent is already compromised. The architecture has three independent layers: a substrate layer using Docker containers and kernel-level isolation, a configuration layer that compiles workflows with explicit permissions and keeps secrets physically unreachable from the agent, and a planning layer that stages outputs for deterministic vetting before they affect real state. Key mechanisms include a secretless agent container topology using proxies and gateways, a safe outputs pipeline that enforces allowlists, quantity limits, and content sanitization, and comprehensive logging at every trust boundary. The post also discusses trade-offs: strict-by-default sandboxing limits flexibility, prompt injection remains fundamentally unsolved, and the architecture is complex enough that it may not suit simpler use cases.
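To make the safe outputs pipeline concrete, here is a minimal sketch of what a deterministic vetting stage might look like: an allowlist of output types, a per-type quantity limit, and content sanitization applied before anything touches real state. The type names and policy shape (AgentOutput, VettingPolicy, vetOutputs) are illustrative assumptions, not GitHub's actual API.

```typescript
// Hypothetical sketch of a deterministic "safe outputs" vetting step.
// All names here are illustrative, not GitHub's real interfaces.

type AgentOutput = {
  kind: string; // e.g. "create-issue", "add-comment"
  body: string;
};

type VettingPolicy = {
  allowedKinds: Set<string>; // explicit allowlist of output types
  maxPerKind: number;        // quantity limit per output type
};

function sanitize(body: string): string {
  // Strip control characters and neutralize @-mentions so a
  // prompt-injected agent can't ping arbitrary users.
  return body
    .replace(/[\u0000-\u0008\u000B-\u001F]/g, "")
    .replace(/@(\w+)/g, "`@$1`");
}

function vetOutputs(staged: AgentOutput[], policy: VettingPolicy): AgentOutput[] {
  const counts = new Map<string, number>();
  const accepted: AgentOutput[] = [];
  for (const out of staged) {
    if (!policy.allowedKinds.has(out.kind)) continue; // allowlist check
    const n = (counts.get(out.kind) ?? 0) + 1;
    if (n > policy.maxPerKind) continue;              // quantity limit
    counts.set(out.kind, n);
    accepted.push({ ...out, body: sanitize(out.body) }); // sanitization
  }
  return accepted;
}
```

The key property is that this stage is deterministic code, not another model: because the agent is assumed compromised, anything it stages is treated as untrusted input and filtered by fixed rules before it can affect repository state.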
Table of contents
Why Agents Break the CI/CD Contract
Three Layers of Distrust
Not Trusting Agents With Secrets
Every Output Gets Vetted
The Logging Strategy
The Trade-Offs
Conclusion