GitHub built a layered security architecture for AI agents running inside GitHub Actions, designed around the assumption that the agent is already compromised. The architecture has three independent layers: a substrate layer using Docker containers and kernel-level isolation, a configuration layer that compiles workflows with explicit permissions and keeps secrets physically out of the agent's reach, and a planning layer that stages outputs for deterministic vetting before they affect real state. Key mechanisms include a secretless agent container topology built on proxies and gateways, a safe-outputs pipeline that enforces allowlists, quantity limits, and content sanitization, and comprehensive logging at every trust boundary. The post also covers the trade-offs: strict-by-default sandboxing limits flexibility, prompt injection remains fundamentally unsolved, and the architecture is complex enough that it may not suit simpler use cases.
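To make the safe-outputs idea concrete, here is a minimal sketch of a deterministic vetting step: the agent never mutates real state directly, it only stages declarative output records, and trusted code applies an allowlist, per-type quantity limits, and content sanitization before anything takes effect. The output types, limits, and field names below are illustrative assumptions, not GitHub's actual implementation.

```python
import html
import re

# Assumed allowlist and per-type quotas; real values would come from
# the compiled workflow configuration, not hard-coded constants.
ALLOWED_OUTPUT_TYPES = {"create-issue", "add-comment"}
MAX_OUTPUTS_PER_TYPE = {"create-issue": 1, "add-comment": 3}


def sanitize(text: str) -> str:
    """Strip control characters and escape HTML so staged content
    cannot smuggle markup or terminal escapes downstream."""
    text = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", "", text)
    return html.escape(text)


def vet_outputs(staged: list[dict]) -> list[dict]:
    """Deterministically vet staged agent outputs: reject disallowed
    types, enforce quantity limits, and sanitize surviving content."""
    counts: dict[str, int] = {}
    approved = []
    for record in staged:
        kind = record.get("type")
        if kind not in ALLOWED_OUTPUT_TYPES:
            continue  # reject: output type not on the allowlist
        counts[kind] = counts.get(kind, 0) + 1
        if counts[kind] > MAX_OUTPUTS_PER_TYPE[kind]:
            continue  # reject: over the per-type quantity limit
        approved.append({"type": kind, "body": sanitize(record.get("body", ""))})
    return approved


# Example: only the allowlisted record survives, with its body escaped;
# trusted code would then apply the approved records to real state.
staged = [
    {"type": "create-issue", "body": "Flaky test found <script>alert(1)</script>"},
    {"type": "delete-repo", "body": "oops"},  # rejected: not allowlisted
]
print(vet_outputs(staged))
```

The key property is that the vetting logic runs outside the agent's trust boundary and is deterministic, so a prompt-injected agent can at worst stage records that the pipeline silently drops.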

13 min read · From blog.bytebytego.com
Table of contents
Why Agents Break the CI/CD Contract
Three Layers of Distrust
Not Trusting Agents With Secrets
Every Output Gets Vetted
The Logging Strategy
The Trade-Offs
Conclusion
