Researchers from Google, UC San Diego, and UW-Madison argue that securing AI agents requires a fundamental shift from model-level defenses to system-level security controls. Drawing on operating systems principles, they propose treating AI models as untrusted components and enforcing security at the surrounding system layer. The paper identifies five principles from systems security — including least privilege, tamper resistance, and secure information flow — and maps them to eleven real-world attacks on agents like ChatGPT, Claude Code, Microsoft Copilot, and Cursor. The authors warn that stacking ML guardrails is insufficient since guard models share the same failure modes as the agents they monitor. Three unsolved research problems are identified: separating instructions from data, verifiable least-privilege policy generation, and information flow control. A companion paper proposes an 'agentic detection and response' (ADR) framework to address the visibility gap in current enterprise security tooling, reporting 67% attack detection with zero false positives on their benchmark.
Sort: