As AI agents gain tools, memory, and planning capabilities, their attack surface expands far beyond simple prompt injection. A structured four-surface taxonomy covers the Prompt Surface (indirect injection via external data), Tool Surface (parameter injection and privilege escalation), Memory Surface (persistent memory poisoning), and Planning Loop Surface (goal hijacking and multi-agent cascade corruption). Real-world incidents and research data back each threat: 88% of organizations reported AI agent security incidents, memory injection attacks achieved 95% success rates across leading models, and a single compromised orchestrator poisoned 87% of downstream agents in a simulation. Defenses include least-privilege permissions, instruction isolation, provenance tracking for memory writes, reasoning-step logging, and agent isolation. Security and autonomy exist on a dial, and controls must be proportional to the agent's capability profile, task environment, and blast radius.
Table of contents
From LLM to Agent : Why the Threat Model ChangesThe Four-Surface Attack TaxonomySecurity vs. Agent Autonomy: The Tradeoff SpaceImplementation: Moving from Taxonomy to ArchitectureConclusionSort: