Prompt injection attacks on AI agents are significantly more dangerous than on traditional LLMs because agents can autonomously access sensitive data, process untrusted inputs, and take real-world actions. Drawing on Meta's 'Agents Rule of Two' and Simon Willison's 'Lethal Trifecta' frameworks, this post maps the three risk pillars to concrete mitigations on the Databricks platform. For data access (Pillar 1), controls include on-behalf-of-user authentication, Unity Catalog fine-grained ACLs, ABAC row/column filters, and AI Gateway PII detection. For untrusted inputs (Pillar 2), Mosaic AI Gateway guardrails using Llama PromptGuard 2 and Llama Guard reduced attack success rates by over 90% in testing, supplemented by versioned prompt management via MLflow Prompt Registry. For external actions (Pillar 3), serverless egress controls enforce deny-by-default outbound policies, Unity Catalog workspace bindings prevent cross-environment state changes, and inference/system tables enable real-time monitoring and alerting. A practical example agent called Social Gauge illustrates each control layer throughout.
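The three pillars lend themselves to a simple design-time check. The sketch below is a minimal illustration, not code from the post: `AgentCapabilities` and `needs_extra_controls` are hypothetical names, and it assumes the usual reading of the Rule of Two, where an agent that combines all three properties in one session is the case that calls for additional mitigations or human oversight.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    """Hypothetical description of what an agent can do in one session."""
    accesses_sensitive_data: bool     # Pillar 1: data access
    processes_untrusted_input: bool   # Pillar 2: untrusted inputs
    takes_external_actions: bool      # Pillar 3: external actions

    def pillar_count(self) -> int:
        # Booleans sum as 0/1, so this counts the pillars that apply.
        return sum((
            self.accesses_sensitive_data,
            self.processes_untrusted_input,
            self.takes_external_actions,
        ))


def needs_extra_controls(caps: AgentCapabilities) -> bool:
    """Return True when all three pillars are present at once (assumed threshold)."""
    return caps.pillar_count() == 3


# Illustrative agent that reads private data, ingests web content, and calls external APIs.
risky = AgentCapabilities(True, True, True)
# Same agent with external actions disabled.
restricted = AgentCapabilities(True, True, False)

print(needs_extra_controls(risky))
print(needs_extra_controls(restricted))
```

A check like this only flags the risky combination; the per-pillar controls listed above are what actually reduce the exposure once an agent is flagged.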

20m read time · From databricks.com
Table of contents
Controls for Prompt Injection Risks on AI Agents by Pillar
Monitoring your AI Agents for Security Risks
