Mitigating the Risk of Prompt Injection for AI Agents on Databricks

Prompt injection attacks on AI agents are significantly more dangerous than on traditional LLMs because agents can autonomously access sensitive data, process untrusted inputs, and take real-world actions. Drawing on Meta's 'Agents Rule of Two' and Simon Willison's 'Lethal Trifecta' frameworks, this post maps the three risk pillars to concrete mitigations on the Databricks platform. For data access (Pillar 1), controls include on-behalf-of-user authentication, Unity Catalog fine-grained ACLs, ABAC row/column filters, and AI Gateway PII detection. For untrusted inputs (Pillar 2), Mosaic AI Gateway guardrails using Llama PromptGuard 2 and Llama Guard reduced attack success rates by over 90% in testing, supplemented by versioned prompt management via MLflow Prompt Registry. For external actions (Pillar 3), serverless egress controls enforce deny-by-default outbound policies, Unity Catalog workspace bindings prevent cross-environment state changes, and inference/system tables enable real-time monitoring and alerting. A practical example agent called Social Gauge illustrates each control layer throughout.
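The three pillars above can be turned into a simple design-time check. The following is a minimal sketch, assuming the common reading of the Rule of Two and Lethal Trifecta: an agent that holds all three capabilities at once (sensitive data access, untrusted input, external actions) is the high-risk case. The class and function names are hypothetical illustrations, not part of any Databricks or Meta API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    """Hypothetical description of which risk pillars an agent holds."""
    accesses_sensitive_data: bool    # Pillar 1: data access
    processes_untrusted_input: bool  # Pillar 2: untrusted inputs
    takes_external_actions: bool     # Pillar 3: external actions

    def pillars_held(self) -> int:
        # Booleans sum as 0/1, so this counts the pillars that are True.
        return sum((
            self.accesses_sensitive_data,
            self.processes_untrusted_input,
            self.takes_external_actions,
        ))


def needs_extra_controls(caps: AgentCapabilities) -> bool:
    """Return True when the agent holds all three pillars at once.

    Assumption: holding more than two pillars is treated as the
    condition that calls for additional mitigations or human review.
    """
    return caps.pillars_held() > 2


# Example: a read-only summarizer over untrusted web text.
summarizer = AgentCapabilities(
    accesses_sensitive_data=False,
    processes_untrusted_input=True,
    takes_external_actions=False,
)

# Example: an agent that reads private tables, ingests untrusted
# content, and can call outbound services.
full_agent = AgentCapabilities(
    accesses_sensitive_data=True,
    processes_untrusted_input=True,
    takes_external_actions=True,
)

print(needs_extra_controls(summarizer))
print(needs_extra_controls(full_agent))
```

A check like this does not replace the platform controls listed above; it only flags which agents most need them.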

#ai-agents #ai-security #databricks #prompt-injection

Mar 11 • 20m read time • From databricks.com

Table of contents

Controls for Prompt Injection Risks on AI Agents by Pillar
Monitoring your AI Agents for Security Risks
