A comprehensive research paper by 11 authors from IBM, Google, Microsoft, and other organizations presents six design patterns to mitigate prompt injection attacks in LLM agents. The six patterns are Action-Selector, Plan-Then-Execute, LLM Map-Reduce, Dual LLM, Code-Then-Execute, and Context-Minimization. Each pattern trades some agent flexibility for security by constraining actions and preventing untrusted input from triggering arbitrary tasks. The paper includes ten detailed case studies covering practical applications such as SQL agents, email assistants, and customer service chatbots, providing threat models and mitigation strategies for each scenario.
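
To make that trade-off concrete, here is a minimal Python sketch of the Action-Selector idea (the `call_llm` helper and the action names are hypothetical illustrations, not code from the paper): the model may only pick an action from a fixed allowlist, and the tool's output is never fed back into the model's context, so injected instructions in retrieved content cannot trigger further actions.

```python
# Minimal sketch of the Action-Selector pattern (hypothetical helper
# and action names; not the paper's reference implementation).

ALLOWED_ACTIONS = {
    "refund_order": lambda order_id: f"Refund issued for {order_id}",
    "check_status": lambda order_id: f"Order {order_id}: shipped",
}

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call (assumption)."""
    raise NotImplementedError

def action_selector_agent(user_request: str, order_id: str) -> str:
    # The model's only job is to name an action; its raw output is
    # validated against the allowlist, never executed directly.
    choice = call_llm(
        "Choose exactly one action from "
        f"{sorted(ALLOWED_ACTIONS)} for this request: {user_request}"
    ).strip()
    if choice not in ALLOWED_ACTIONS:
        return "Request refused: no matching allowed action."
    # The fixed action runs and its result goes straight to the user;
    # it is never appended to the LLM context, closing the loop an
    # injected payload would need to trigger arbitrary follow-up tasks.
    return ALLOWED_ACTIONS[choice](order_id)
```

The other patterns loosen this constraint in different ways, for example by letting a quarantined model read untrusted text while a privileged model retains tool access (Dual LLM), but the underlying principle is the same: untrusted input never gets to choose what the agent does next.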

From simonwillison.net · 9 min read