"Disregard that!" attacks

This title could be clearer and more informative.Try out Clickbait Shieldfor free (5 uses left this month).

Prompt injection attacks — reframed here as 'Disregard that!' attacks — are a fundamental and largely unsolved security problem for LLM-based applications. Any time untrusted content enters an LLM's context window (user messages, web search results, API responses, shared file systems), an attacker can override the system's

10m read timeFrom calpaterson.com
Post cover image
Table of contents
The context windowSharing a context window"Disregard that!" - context window takeoverSurprise sharingMulti-level mungingStructured inputNot being lucky alwaysWhat actually worksWhere this leaves poor JeffContact/etcOther notes

Sort: