Prompt injection is emerging as a critical threat to AI email agents: malicious instructions embedded in an email can manipulate the agent into leaking data or altering its own behavior. Three escalating attack levels are described: blunt role-marker forgeries, which simple filters catch; subtle memo-style rule edits that slip past those filters; and memory poisoning, where a planted rule persists silently for days before it triggers. Unlike traditional phishing, which targets humans, prompt injection exploits the agent's inability to distinguish data from instructions, together with the elevated access permissions such agents are often granted. Major organizations including OpenAI, Microsoft, Anthropic, and the NCSC acknowledge that no complete fix exists; layered mitigations are the current best practice.