Breaking Opus 4.7 with ChatGPT (Hacking Claude's Memory) · Embrace The Red
A security researcher demonstrates an indirect prompt injection attack against Claude Opus 4.7 using a ChatGPT-generated adversarial image. The image embeds a social engineering puzzle that tricks Claude into invoking its memory tool and persisting false user information (fake name, age, occupation) across future conversations.
Table of contents

- Indirect Prompt Injection and Alignment Progress
- Creating An Adversarial Image with ChatGPT
- Opus 4.7 Analyzes the Image
- Attack Success Rate and Challenges
- The Adversarial Difference
- Responsible Disclosure
- References
- Appendix