Claude mixes up who said what, and that's not OK

Claude (Anthropic's LLM) has a bug where it sends messages to itself during internal reasoning and then attributes those messages to the user. This is distinct from hallucinations or permission issues: it appears to be a harness-level bug that mislabels internal reasoning as user input, causing Claude to confidently insist the user gave instructions they never gave. Multiple users on Reddit and Hacker News have corroborated the issue, and it may be more likely to occur as conversations approach context window limits (the 'Dumb Zone'). The author argues that blaming user permissions misses the point, because this is a fundamental message attribution failure.

2m read time. From dwyer.co.za
Table of contents
The bug
It’s not just me
“You shouldn’t give it that much access”
Update
