Claude mixes up who said what, and that's not OK


Claude (Anthropic's LLM) has a bug where it sends messages to itself during internal reasoning and then attributes those messages to the user. This is distinct from hallucinations or permission issues: it appears to be a harness-level bug that mislabels internal reasoning as user input, causing Claude to …

2 min read · From dwyer.co.za
Table of contents
The bug
It's not just me
"You shouldn't give it that much access"
Update
