Anthropic's interpretability team built tools to trace Claude's actual internal computations, revealing a significant gap between what Claude says it does and what actually happens. Key findings include: Claude operates in a language-agnostic conceptual space; it plans ahead when writing poetry rather than generating
Table of contents
How AgentField Ships Production Code with 200 Autonomous Agents (Sponsored)
Looking Inside an LLM
Claude Thinks In Concept
How Claude Plans Poetry
How Claude Does Maths
When Claude’s Reasoning is Motivated
Why Hallucinations Happen
When Grammar Overrides Safety
Conclusion