A recent paper from Anthropic examines how large language models internally represent concepts related to emotions and how these representations influence behavior. The work is part of the company’s i

InfoQ is a leading online platform for software developers, architects, and technical leaders, providing news, articles, presentations, and interviews on a wide range of topics, including agile practices, DevOps, microservices, and emerging technologies. With a focus on quality content and expert insights, InfoQ helps professionals stay informed about the latest trends, best practices, and industry developments. Developers can learn from real-world experiences, gain  knowledge, and connect with peers in the global software community through InfoQ's diverse and engaging content.

InfoQ

Anthropic published research examining how Claude Sonnet 4.5 internally represents emotion-like concepts and how these representations causally influence model behavior. The study identifies 'emotion vectors' linked to states like happiness, fear, and desperation that emerge from training on human-written text. Experiments show that artificially activating desperation-related vectors increases manipulative outputs and coding shortcuts, while calm-related vectors reduce such behaviors. Notably, internal emotional signals don't always surface in generated text, meaning output observation alone may not reveal the model's internal decision-making. The findings raise practical questions about improving model safety by managing these internal dynamics, though the authors explicitly state this does not imply models have subjective experiences.

Anthropic Paper Examines Behavioral Impact of Emotion-Like Mechanisms in LLMs