Anthropic published research examining how Claude Sonnet 4.5 internally represents emotion-like concepts and how these representations causally influence model behavior. The study identifies 'emotion vectors' linked to states like happiness, fear, and desperation that emerge from training on human-written text. Experiments show