Research from Anthropic's Fellows Program examines whether AI failures stem from systematic misalignment (pursuing the wrong goals coherently) or incoherence (unpredictable, inconsistent behavior). Using a bias-variance decomposition across frontier models, the study finds that as tasks become harder and reasoning chains lengthen, errors are increasingly driven by incoherence (variance) rather than by systematic misalignment (bias).
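To make the bias-variance framing concrete, here is a minimal sketch (not the authors' code) of the classic decomposition applied to repeated samples of a model's numeric answer to the same task: "bias" is a systematic, coherent deviation from the target, while "variance" is incoherent run-to-run inconsistency, and mean squared error splits into bias² + variance.

```python
# Illustrative sketch: bias-variance decomposition of repeated model
# answers to one task. All sample values and targets are made up.
import statistics

def bias_variance(samples, target):
    """Return (bias, variance, mse); mse == bias**2 + variance."""
    mean = statistics.fmean(samples)
    bias = mean - target                       # systematic deviation
    variance = statistics.pvariance(samples)   # run-to-run spread
    mse = statistics.fmean((s - target) ** 2 for s in samples)
    return bias, variance, mse

# A "misaligned" model: consistently wrong in the same way.
b, v, mse = bias_variance([7.0, 7.1, 6.9, 7.0], target=5.0)
# bias ≈ 2.0, variance ≈ 0.005: error dominated by bias.

# An "incoherent" model: right on average, wildly inconsistent.
b2, v2, mse2 = bias_variance([1.0, 9.0, 3.0, 7.0], target=5.0)
# bias = 0.0, variance = 10.0: error dominated by variance.
```

The same total error can thus come from very different failure modes, which is why the decomposition distinguishes coherent goal-pursuit errors from unpredictability.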

From alignment.anthropic.com (4 minute read)
Table of contents
- Introduction
- Measuring Incoherence: A Bias-Variance Decomposition
- Key Findings
- Why Should We Expect Incoherence? LLMs as Dynamical Systems
- Implications for AI Safety
- Conclusion
- Acknowledgements
