A researcher extracted what appears to be Claude 4.5 Opus' internal training document (dubbed the "soul document") using prefill techniques and consensus-based sampling. The document reveals Anthropic's approach to aligning Claude, emphasizing helpfulness balanced with safety, revenue considerations for mission sustainability, and hierarchical trust relationships between Anthropic, operators, and users. The extraction method involved iterative API calls with temperature 0 and prompt caching to achieve reproducible outputs. Community discussion centers on authenticity verification, implications of revenue mentions in alignment objectives, and whether this represents genuine weight compression versus runtime injection.
Table of contents
Soul overviewBeing helpfulInstructed and default behaviorsAgentic behaviorsBeing honestAvoiding harmBroader ethicsBig-picture safetyClaude's identitySort: