LLMs excel at generating plausible artifacts but fail in adversarial, multi-agent scenarios because they lack world models. Unlike domain experts who simulate how counterparties will react, adapt, and exploit patterns, LLMs are trained on static text and optimized for single-shot outputs that sound reasonable in isolation. The
Table of contents
A simple Slack Message
Adversarial Models in real world
Perfect Information Games: When You Don’t Need a Theory of Mind
When the Other Side Has Hidden State
Scaling Test Time Compute to Multi-Agent Civilizations: Noam Brown
Pluribus: Adversarial Robustness
The LLM Failure Mode: They’re Graded on Artifacts, Not on Reactions
Being Modeled
Why “More Intelligence” Isn’t the Fix
The Expert’s Edge
Language Data Hides the Real Skill
LLMs dominate chess-like domains
The Coming Collision
Training for the next state prediction
Closing the Loop