
Robert Youssef @rryssf_

🚨 BREAKING: Meta researchers showed a model 2 million hours of video. No labels. No physics textbook. No supervision at all. Then they showed it a clip where an object disappears behind a wall and never comes back. The model flagged it as wrong. 🤯

It had learned object permanence. Shape consistency. Collision dynamics. Entirely from watching. More surprising still: even a model trained on just one week of unique video scored above chance at detecting physics violations. That's not a fluke. That's a principle.

The key insight from the paper: this only works when the model predicts in a learned representation space, not in raw pixels. The model has to build an internal world model, compressed and abstract, and predict against that. Pixel-space prediction fails. Multimodal LLMs that reason through text fail. Only an architecture that builds abstract representations while predicting missing sensory input, something close to what neuroscientists call predictive coding, actually acquires physics intuition. (A toy sketch of this surprise-scoring mechanic is below.)

Which means the "core knowledge" researchers assumed had to be hardwired may just be observation at scale. Babies learn object permanence by watching things. Turns out the same principle holds here.

Now here's the part nobody's talking about. If observation alone teaches a model the rules of the physical world, what happens when you apply the same principle to production systems?

Production has physics too. Not gravity, but rules just as consistent: which deploys cause incidents at 3am, which config combinations interact dangerously, which code paths quietly degrade under load, which service changes cause failures two hops away. These patterns are embedded in thousands of trajectories: code pushes, metric shifts, customer tickets, incident timelines. Largely unobserved. Certainly unlabeled.

Nobody writes a runbook that says "if service A deploys with flag X active and service B is above 70% CPU, latency on service C degrades 40% within 6 minutes." But that pattern exists. It's repeatable. And it's sitting in your observability data right now, invisible because no one has built a model to find it. (The second sketch below plays out exactly this toy rule.)

That's the gap @playerzeroai is trying to close. Not another test runner. Not another alert threshold. A production world model that learns what breaks from accumulated observation, the same way Meta's model learned gravity. It doesn't check your test coverage. It predicts failure trajectories.

One week of video was enough to learn that solid objects don't pass through walls. The question is how much production observation your system needs before a model starts predicting where yours will break next. The Meta paper suggests the bar might be lower than anyone expects.
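Here is a minimal sketch of that surprise-scoring idea: encode each frame into a representation space, predict the next representation, and treat a spike in prediction error as a flagged violation. Everything in it is a toy stand-in, not V-JEPA's actual architecture: a random-projection encoder, a linear-extrapolation predictor, and 1-D "frames" holding a single bright blob.

```python
import numpy as np

rng = np.random.default_rng(0)

D_PIXELS, D_REPR = 64, 8
W_enc = 0.1 * rng.normal(size=(D_REPR, D_PIXELS))  # toy stand-in "encoder"

def encode(frame):
    # Map a raw frame into the abstract representation space.
    return np.tanh(W_enc @ frame)

def predict_next(reps):
    # Toy predictor: linearly extrapolate from the last two representations.
    return 2 * reps[-1] - reps[-2]

def surprise(clip):
    # Mean prediction error in representation space across a clip.
    reps = [encode(f) for f in clip]
    errors = [np.linalg.norm(predict_next(reps[:t]) - reps[t])
              for t in range(2, len(reps))]
    return float(np.mean(errors))

xs = np.arange(D_PIXELS)
def frame_with_object_at(center):
    # A 1-D "frame": a blob of brightness centered at `center`.
    return np.exp(-0.5 * ((xs - center) / 4.0) ** 2)

# Plausible clip: the object glides smoothly across the scene.
plausible = [frame_with_object_at(10 + 2 * t) for t in range(10)]
# Violating clip: the object vanishes mid-clip and never comes back.
violating = plausible[:5] + [np.zeros(D_PIXELS)] * 5

print(f"plausible clip surprise: {surprise(plausible):.4f}")
print(f"violating clip surprise: {surprise(violating):.4f}")
```

Running it, the vanishing-object clip scores visibly higher, which is the entire detection signal the paper measures, in a vastly more sophisticated form: no labels, just prediction error against an internal model.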
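And here is the same principle transplanted to the production analogy, purely as a hypothetical: synthesize unlabeled telemetry windows governed by the made-up service A / flag X / service B CPU rule above, fit a predictor that is never shown the rule, and ask it to score new deploy states. This is not PlayerZero's product or API, just a sketch of what "learning production physics from trajectories" could mean mechanically.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical unlabeled telemetry: one row per historical window.
# Features mirror the made-up rule above: was service A deployed,
# was flag X active, and service B's CPU utilization at the time.
N = 2000
deploy_A = rng.integers(0, 2, N)
flag_X = rng.integers(0, 2, N)
cpu_B = rng.uniform(0.0, 1.0, N)

# The hidden "physics" of this toy system: the dangerous combination
# degrades service C latency ~40%. The model is never told this rule;
# it only sees the trajectories the rule produced.
danger = deploy_A & flag_X & (cpu_B > 0.7)
latency_delta_C = 0.40 * danger + rng.normal(0.0, 0.05, N)

# World-model stand-in: least squares on the raw features plus one
# interaction term, trained on nothing but observed trajectories.
X = np.column_stack([deploy_A, flag_X, cpu_B,
                     deploy_A * flag_X * cpu_B, np.ones(N)])
w, *_ = np.linalg.lstsq(X, latency_delta_C, rcond=None)

def predicted_degradation(a, x, cpu):
    # Score a candidate deploy state before it ships.
    feats = np.array([a, x, cpu, a * x * cpu, 1.0])
    return float(feats @ w)

print(f"deploy A, flag X on, B at 90% CPU -> {predicted_degradation(1, 1, 0.9):+.2f}")
print(f"deploy A, flag X on, B at 30% CPU -> {predicted_degradation(1, 1, 0.3):+.2f}")
```

A real system would need far richer state, temporal structure, and an actual learned model; the point is only that the dangerous interaction is recoverable from observation alone, with no one ever labeling an incident.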
