DeepMind’s New AI Tracks Objects Faster Than Your Brain
Google DeepMind, UCL, and Oxford have released D4RT, a single transformer model that performs 4D scene reconstruction (3D space plus time) from video. Previous approaches chained separate specialized models for depth, motion, and camera pose; D4RT estimates all three at once in one unified architecture. It outputs dynamic point clouds up to 300x faster than prior methods, tracks objects through occlusion by using temporal context, and recovers sub-pixel detail by feeding the original high-resolution pixels back into the decoder. The trade-offs: it does not produce photorealistic renderings, and its point clouds need an extra meshing step before they can be used for physics simulation or 3D printing.
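To make "dynamic point cloud" concrete, here is a minimal sketch of how such an output could be held in memory: one 3D position per tracked point per frame, plus a visibility flag so that a point keeps a position estimate while it is occluded. This is an illustration only, not D4RT's actual output format; the `DynamicPointCloud` class, its fields, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class DynamicPointCloud:
    """Hypothetical container, not D4RT's real output type."""
    # positions[t][i] is the estimated 3D position of point i at frame t
    positions: List[List[Point3D]]
    # visible[t][i] is False when point i is occluded in frame t
    visible: List[List[bool]]

    def track(self, i: int) -> List[Point3D]:
        """Return the 3D trajectory of point i across all frames."""
        return [frame[i] for frame in self.positions]

    def occluded_frames(self, i: int) -> List[int]:
        """Return the frame indices in which point i is not visible."""
        return [t for t, frame in enumerate(self.visible) if not frame[i]]


# Made-up sample: 3 frames, 2 points. Point 0 slides along x and is
# hidden in frame 1, yet still carries a position estimate there.
cloud = DynamicPointCloud(
    positions=[
        [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0)],
        [(0.5, 0.0, 1.0), (2.0, 0.0, 1.0)],
        [(1.0, 0.0, 1.0), (2.0, 0.0, 1.0)],
    ],
    visible=[
        [True, True],
        [False, True],
        [True, True],
    ],
)

print(cloud.track(0))            # full trajectory, including the occluded frame
print(cloud.occluded_frames(0))  # frames where point 0 was hidden
```

A structure like this is still just points. Turning it into a surface for physics or 3D printing is the separate meshing step mentioned above.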