Netflix's VOID shows video editing has finally learned the laws of physics
Netflix and INSAIT researchers introduced VOID (Video Object and Interaction Deletion), a system that reframes video object removal as causal simulation rather than pixel inpainting. Instead of filling masked pixels, VOID uses a Vision-Language Model to identify 'causal ripples' an object leaves — shadows, reflections, physical interactions — and generates a 'quadmask' to guide a modified CogVideoX diffusion model. A two-pass pipeline addresses the 'jelly problem' where simulated moving objects deform, using flow-warped noise to maintain rigidity. Training relied on synthetic 3D data from Kubric and HUMOTO. Netflix has open-sourced the model, though it requires 40GB+ VRAM. Community reaction includes excitement about physics-aware editing and concerns about media manipulation and censorship.
Table of contents
From pixel-filling to causal reasoning
The two-pass pipeline and the “jelly” problem
Community reaction: A100s and ethical anxieties
The end of the “clean” plate?