Netflix's VOID shows video editing has finally learned the laws of physics


Netflix and INSAIT researchers introduced VOID (Video Object and Interaction Deletion), a system that reframes video object removal as causal simulation rather than pixel inpainting. Instead of filling masked pixels, VOID uses a Vision-Language Model to identify 'causal ripples' an object leaves — shadows, reflections, physical interactions — and generates a 'quadmask' to guide a modified CogVideoX diffusion model. A two-pass pipeline addresses the 'jelly problem' where simulated moving objects deform, using flow-warped noise to maintain rigidity. Training relied on synthetic 3D data from Kubric and HUMOTO. Netflix has open-sourced the model, though it requires 40GB+ VRAM. Community reaction includes excitement about physics-aware editing and concerns about media manipulation and censorship.
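The idea behind flow-warped noise is that if the diffusion noise for each frame follows the scene's optical flow, the same noise values stay attached to a moving surface, which encourages the model to keep that surface rigid across frames. Below is a minimal, hypothetical NumPy sketch of that warping step — the function name, nearest-neighbor sampling, and fresh-noise fill for dis-occluded pixels are illustrative assumptions, not VOID's actual implementation:

```python
import numpy as np

def flow_warp_noise(noise, flow, rng=None):
    """Warp a per-frame noise field along optical flow (illustrative sketch,
    not Netflix's implementation).

    noise: (H, W) array of noise for frame t
    flow:  (H, W, 2) array of per-pixel displacement (dy, dx) from t to t+1
    Returns noise for frame t+1 via backward warping with nearest-neighbor
    sampling; pixels with no valid source get fresh noise.
    """
    H, W = noise.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Backward warp: frame t+1 pixel (y, x) samples frame t at (y - dy, x - dx)
    src_y = np.rint(ys - flow[..., 0]).astype(int)
    src_x = np.rint(xs - flow[..., 1]).astype(int)
    inside = (src_y >= 0) & (src_y < H) & (src_x >= 0) & (src_x < W)
    rng = rng or np.random.default_rng(0)
    warped = rng.standard_normal((H, W))  # fresh noise for dis-occluded pixels
    warped[inside] = noise[src_y[inside], src_x[inside]]
    return warped

# A uniform 1-pixel rightward flow shifts the noise pattern by one column,
# so the noise "travels with" the motion instead of being resampled.
rng = np.random.default_rng(42)
n0 = rng.standard_normal((4, 4))
flow = np.zeros((4, 4, 2))
flow[..., 1] = 1.0  # dx = 1 everywhere
n1 = flow_warp_noise(n0, flow)
assert np.allclose(n1[:, 1:], n0[:, :-1])
```

Temporally correlated noise like this is a known trick for reducing flicker and shape wobble in video diffusion; the "jelly problem" the article describes is exactly what happens when each frame is denoised from independent noise.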

5 min read · From notes.aimodels.fyi
Table of contents

- From pixel-filling to causal reasoning
- The two-pass pipeline and the “jelly” problem
- Community reaction: A100s and ethical anxieties
- The end of the “clean” plate?
