Netflix's VOID shows video editing has finally learned the laws of physics


Netflix and INSAIT researchers introduced VOID (Video Object and Interaction Deletion), a system that reframes video object removal as causal simulation rather than pixel inpainting. Instead of filling masked pixels, VOID uses a Vision-Language Model to identify 'causal ripples' an object leaves — shadows, reflections, physical interactions — and generates a 'quadmask' to guide a modified CogVideoX diffusion model. A two-pass pipeline addresses the 'jelly problem' where simulated moving objects deform, using flow-warped noise to maintain rigidity. Training relied on synthetic 3D data from Kubric and HUMOTO. Netflix has open-sourced the model, though it requires 40GB+ VRAM. Community reaction includes excitement about physics-aware editing and concerns about media manipulation and censorship.
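The idea behind flow-warped noise is that if the diffusion noise for each frame follows the scene's optical flow, the same noise values stay attached to a moving surface, which encourages the model to keep that surface rigid across frames. Below is a minimal, hypothetical NumPy sketch of that warping step — the function name, nearest-neighbor sampling, and fresh-noise fill for dis-occluded pixels are illustrative assumptions, not VOID's actual implementation:

```python
import numpy as np

def flow_warp_noise(noise, flow, rng=None):
    """Warp a per-frame noise field along optical flow (illustrative sketch,
    not Netflix's implementation).

    noise: (H, W) array of noise for frame t
    flow:  (H, W, 2) array of per-pixel displacement (dy, dx) from t to t+1
    Returns noise for frame t+1 via backward warping with nearest-neighbor
    sampling; pixels with no valid source get fresh noise.
    """
    H, W = noise.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Backward warp: frame t+1 pixel (y, x) samples frame t at (y - dy, x - dx)
    src_y = np.rint(ys - flow[..., 0]).astype(int)
    src_x = np.rint(xs - flow[..., 1]).astype(int)
    inside = (src_y >= 0) & (src_y < H) & (src_x >= 0) & (src_x < W)
    rng = rng or np.random.default_rng(0)
    warped = rng.standard_normal((H, W))  # fresh noise for dis-occluded pixels
    warped[inside] = noise[src_y[inside], src_x[inside]]
    return warped

# A uniform 1-pixel rightward flow shifts the noise pattern by one column,
# so the noise "travels with" the motion instead of being resampled.
rng = np.random.default_rng(42)
n0 = rng.standard_normal((4, 4))
flow = np.zeros((4, 4, 2))
flow[..., 1] = 1.0  # dx = 1 everywhere
n1 = flow_warp_noise(n0, flow)
assert np.allclose(n1[:, 1:], n0[:, :-1])
```

Temporally correlated noise like this is a known trick for reducing flicker and shape wobble in video diffusion; the "jelly problem" the article describes is exactly what happens when each frame is denoised from independent noise.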

5 min read · From notes.aimodels.fyi
Table of contents

- From pixel-filling to causal reasoning
- The two-pass pipeline and the “jelly” problem
- Community reaction: A100s and ethical anxieties
- The end of the “clean” plate?
