Stephen Batifol of Black Forest Labs presents the evolution of the FLUX model family: FLUX.1 (text-to-image), FLUX.1 Kontext (open-source image editing), FLUX.2 (multi-reference, state-of-the-art quality), and a fast, near-real-time editing model. A key research highlight is SelfFlow, a self-supervised training approach that eliminates external encoders such as DINOv2 by combining representation learning and generation in a single flow. SelfFlow enables simultaneous training across multiple modalities (images, video, audio, robot actions), converges faster, and avoids scaling ceilings. The talk closes with BFL's roadmap toward visual intelligence, world models, and physical AI/robotics applications.

22m watch time