Stephen Batifol of Black Forest Labs presents the evolution of the FLUX model family: FLUX.1 (text-to-image), FLUX.1 Kontext (open-source image editing), FLUX.2 (multi-reference, state-of-the-art quality), and a fast, near-real-time editing model. A key research highlight is SelfFlow, a self-supervised training approach that eliminates external encoders such as DINOv2 by combining representation learning and generation in a single flow. SelfFlow enables simultaneous training across multiple modalities (images, video, audio, robot actions), converges faster, and avoids scaling ceilings. The talk closes with BFL's roadmap toward visual intelligence, world models, and physical AI/robotics applications.

22m watch time