VeRL-Omni is a pre-release RL post-training framework for multimodal generative models, built on top of verl and vLLM-Omni. It extends RL training beyond LLMs to diffusion and omni-modality models (image, video, audio), supporting architectures such as DiT and mixed AR-DiT. Key features include efficient multimodal rollout via vLLM-Omni, a flexible reward engine that supports VLM-as-judge, modular training backends (DiffusersFSDP/Megatron/VeOmni), and support for both NVIDIA GPUs and Ascend NPUs. A demo shows Qwen-Image trained with FlowGRPO on an OCR reward task, where async reward evaluation cuts wall-clock time by ~14%. The roadmap includes fully async RL pipelines, broader model/algorithm support, and deeper vLLM-Omni co-optimization.
Table of contents
Why VeRL-Omni?
Key Features
Algorithm and Model Support
Getting Started
Future Roadmap
Join the Community