vLLM-Omni now supports cache acceleration methods like Cache-DiT and TeaCache for diffusion model inference, achieving 1.5x to 2.38x speedups with minimal quality loss. These techniques intelligently cache intermediate computations to avoid redundant work across diffusion timesteps. Cache-DiT offers advanced control with
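
To make the intuition concrete, the sketch below shows the general flavor of this kind of caching: a wrapper that reuses a transformer block's previous output whenever its input has changed little since the last denoising step. The class name, the relative-L1 change metric, and the threshold value are illustrative assumptions for this sketch, not the actual Cache-DiT or TeaCache implementation shipped in vLLM-Omni.

```python
# Minimal, illustrative sketch of cross-timestep caching for a diffusion
# transformer block. NOT vLLM-Omni's, Cache-DiT's, or TeaCache's real API;
# the wrapper, change metric, and threshold are hypothetical simplifications.
import torch
import torch.nn as nn


class CachedBlock(nn.Module):
    """Wraps a block and reuses its last output when the input has
    barely changed since the previous diffusion timestep."""

    def __init__(self, block: nn.Module, rel_l1_threshold: float = 0.05):
        super().__init__()
        self.block = block
        self.threshold = rel_l1_threshold
        self.prev_input = None
        self.prev_output = None

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        if self.prev_input is not None:
            # Relative L1 change of the input versus the previous timestep.
            change = (hidden_states - self.prev_input).abs().mean() / (
                self.prev_input.abs().mean() + 1e-8
            )
            if change < self.threshold:
                # Input is nearly unchanged: skip the block, reuse the cache.
                return self.prev_output

        output = self.block(hidden_states)
        self.prev_input = hidden_states.detach()
        self.prev_output = output.detach()
        return output


if __name__ == "__main__":
    # Toy usage: wrap a single linear "block" and run a few fake timesteps
    # whose inputs drift only slightly, so most steps hit the cache.
    torch.manual_seed(0)
    block = CachedBlock(nn.Linear(64, 64))
    x = torch.randn(1, 16, 64)
    for _ in range(10):
        x = x + 0.001 * torch.randn_like(x)
        y = block(x)
```

In a sketch like this, the reuse threshold is the knob that trades speed for fidelity: a looser threshold skips more timesteps and gives larger speedups, at the cost of drifting further from the uncached output.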

Table of contents
- The Bottleneck: Redundancy in Diffusion
- Two Powerful Acceleration Backends
- Performance Benchmarks
- Supported Models
- Quick Start
- Learn More
