DeepSeek-V3.2 on GB300: Performance Breakthrough


DeepSeek-V3.2 and DeepSeek-R1 achieve significant performance gains on NVIDIA GB300 (Blackwell Ultra) GPUs using FP4 weight quantization. In prefill-only scenarios with a tensor-parallel size of 2 (TP2), DeepSeek-V3.2 reaches 7360 tokens/GPU/second (TGS), while DeepSeek-R1 reaches 22476 TGS. Compared to Hopper H200, Blackwell shows an 8x improvement in
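Because TGS is normalized per GPU, a quick way to read the DeepSeek-V3.2 figure is to convert it into per-instance throughput and a rough instance count for a given load. The sketch below does that arithmetic. It assumes throughput scales linearly across independent TP2 instances, and the 100,000 tokens/s target and the helper names are hypothetical, not taken from the post.

```python
import math

# Figure quoted in the summary (prefill-only, FP4, GB300).
V32_TGS = 7360  # DeepSeek-V3.2, tokens/GPU/second
TP_SIZE = 2     # tensor-parallel size (TP2)


def instance_throughput(tgs: float, tp_size: int) -> float:
    # TGS is per GPU, so one TP instance yields tgs * tp_size tokens/s.
    return tgs * tp_size


def instances_needed(target_tokens_per_s: float, tgs: float, tp_size: int) -> int:
    # Hypothetical sizing helper: assumes linear scaling across instances
    # and rounds up to whole instances.
    return math.ceil(target_tokens_per_s / instance_throughput(tgs, tp_size))


if __name__ == "__main__":
    per_instance = instance_throughput(V32_TGS, TP_SIZE)
    n = instances_needed(100_000, V32_TGS, TP_SIZE)  # hypothetical target load
    print(per_instance, n, n * TP_SIZE)  # tokens/s per instance, instances, GPUs
```

Real deployments rarely scale perfectly linearly, so treat the result as a lower bound on hardware rather than a capacity plan.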

From blog.vllm.ai (11 min read)
Table of contents
- Summary
- Benchmark Setup
- Basic Recipe with FP4 Weight Quantization
- Performance Boost by Blackwell Architecture
- Deployment Tuning
- DeepSeek V3.2 - Still Way To Go
- Disaggregated Prefill (for DeepSeek-V3.2)
- Acknowledgements
