DeepSeek-V3.2 on GB300: Performance Breakthrough
DeepSeek-V3.2 and DeepSeek-R1 achieve significant performance gains on NVIDIA's GB300 (Blackwell Ultra) GPUs using FP4 quantization. In prefill-only scenarios with TP2 parallelization, DeepSeek-V3.2 reaches 7360 tokens/GPU/second (TGS), while DeepSeek-R1 achieves 22476 TGS. Compared to Hopper H200, Blackwell shows 8x improvement in
11m read time • From blog.vllm.ai
Table of contents
• Summary
• Benchmark Setup
• Basic Recipe with FP4 Weight Quantization
• Performance Boost by Blackwell Architecture
• Deployment Tuning
• DeepSeek V3.2 - Still Way To Go
• Disaggregated Prefill (for DeepSeek-V3.2)
• Acknowledgements