DeepSeek-V3.2 on GB300: Performance Breakthrough
DeepSeek-V3.2 and DeepSeek-R1 achieve significant performance gains on NVIDIA's GB300 (Blackwell Ultra) GPUs using FP4 weight quantization. DeepSeek-V3.2 reaches 7360 tokens/GPU/second (TGS) in prefill-only scenarios with 2-way tensor parallelism (TP2), while DeepSeek-R1 achieves 22476 TGS. Compared to Hopper H200, Blackwell shows 8x improvement in
Table of contents
Summary
Benchmark Setup
Basic Recipe with FP4 Weight Quantization
Performance Boost by Blackwell Architecture
Deployment Tuning
DeepSeek V3.2 - Still Way To Go
Disaggregated Prefill (for DeepSeek-V3.2)
Acknowledgements