SPEED-Bench is a new unified benchmark from NVIDIA for evaluating speculative decoding (SD) in LLM inference. Existing benchmarks are fragmented, use small prompt sets, and fail to reflect real-world serving conditions. SPEED-Bench addresses this with two dataset splits: a Qualitative split (880 prompts across 11 semantic
Table of contents
What is SPEED-Bench?
The Qualitative split: semantic coverage and draft accuracy
The Throughput split: realistic serving workloads
A unified measurement framework
Insights from SPEED-Bench
Start using SPEED-Bench