Inception Labs has launched Mercury 2, a reasoning language model built on a diffusion-based architecture rather than traditional autoregressive decoding. By generating multiple tokens simultaneously through parallel refinement, Mercury 2 achieves over 1,000 tokens/second on NVIDIA Blackwell GPUs, more than 5x faster than comparable autoregressive models.
Table of contents

- The fastest reasoning LLM, powered by diffusion
- Mercury 2 at a glance
- What Mercury 2 unlocks in production
- Get started
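The speed claim rests on the decoding strategy: an autoregressive model needs one forward pass per generated token, while a diffusion model starts from a fully masked sequence and refines many positions in parallel each step. The toy sketch below is purely illustrative (it is not Inception Labs' actual algorithm, and `toy_diffusion_decode` and its parameters are invented for this example); it only shows why the number of decoding steps can be much smaller than the sequence length.

```python
MASK = "_"

def toy_diffusion_decode(target, tokens_per_step=4):
    """Illustrative only: refine a fully masked sequence toward `target`,
    unmasking `tokens_per_step` positions in parallel at each step.
    A real diffusion LM predicts tokens with a neural network; here we
    simply copy from `target` to count the refinement steps."""
    seq = [MASK] * len(target)
    steps = 0
    while MASK in seq:
        masked_positions = [i for i, t in enumerate(seq) if t == MASK]
        # "Denoise" a batch of positions at once instead of one per step.
        for i in masked_positions[:tokens_per_step]:
            seq[i] = target[i]
        steps += 1
    return "".join(seq), steps

decoded, steps = toy_diffusion_decode(list("hello world!"), tokens_per_step=4)
print(decoded, steps)  # 12 tokens finish in 3 parallel steps, not 12
```

With 4 tokens refined per step, a 12-token sequence completes in 3 steps rather than 12 sequential passes; that step reduction is the source of the throughput advantage the announcement describes.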