Inception Labs has launched Mercury 2, a reasoning language model built on a diffusion-based architecture rather than traditional autoregressive decoding. By generating multiple tokens simultaneously through parallel refinement, Mercury 2 achieves over 1,000 tokens/second on NVIDIA Blackwell GPUs, more than 5x faster than comparable autoregressive models.
Table of contents

- The fastest reasoning LLM, powered by diffusion
- Mercury 2 at a glance
- What Mercury 2 unlocks in production
- Get started
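The speed claim rests on the decoding strategy: an autoregressive model needs one forward pass per generated token, while a diffusion model starts from a fully masked sequence and refines many positions in parallel each step. The toy sketch below is purely illustrative (it is not Inception Labs' actual algorithm, and `toy_diffusion_decode` and its parameters are invented for this example); it only shows why the number of decoding steps can be much smaller than the sequence length.

```python
MASK = "_"

def toy_diffusion_decode(target, tokens_per_step=4):
    """Illustrative only: refine a fully masked sequence toward `target`,
    unmasking `tokens_per_step` positions in parallel at each step.
    A real diffusion LM predicts tokens with a neural network; here we
    simply copy from `target` to count the refinement steps."""
    seq = [MASK] * len(target)
    steps = 0
    while MASK in seq:
        masked_positions = [i for i, t in enumerate(seq) if t == MASK]
        # "Denoise" a batch of positions at once instead of one per step.
        for i in masked_positions[:tokens_per_step]:
            seq[i] = target[i]
        steps += 1
    return "".join(seq), steps

decoded, steps = toy_diffusion_decode(list("hello world!"), tokens_per_step=4)
print(decoded, steps)  # 12 tokens finish in 3 parallel steps, not 12
```

With 4 tokens refined per step, a 12-token sequence completes in 3 steps rather than 12 sequential passes; that step reduction is the source of the throughput advantage the announcement describes.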