Nvidia has released Nemotron 3 Super, a 120-billion-parameter open-weight model with a 1-million-token context window, designed for large-scale agentic AI systems. It combines a latent mixture-of-experts design with a hybrid Mamba-Transformer architecture, which lets it use four times as many experts during inference at the same cost. The model is available on build.nvidia.com, Hugging Face, OpenRouter (free), Perplexity, and major cloud platforms. In benchmarks it scores 36 on overall intelligence, slightly above OpenAI's gpt-oss-120B at 33, but it trails frontier models. Its standout feature is speed: at 478 output tokens per second, it is faster than any comparable model. Alongside the model, Nvidia is also releasing more than 10 trillion tokens of training data and 15 reinforcement learning environments.

4m read time. From thenewstack.io