The 120-billion-parameter open-weight model is the fastest in its class, but trails frontier models on overall intelligence benchmarks.


The New Stack

Nvidia has released Nemotron 3 Super, a 120-billion-parameter open-weight model with a 1-million-token context window, designed for large-scale agentic AI systems. It uses a hybrid latent mixture-of-experts and Mamba-Transformer architecture, enabling 4x more expert specialists during inference at the same cost. The model is available on build.nvidia.com, Hugging Face, OpenRouter (free), Perplexity, and major cloud platforms. On overall intelligence benchmarks it scores 36, slightly above OpenAI's gpt-oss-120B at 33 but behind frontier models. Its standout feature is speed: at 478 output tokens per second, it is faster than any comparable model. Alongside the model, Nvidia is also releasing over 10 trillion tokens of training data and 15 reinforcement learning environments.

Nvidia launches Nemotron 3 Super, a 120B open model for large-scale AI systems