New NVIDIA Nemotron 3 Super Delivers 5x Higher Throughput for Agentic AI


NVIDIA has launched Nemotron 3 Super, a 120-billion-parameter hybrid mixture-of-experts (MoE) model that activates only 12 billion parameters at inference and is designed for agentic AI workloads. It targets two challenges in multi-agent systems: context explosion (up to 15x more tokens than standard chat) and the "thinking tax" of reasoning at every step. Its architecture combines Mamba layers for 4x memory and compute efficiency, Latent MoE for improved accuracy, and multi-token prediction for 3x faster inference, delivering up to 5x higher throughput and 2x higher accuracy than the previous Nemotron Super. The model has a 1-million-token context window and runs in NVFP4 precision on Blackwell GPUs. It is open-weight, with training data and recipes published, and is available through Hugging Face, build.nvidia.com, and numerous cloud and inference providers, including Google Cloud Vertex AI, Oracle, AWS Bedrock, and Azure.
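To make the "120 billion total, 12 billion active" figure concrete, here is a minimal toy sketch of top-k expert routing, the general idea behind MoE models. It is not NVIDIA's implementation: the expert count, the `TOP_K` value, the scalar "experts", and the function names `top_k_route` and `moe_forward` are all hypothetical, chosen only so that the toy ratio matches the 12/120 ratio quoted above. Real models also keep some parameters (attention, shared layers) active for every token, which this sketch ignores.

```python
import math
import random


def top_k_route(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax over the selected experts only, so their weights sum to 1.
    peak = max(gate_logits[i] for i in top)
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, gate_logits, k):
    """Run only the k routed experts and mix their outputs by gate weight."""
    routes = top_k_route(gate_logits, k)
    output = sum(weight * experts[i](x) for i, weight in routes)
    return output, [i for i, _ in routes]


# Hypothetical toy sizes: 10 experts, 1 routed per token (a 10% active share).
NUM_EXPERTS = 10
TOP_K = 1

# Each toy "expert" just scales its input by a different constant.
experts = [lambda x, scale=scale: scale * x for scale in range(1, NUM_EXPERTS + 1)]

rng = random.Random(0)
gate_logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

y, used = moe_forward(2.0, experts, gate_logits, TOP_K)
active_fraction = TOP_K / NUM_EXPERTS

print(f"experts used: {used} of {NUM_EXPERTS}")
print(f"toy active fraction: {active_fraction:.0%}")
print(f"ratio quoted in the summary: {12 / 120:.0%}")
```

The point of the sketch is that per-token compute scales with the experts actually run, not with the total number stored, which is how a model can hold far more parameters than it uses on any single token.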

#agentic-ai

#llm

#mixture-of-experts

#nvidia

Mar 11 • 5m read time • From blogs.nvidia.com

Table of contents

Hybrid Architecture

Open Weights, Data and Recipes

Use in Agentic Systems

Availability
