New NVIDIA Nemotron 3 Super Delivers 5x Higher Throughput for Agentic AI
NVIDIA has launched Nemotron 3 Super, a 120-billion-parameter hybrid mixture-of-experts (MoE) model that activates only 12 billion parameters at inference and is designed for agentic AI workloads. The model addresses two key challenges in multi-agent systems: context explosion (up to 15x more tokens than standard chat) and the 'thinking tax' of reasoning at every step. Key architectural innovations include Mamba layers for 4x higher memory and compute efficiency, Latent MoE for improved accuracy, and multi-token prediction for 3x faster inference. Together they deliver up to 5x higher throughput and 2x higher accuracy than the previous Nemotron Super. The model features a 1-million-token context window and runs in NVFP4 precision on Blackwell GPUs. It is open-weight, with training data and recipes published, and is available via Hugging Face, build.nvidia.com, and numerous cloud and inference providers including Google Cloud Vertex AI, Oracle, AWS Bedrock, and Azure.
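To put the headline parameter counts in perspective, the back-of-the-envelope sketch below computes the share of parameters active per token and a rough weight footprint. It assumes NVFP4 stores about 4 bits per weight and ignores scaling factors, activations, and KV/state caches, so the figures are illustrative estimates, not published numbers. The helper names `active_fraction` and `weight_memory_gb` are hypothetical.

```python
# Rough arithmetic on the figures quoted in the summary (120B total, 12B active).
# Assumption: NVFP4 uses ~4 bits per weight; scaling-factor overhead is ignored.

def active_fraction(total_params: float, active_params: float) -> float:
    """Share of the model's parameters used for each token."""
    return active_params / total_params


def weight_memory_gb(params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 120e9   # total parameters, from the summary
ACTIVE_PARAMS = 12e9   # active parameters at inference, from the summary

if __name__ == "__main__":
    print(f"active share: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.0%}")
    print(f"weights at 4 bits:  ~{weight_memory_gb(TOTAL_PARAMS, 4):.0f} GB")
    print(f"weights at 16 bits: ~{weight_memory_gb(TOTAL_PARAMS, 16):.0f} GB")
```

Under these assumptions, roughly one tenth of the weights participate in each token, and a 4-bit representation needs about a quarter of the storage of a 16-bit one.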
Table of contents
Hybrid Architecture
Open Weights, Data and Recipes
Use in Agentic Systems
Availability