NVIDIA has released Nemotron 3 Super, an open model with 120B total parameters and 12B active parameters, designed for multi-agent AI applications. Key architectural innovations include a hybrid Mamba-Transformer MoE backbone with linear-time sequence processing, a Latent MoE that enables 4x more expert consultations at the same compute cost, multi-token prediction (MTP) for built-in speculative decoding with up to 3x speedups, and native NVFP4 pretraining optimized for Blackwell GPUs. The model supports a 1M-token context window to combat context explosion in long agentic tasks. Training involved 25 trillion tokens of pretraining, 7 million SFT samples, and multi-environment RL across 21 configurations using NeMo Gym. Weights, datasets, training recipes, and deployment cookbooks (vLLM, SGLang, TensorRT-LLM) are all openly released. The model is available on Hugging Face, NVIDIA NIM, and multiple inference providers.
Table of contents
What makes Nemotron 3 Super different
See it in action
Diving deep into the architecture
How we trained Nemotron 3 Super
Benchmarking Nemotron 3 Super
Building with Super’s open resources
Get started