Introducing Nemotron 3 Super: An Open Hybrid Mamba-Transformer MoE for Agentic Reasoning

NVIDIA has released Nemotron 3 Super, an open model with 120B total parameters and 12B active parameters, designed for multi-agent AI applications. Key architectural innovations include a hybrid Mamba-Transformer MoE backbone with linear-time sequence processing, a Latent MoE that enables 4x more expert consultations at the same compute cost, multi-token prediction (MTP) for built-in speculative decoding with up to 3x speedups, and native NVFP4 pretraining optimized for Blackwell GPUs. The model supports a 1M-token context window to combat context explosion in long agentic tasks. Training involved pretraining on 25 trillion tokens, 7 million SFT samples, and multi-environment RL across 21 configurations using NeMo Gym. Weights, datasets, training recipes, and deployment cookbooks (vLLM, SGLang, TensorRT-LLM) are fully open. The model is available on Hugging Face, NVIDIA NIM, and multiple inference providers.
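
To make the speculative-decoding claim concrete, here is a minimal sketch of greedy draft verification, the general idea behind using drafted tokens to emit several tokens per step. It is not NVIDIA's implementation: `verify_draft`, `toy_target`, and the integer "tokens" are hypothetical names and data invented for illustration. A real system would score all drafted positions in a single forward pass of the target model, whereas this sketch calls the target once per position for readability.

```python
from typing import Callable, List, Sequence

# Hypothetical type: a function returning the target model's greedy next token.
NextToken = Callable[[Sequence[int]], int]


def verify_draft(prefix: Sequence[int], draft: Sequence[int],
                 target_next: NextToken) -> List[int]:
    """Accept drafted tokens while they match the target's greedy choice.

    On the first mismatch, emit the target's token instead and stop.
    If every drafted token is accepted, append one extra target token.
    The result always equals what plain greedy decoding would produce.
    """
    context = list(prefix)
    accepted: List[int] = []
    for token in draft:
        expected = target_next(context)
        if token != expected:
            accepted.append(expected)  # correction replaces the bad draft token
            return accepted
        accepted.append(token)
        context.append(token)
    accepted.append(target_next(context))  # bonus token after a fully accepted draft
    return accepted


def toy_target(context: Sequence[int]) -> int:
    """Toy stand-in for a model (assumption): next token is last token + 1, mod 10."""
    return (context[-1] + 1) % 10


partly_right = verify_draft([1, 2, 3], [4, 5, 9], toy_target)
all_right = verify_draft([1, 2, 3], [4, 5, 6], toy_target)
print(partly_right, all_right)
```

In this toy run, a draft with two correct tokens yields three emitted tokens in one verification step, and a fully correct three-token draft yields four. Speedups reported for real models depend on how often drafts are accepted.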

#agentic-ai #llm #mixture-of-experts #reinforcement-learning

Mar 11 • 12m read time • From developer.nvidia.com

Table of contents

What makes Nemotron 3 Super different
See it in action
Diving deep into the architecture
How we trained Nemotron 3 Super
Benchmarking Nemotron 3 Super
Building with Super’s open resources
Get started
