NVIDIA released Nemotron 3 Nano, a 31.6B-parameter model with 3.6B active parameters, built on a hybrid Mamba-Transformer mixture-of-experts (MoE) architecture. The model achieves up to 3.3x higher throughput than comparable models while supporting 1M-token context windows. It was built through multi-stage training, including 25 trillion tokens of pre-training.
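As a rough intuition for why the MoE design is efficient, the announcement's own numbers imply that only a small fraction of the total parameters are active for any given token. A minimal back-of-the-envelope sketch (the figures come from the text above; the fraction itself is derived, not stated):

```python
# Back-of-the-envelope MoE efficiency from the announced figures.
total_params_b = 31.6   # total parameters, in billions (from the announcement)
active_params_b = 3.6   # parameters active per token, in billions (from the announcement)

# Fraction of the model that participates in any single forward pass.
active_fraction = active_params_b / total_params_b
print(f"Active per token: {active_fraction:.1%} of total parameters")
```

This per-token sparsity is what lets a 31.6B-parameter model run with compute costs closer to a much smaller dense model.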
Table of contents
- Nemotron 3 Nano - A New Standard for Efficient, Open, and Intelligent Agentic Models
- Nemotron 3 Nano Highlights (TL;DR)
- What is Nemotron 3 Nano?
- How we built Nemotron 3 Nano
- Start Building with Nemotron 3 Nano