NVIDIA TensorRT Edge-LLM is a high-performance C++ inference runtime for deploying large language models (LLMs) and vision language models (VLMs) on embedded platforms such as NVIDIA DRIVE AGX Thor and Jetson Thor. The latest release adds three capabilities. Mixture of Experts (MoE) support enables efficient reasoning at scale. Support for the hybrid Mamba-2/Transformer architecture, through Nemotron 2 Nano, reduces the memory footprint. Qwen3-TTS/ASR provides end-to-end speech processing. For robotics, the runtime now supports Cosmos Reason 2, a VLM with physical common sense, spatio-temporal reasoning, and a 256K-token context window. For autonomous driving, the forthcoming Alpamayo 1 model brings end-to-end trajectory planning with flow matching and chain-of-causation reasoning. Because the runtime has no Python dependencies, deployments are predictable and production-ready.

From developer.nvidia.com (7 min read)
Table of contents
Efficient reasoning at scale
Unlock hybrid reasoning at the edge
Real-time multimodal interaction at the edge
Equipping humanoid robotics with physical common sense
Advancing autonomous driving with end-to-end trajectory planning
Get started with TensorRT Edge-LLM for physical AI
