The boom in open-source generative AI models is pushing beyond data centers into machines operating in the physical world. Developers are eager to deploy these…

NVIDIA DevTalk is a community forum where developers discuss, troubleshoot, and collaborate on projects involving NVIDIA hardware and software. Members draw on the expertise of the NVIDIA developer community to share insights, resolve issues, and explore best practices for GPU programming and AI development. DevTalk is also a place to showcase projects, receive feedback, and network with peers.

NVIDIA Developer

A practical guide to maximizing memory efficiency on NVIDIA Jetson edge devices so they can run larger AI models. It covers five optimization layers, including the BSP/JetPack foundation (disabling GUI services, carveout regions, SWIOTLB tuning), the inference pipeline (container vs. bare metal, Python vs. C++, pipeline configuration), inference frameworks (vLLM, SGLang, llama.cpp), and model quantization (FP16 to W4A16/INT4/NVFP4). Combined savings of 10–12 GB are achievable. The guide includes a real-world example: a multimodal conversational robot running a full AI pipeline on a Jetson Orin Nano 8 GB using 4-bit quantization and headless deployment.
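To give a feel for why the quantization layer matters, here is a rough back-of-the-envelope sketch of weight storage at 16 bits versus 4 bits per weight (as in W4A16, where weights are 4-bit and activations stay 16-bit). The 8-billion-parameter model size and the helper name `weight_memory_gib` are illustrative assumptions, not figures from the guide. The estimate counts weights only and ignores quantization scales, the KV cache, activations, and runtime overhead, so real footprints will be somewhat higher.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for model weights alone, in GiB.

    Ignores quantization scales/zero-points, KV cache, and activations.
    """
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 2**30


# Hypothetical 8B-parameter model (an assumption for illustration).
params = 8e9

fp16_gib = weight_memory_gib(params, 16)  # FP16: 16 bits per weight
w4_gib = weight_memory_gib(params, 4)     # 4-bit weights (e.g. W4A16 / INT4)

print(f"FP16 weights : {fp16_gib:.2f} GiB")
print(f"4-bit weights: {w4_gib:.2f} GiB")
print(f"Reduction    : {fp16_gib / w4_gib:.0f}x")
```

Under these assumptions the FP16 weights alone would not fit in 8 GB of memory, while the 4-bit weights would leave room for the rest of the pipeline, which is consistent with the 4-bit, headless setup described for the Jetson Orin Nano example.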

Maximizing Memory Efficiency to Run Bigger Models on NVIDIA Jetson

Disaggregating inference at the edge with specialized accelerators

Real use-case: Reachy Mini Jetson Mini Assistant