A practical guide to maximizing memory efficiency on NVIDIA Jetson edge devices for running large AI models. It covers five optimization layers, including the BSP/JetPack foundation (disabling GUI services, carveout regions, SWIOTLB tuning), the inference pipeline (container vs. bare metal, Python vs. C++, pipeline configuration), inference frameworks (vLLM, SGLang, llama.cpp), and model quantization (FP16 to W4A16/INT4/NVFP4). Combined savings of 10–12 GB are achievable. The guide includes a real-world example: a multimodal conversational robot that runs a full AI pipeline on a Jetson Orin Nano 8 GB using 4-bit quantization and headless deployment.
Table of contents
Edge AI software stack
Disaggregating inference at the edge with specialized accelerators
Real use-case: Reachy Mini Jetson Mini Assistant
Get started
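
To give a rough sense of why the quantization layer mentioned in the summary matters on an 8 GB device, the sketch below estimates weight storage at different bit widths. It is a back-of-the-envelope calculation only: the 3-billion-parameter model size is a hypothetical figure, not one taken from the guide, and the estimate ignores the KV cache, activations, runtime overhead, and the extra scale and zero-point metadata that 4-bit formats carry.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Counts only the raw weights: params * bits / 8 bytes.
    """
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical model size, chosen for illustration only.
NUM_PARAMS = 3e9

# Bit widths for the precisions named in the summary (weights only).
PRECISIONS = [("FP16", 16), ("INT8", 8), ("W4A16 / INT4", 4)]

if __name__ == "__main__":
    baseline = weight_memory_gb(NUM_PARAMS, 16)
    for name, bits in PRECISIONS:
        size = weight_memory_gb(NUM_PARAMS, bits)
        saved = baseline - size
        print(f"{name:>12}: {size:.2f} GB weights, {saved:.2f} GB saved vs FP16")
```

Under these assumptions, moving from 16-bit to 4-bit weights cuts weight storage to a quarter. Real savings depend on the model, the quantization scheme, and how much memory the rest of the pipeline uses.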