Bringing AI Closer to the Edge and On-Device with Gemma 4

Google's Gemma 4 model family has launched with four models spanning dense transformers and a new MoE architecture. The models accept multimodal inputs (text, audio, vision, and video) and support over 35 languages out of the box. NVIDIA details deployment options across its full hardware stack: DGX Spark for local prototyping and agentic workflows, Jetson Orin Nano for edge robotics and embedded systems, and RTX GPUs for desktop development. Every model fits on a single H100 GPU, and NVFP4-quantized checkpoints for Blackwell are coming soon. Inference is supported through vLLM, Ollama, llama.cpp, and Unsloth, while fine-tuning is available through NeMo Automodel using SFT and LoRA. Enterprise users can prototype for free with the hosted NIM API, or self-host NIM for production deployment under an NVIDIA Enterprise License. The models are available on Hugging Face under the Apache 2.0 license.
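Whether a model "fits on a single H100" comes down to simple arithmetic on weight storage. The sketch below gives a rough, weight-only estimate and rests on several assumptions: an 80 GB H100, BF16 at 16 bits per weight, and NVFP4 at roughly 4.5 effective bits per weight (4-bit values plus one FP8 scale per 16-element block). The parameter counts and the 80% headroom factor are hypothetical and do not reflect Gemma 4's published sizes. KV cache and activation memory are not modeled.

```python
# Rough weight-only memory estimate; ignores KV cache, activations, and runtime overhead.
H100_MEMORY_GB = 80  # assumption: the 80 GB H100 variant

# Effective bits per parameter (assumptions for illustration):
# BF16 stores 16 bits per weight; NVFP4 stores 4-bit values plus one
# FP8 scale per 16-element block, i.e. about 4 + 8/16 = 4.5 bits per weight
# (the small per-tensor FP32 scale is ignored).
BITS_PER_PARAM = {
    "bf16": 16.0,
    "nvfp4": 4.0 + 8.0 / 16.0,
}


def weight_gb(num_params: float, fmt: str) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BITS_PER_PARAM[fmt] / 8 / 1e9


def fits_on_gpu(num_params: float, fmt: str,
                gpu_gb: float = H100_MEMORY_GB, headroom: float = 0.8) -> bool:
    """True if the weights use at most `headroom` of GPU memory, leaving the
    rest for KV cache and activations (the headroom value is hypothetical)."""
    return weight_gb(num_params, fmt) <= gpu_gb * headroom


if __name__ == "__main__":
    # Hypothetical parameter counts, not the actual Gemma 4 model sizes.
    for params in (4e9, 27e9, 40e9):
        for fmt in BITS_PER_PARAM:
            print(f"{params / 1e9:>4.0f}B {fmt:>5}: "
                  f"{weight_gb(params, fmt):6.1f} GB, fits={fits_on_gpu(params, fmt)}")
```

For an MoE model, pass the total parameter count rather than the active count: all experts must stay resident in GPU memory even though only a subset runs for each token.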

#ai #llm #vllm #edge-ai #gemma

Apr 02 • 6 min read • From developer.nvidia.com

Table of contents

Run intelligent workloads on-device
Build secure agentic AI workflows with DGX Spark
Power physical AI agents with Jetson
Production-ready deployment with NVIDIA NIM
Day 0 fine-tuning with NeMo Framework
Get started today
