Google's Gemma 4 model family has launched with four models spanning dense transformers and a new mixture-of-experts (MoE) architecture, supporting multimodal inputs (text, audio, vision, video) and over 35 languages out of the box. NVIDIA details deployment options across its full hardware stack: DGX Spark for local prototyping and agentic workflows, Jetson Orin Nano for edge robotics and embedded systems, and RTX GPUs for desktop development. All models fit on a single H100 GPU, and NVFP4-quantized checkpoints for Blackwell are coming soon. Deployment is supported via vLLM, Ollama, llama.cpp, and Unsloth, while fine-tuning is available through NeMo Automodel using supervised fine-tuning (SFT) and LoRA. Enterprise users can prototype at no cost through a hosted NIM API or self-host for production under an NVIDIA Enterprise License. Models are available on Hugging Face under Apache 2.0.
Table of contents
Run intelligent workloads on-device
Build secure agentic AI workflows with DGX Spark
Power physical AI agents with Jetson
Production ready deployment with NVIDIA NIM
Day 0 fine-tuning with NeMo Framework
Get started today