Google's Gemma 4 model family has launched with four models spanning dense transformers and a new MoE architecture, supporting multimodal inputs (text, audio, vision, video) and over 35 languages out of the box. NVIDIA details deployment options across its full hardware stack: DGX Spark for local prototyping and agentic workflows, Jetson Orin Nano for edge robotics and embedded systems, and RTX GPUs for desktop development. All models fit on a single H100 GPU, with NVFP4 quantized checkpoints for Blackwell coming soon. Deployment is supported via vLLM, Ollama, llama.cpp, and Unsloth, while fine-tuning is available through NeMo Automodel using SFT and LoRA. Enterprise users can access a hosted NIM API for free prototyping or self-hosted production deployment under an NVIDIA Enterprise License. Models are available on Hugging Face under Apache 2.0.

6m read timeFrom developer.nvidia.com
Post cover image
Table of contents
Run intelligent workloads on-deviceBuild secure agentic AI workflows with DGX SparkPower physical AI agents with JetsonProduction ready deployment with NVIDIA NIMDay 0 fine-tuning with NeMo FrameworkGet started today

Sort: