Google DeepMind's Gemma 4 is a family of open models ranging from 2B to 32B parameters, released under the Apache 2.0 license. Key highlights include on-device multimodal capabilities (images, video, and audio) running on Android, iOS, and even Raspberry Pi; a novel per-layer embeddings architecture in the E2B and E4B models that reduces GPU memory requirements; support for more than 140 languages; and a 27B Mixture-of-Experts (MoE) variant for low-latency inference. Within a week of launch, Gemma 4 reached 10 million downloads and more than 1,000 community-built derivatives, and the broader Gemma family has surpassed 500 million total downloads and 100,000 community models. Google collaborates with Hugging Face, llama.cpp, vLLM, Unsloth, and MLX to ensure ecosystem compatibility. Highlighted use cases include offline coding assistance in Android Studio, medical research (Med-Gemini), multilingual fine-tuning for low-resource languages, and fully offline agentic workflows.
