Google's newest Gemma 4 models are both powerful and useful.

XDA Developers

Google's Gemma 4 family consists of four models: a 31B dense model, a 26B-A4B Mixture of Experts (MoE), and two edge models (E4B and E2B). All four are released under Apache 2.0 and are natively multimodal. The 26B-A4B MoE stands out because it activates only 3.8B parameters per token, delivering near-31B quality at much faster inference speeds. The edge models add native audio input and function calling, enabling fully offline voice agents on mobile. A notable caveat: Google withheld the Multi-Token Prediction heads from the public weights, which limits inference speed on the 31B. Community-trained EAGLE3 draft heads and traditional speculative decoding, with smaller Gemma 4 models as drafts, offer workarounds. For home lab local inference setups, the article presents the Gemma 4 family as the most well-rounded option currently available.

Google's Gemma 4 isn't the smartest local LLM I've run, but it's the one I reach for most

Tool calling that's baked into the architecture

Speculative decoding gives the 31B a speed boost
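
To make the idea concrete, here is a minimal sketch of greedy speculative decoding: a cheap draft model proposes a few tokens, and the large target model keeps only the prefix it agrees with. Everything in it is an assumption for illustration. The names speculative_decode, greedy_decode, toy_target and toy_draft are hypothetical, and the two "models" are toy arithmetic functions rather than real Gemma 4 checkpoints or any inference engine's API. It also assumes a non-empty prompt and k of at least 1.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # maps a context to its greedy next token


def greedy_decode(model: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[Token], max_new: int, k: int = 4) -> List[Token]:
    """Greedy speculative decoding; the result matches greedy_decode(target, ...)."""
    out = list(prompt)
    goal = len(prompt) + max_new
    while len(out) < goal:
        # 1. The draft model proposes up to k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # 2. The target model checks each proposed position. A real engine would
        #    score all positions in one batched forward pass; this sketch loops.
        for i, tok in enumerate(proposal):
            expected = target(out + proposal[:i])
            if expected != tok:
                # First disagreement: keep the agreed prefix plus the target's token.
                out.extend(proposal[:i])
                out.append(expected)
                break
        else:
            # The target agreed with every proposed token.
            out.extend(proposal)
    return out


def toy_target(ctx: List[Token]) -> Token:
    """Hypothetical stand-in for the large model."""
    return (ctx[-1] * 3 + 1) % 7


def toy_draft(ctx: List[Token]) -> Token:
    """Hypothetical stand-in for the small model; wrong whenever the last token is 5."""
    return 0 if ctx[-1] == 5 else toy_target(ctx)


if __name__ == "__main__":
    print(speculative_decode(toy_target, toy_draft, [1], max_new=10))
```

The design point is that the draft model only affects speed, never the output: every token that is kept is one the target model would have chosen itself. The speedup in practice depends on how often the draft agrees with the target.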