Google's Gemma 4 model family is now available under Apache 2.0 and comes in four sizes: 2B, 4B, 26B (mixture of experts), and 31B dense. The 31B ranks #3 on Arena AI's open model leaderboard, while the 26B MoE model is highlighted as the practical sweet spot, activating only ~3.88B parameters during inference.

The post walks through running Gemma 4 locally via Ollama and integrating it with two agent frameworks: Hermes Agent (an agent shell supporting tools, MCP servers, and memory) and OpenClaw (an open-source personal AI assistant with native Ollama API support for reliable tool calling). Key setup tips include setting a large context window (32768) for agent work and using Ollama's native API rather than the OpenAI-compatible endpoint in OpenClaw. NVIDIA NIM is offered as a free hosted fallback for those without local hardware.
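The two setup tips above can be sketched together. This is a minimal illustration of building a request for Ollama's native `/api/chat` endpoint with the context window raised to 32768; the model tag `gemma4:26b` is an assumption (check `ollama list` for the actual tag after pulling), and only the payload construction is shown, not the agent loop itself.

```python
import json

# Ollama's native chat endpoint (not the OpenAI-compatible /v1 path).
OLLAMA_URL = "http://localhost:11434/api/chat"  # default local Ollama port


def build_chat_request(prompt: str, model: str = "gemma4:26b") -> dict:
    """Build a payload for Ollama's native chat API with a 32K context.

    The model tag is a hypothetical placeholder for whatever tag the
    Gemma 4 26B MoE model ships under.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        # Agent work accumulates long tool transcripts, so raise the
        # context window from Ollama's small default to 32768 tokens.
        "options": {"num_ctx": 32768},
    }


if __name__ == "__main__":
    payload = build_chat_request("Summarize the repo's README.")
    print(json.dumps(payload, indent=2))
    # POST this payload (JSON-encoded) to OLLAMA_URL with any HTTP
    # client, e.g. urllib.request or curl, once Ollama is running.
```

Using the native endpoint directly like this is what lets OpenClaw-style frameworks get structured tool-calling responses instead of routing through the OpenAI-compatibility layer.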