Google's Gemma 4 model family is now available under Apache 2.0 and comes in four sizes: 2B, 4B, 26B (mixture of experts), and 31B dense. The 31B ranks #3 on Arena AI's open model leaderboard, while the 26B MoE model is highlighted as the practical sweet spot, activating only ~3.88B parameters during inference.

The post walks through running Gemma 4 locally via Ollama and integrating it with two agent frameworks: Hermes Agent (an agent shell supporting tools, MCP servers, and memory) and OpenClaw (an open-source personal AI assistant with native Ollama API support for reliable tool calling). Key setup tips include setting a large context window (32768) for agent work and using Ollama's native API rather than the OpenAI-compatible endpoint in OpenClaw. NVIDIA NIM is offered as a free hosted fallback for those without local hardware.
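The two setup tips above can be sketched together. This is a minimal illustration of building a request for Ollama's native `/api/chat` endpoint with the context window raised to 32768; the model tag `gemma4:26b` is an assumption (check `ollama list` for the actual tag after pulling), and only the payload construction is shown, not the agent loop itself.

```python
import json

# Ollama's native chat endpoint (not the OpenAI-compatible /v1 path).
OLLAMA_URL = "http://localhost:11434/api/chat"  # default local Ollama port


def build_chat_request(prompt: str, model: str = "gemma4:26b") -> dict:
    """Build a payload for Ollama's native chat API with a 32K context.

    The model tag is a hypothetical placeholder for whatever tag the
    Gemma 4 26B MoE model ships under.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        # Agent work accumulates long tool transcripts, so raise the
        # context window from Ollama's small default to 32768 tokens.
        "options": {"num_ctx": 32768},
    }


if __name__ == "__main__":
    payload = build_chat_request("Summarize the repo's README.")
    print(json.dumps(payload, indent=2))
    # POST this payload (JSON-encoded) to OLLAMA_URL with any HTTP
    # client, e.g. urllib.request or curl, once Ollama is running.
```

Using the native endpoint directly like this is what lets OpenClaw-style frameworks get structured tool-calling responses instead of routing through the OpenAI-compatibility layer.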