Running Google Gemma 4 Locally With LM Studio’s New Headless CLI & Claude Code

A hands-on guide to running Google Gemma 4 26B-A4B locally using LM Studio 0.4.0's new headless CLI (lms) on macOS. Covers the new llmster daemon, downloading and loading the model, memory estimation at various context lengths, hardware tuning options (GPU offloading, context length, TTL, flash attention), and serving via OpenAI/Anthropic-compatible APIs. Also details how to wire Claude Code to the local LM Studio server via a shell alias, routing all model calls through Gemma 4 for fully offline, zero-cost coding assistance. Performance on an M4 Pro MacBook Pro with 48 GB unified memory: 51 tokens/sec at 48K context, 17.99 GB model footprint.

#claude-code

#gemma

#local-ai

#mixture-of-experts

Apr 05•20m read time•From ai.georgeliu.com

Table of contents

Why run models locally?The Gemma 4 model family What changed in LM Studio 0.4.0 Installation Downloading Gemma 4 Checking your local model library Running an interactive chat Checking loaded models and memory Memory estimates by context length Tuning model loading for your hardware The LM Studio desktop app Serving models via API Using Gemma 4 as a Claude Code backend What I learned What did not work What is next

Comment

Bookmark

Copy

Sort: