A hands-on guide to running Google Gemma 4 26B-A4B locally using LM Studio 0.4.0's new headless CLI (lms) on macOS. Covers the new llmster daemon, downloading and loading the model, memory estimation at various context lengths, hardware tuning options (GPU offloading, context length, TTL, flash attention), and serving via OpenAI/Anthropic-compatible APIs. Also details how to wire Claude Code to the local LM Studio server via a shell alias, routing all model calls through Gemma 4 for fully offline, zero-cost coding assistance. On an M4 Pro MacBook Pro with 48 GB unified memory, the model runs at 51 tokens/sec at 48K context with a 17.99 GB model footprint.
Table of contents
Why run models locally?
The Gemma 4 model family
What changed in LM Studio 0.4.0
Installation
Downloading Gemma 4
Checking your local model library
Running an interactive chat
Checking loaded models and memory
Memory estimates by context length
Tuning model loading for your hardware
The LM Studio desktop app
Serving models via API
Using Gemma 4 as a Claude Code backend
What I learned
What did not work
What is next