XDA Developers

Running local LLMs without a GPU is more feasible than many assume. Google's Gemma 4 model family comes in multiple size tiers, from the 1.5 GB E2B that runs on a Raspberry Pi to the 26B A4B with sparse activation, which makes CPU-only inference practical. The E4B variant stands out as a sweet spot for daily tasks like email drafting, logic puzzles, and RAG, with native vision support. The article also highlights Microsoft's Phi-4 Reasoning Plus as a strong CPU-capable option for complex reasoning. The takeaway: optimized, lean models can replace expensive GPU upgrades for most local AI workflows.

I thought I needed a GPU for local LLMs until I tried this lean model

My real-life experience with Gemma 4 models