XDA Developers

Running local LLMs on mobile is more viable than most people assume. The author ran Google's Gemma 4 E2B model (Unsloth GGUF, Q4_K_M, ~3GB) in PocketPal, a free, open-source app for iOS and Android, on an iPhone 16 with 8GB RAM and found that the setup handles most everyday AI tasks, including image input and TTS, without sending data to the cloud. The trade-offs are no web access, no speech-to-text input, and occasional lag at longer contexts, but the setup still replaced ChatGPT, Claude, and Gemini for most mobile use cases.

I replaced ChatGPT, Claude, and Gemini on my phone with a local LLM, and it's a mobile upgrade I didn't expect

Gemma 4 E2B is built for this exact situation

Ditching cloud AI apps for my new local model