I tested 3 local LLMs on my actual work — and each model won at something different
A hands-on comparison of three local LLMs — Gemma 4 E4B, GPT-OSS 20B, and Qwen 3.5 9B — tested on real personal workflows using LM Studio on an RTX 3070 with 8GB VRAM. GPT-OSS 20B excels at structured reasoning and content generation but is limited by context window constraints on lower VRAM hardware. Qwen 3.5 9B is the most versatile, handling long context, knowledge tasks, and even image analysis well, making it the go-to general-purpose pick. Gemma 4 E4B stands out for detailed visual/multimodal analysis but has an unusual UX where its reasoning and response are blended together. The key takeaway: no single local model wins at everything, and rotating between models based on task type — just like with cloud AI — yields the best results.
Table of contents
Before we get into it
GPT-OSS 20B pulls ahead with structure
Qwen 3.5 9b is the knowledge and context generalist
Gemma 4 E4B is the multimodal specialist