I switched from a 20B model to a 9B one, and it was better

XDA Developers

Switching from a 20B parameter local LLM to a 9B Qwen model yielded better results not because of raw parameter count, but because of architecture. The Qwen 3.5 9B uses a Gated DeltaNet (GDN) hybrid architecture that maintains a fixed-size memory state instead of a growing KV cache, allowing much longer context windows without proportional VRAM growth. On an 8GB VRAM GPU running LM Studio, the author achieved stable 60k token sessions — far beyond what the larger model could handle — making it more practical for research, study sessions, and extended back-and-forth conversations.

I replaced my local LLM with a model half its size and got better results — And it wasn't about the parameters

Qwen 3.5 9B has a much larger context window