Armin Ronacher argues that local model inference suffers not from model quality but from fragmentation and lack of polish. The local stack (llama.cpp, Ollama, LM Studio, MLX, etc.) forces users through a maze of configuration choices, and critical issues like tool parameter streaming remain unsolved. His proposed remedy: pick one model, one inference engine, one hardware target, and polish that combination relentlessly. He highlights ds4.c — Salvatore Sanfilippo's narrow, Mac-specific inference engine for DeepSeek V4 Flash — as the right approach, and introduces pi-ds4, an extension that embeds ds4.c directly into the Pi coding agent with zero configuration, auto-quantization selection, and lifecycle management.
Table of contents
Runnable Is Not Finished
Fragmentation
Too Little Critical Mass
The DS4 Bet
Embedding It In Pi
Focusing and Learning