DeepSeek-V4-Flash, a local model competitive with low-end frontier models for agentic coding, makes LLM activation steering practically accessible for the first time. The post explains how steering works — extracting concept vectors from model activations and boosting them during inference — and explores why it hasn't been widely adopted: big labs don't need it, API users can't access weights, and basic use cases are outcompeted by prompting. The author is cautiously skeptical but intrigued by potential applications like steering for 'unpromptable' concepts or compressing large context into implicit memory. The open-source project DwarfStar 4 by antirez is highlighted as an early example of steering built into a local model runner.
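As a rough illustration of the mechanism summarized above (extract a concept vector from activations, then boost it during inference), here is a minimal sketch in plain Python. It assumes the concept vector is computed as a difference of mean activations, which is one common approach and not necessarily the one used in the post or in DwarfStar 4. The activations are random toy data, and every name here (`concept_vector`, `steer`, `strength`) is hypothetical.

```python
import random

def mean_vector(rows):
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def concept_vector(with_concept, without_concept):
    """Difference of means: the direction separating the two activation sets."""
    mu_pos = mean_vector(with_concept)
    mu_neg = mean_vector(without_concept)
    return [p - q for p, q in zip(mu_pos, mu_neg)]

def steer(hidden, direction, strength):
    """Boost the concept by adding the scaled direction to one hidden state."""
    return [h + strength * d for h, d in zip(hidden, direction)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy stand-ins for activations captured at one layer (hypothetical data,
# not real model activations).
rng = random.Random(0)
dim = 8
without_concept = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(32)]
# Assumption for the toy: the concept shifts coordinate 3 by +2.0.
with_concept = [
    [v + (2.0 if i == 3 else 0.0) for i, v in enumerate(row)]
    for row in without_concept
]

direction = concept_vector(with_concept, without_concept)

# During inference, the same direction would be added to the hidden state
# at the chosen layer on every forward pass; here it is applied once.
hidden = [0.0] * dim
steered = steer(hidden, direction, strength=4.0)

print("projection before:", dot(hidden, direction))
print("projection after: ", dot(steered, direction))
```

In a real runner the two activation sets would come from running the model on prompts that do and do not express the concept, and `strength` would need tuning: too small has no visible effect, too large tends to degrade the output.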
Table of contents
DeepSeek V4 Flash
How steering works
Why steering is interesting
Why steering hasn’t been used
Steering the unpromptable
Steering as data compression
Conclusion