The Weight of Remembering
This title could be clearer and more informative.Try out Clickbait Shieldfor free (5 uses left this month).
A deep technical and philosophical exploration of the KV cache in large language models — how it works, how it has evolved across architectures (GPT-2, Llama 3, DeepSeek V3, Gemma 3), and what its limitations mean for AI memory. Covers the physical cost of conversation state in GPU memory, cache eviction and prompt caching pricing, context rot in long conversations, the compaction problem, and external memory workarounds. Closes with a sci-fi reflection on Greg Egan's Diaspora and the trajectory toward AI systems that manage their own memory.
Table of contents
What KV Cache Actually IsHow Memory EvolvedWhat You Actually ExperienceWhere Memory BreaksThe Compaction ProblemExternal MemoryReshaping the MindSort: