RAG vs. CAG, Explained Visually!
Cache-Augmented Generation (CAG) improves upon traditional RAG by caching static, rarely-changing information directly in the model's key-value memory, while continuing to retrieve dynamic data from vector databases. This hybrid approach reduces redundant fetches, lowers costs, and speeds up inference by separating stable "cold" data (cacheable) from frequently updated "hot" data (retrievable). The technique is already supported by APIs like OpenAI and Anthropic through prompt caching features.