Prompt caching is a feature that caches reused tokens in LLM API calls, dramatically reducing both latency and cost. It is especially impactful for agentic coding workflows where large context windows are repeatedly reused across many turns. Benchmarks show cost reductions of 53–90% and latency improvements of 31–79% depending on the use case. Most major providers (Anthropic, OpenAI, DeepSeek, GCP Vertex AI, Azure OpenAI) support it, but Amazon Bedrock does not yet have general availability. Cline is highlighted as an agentic coding tool that leverages prompt caching extensively and surfaces its effects to users.
Sort: