Prompt caching is a feature that caches reused tokens in LLM API calls, dramatically reducing both latency and cost. It is especially impactful for agentic coding workflows where large context windows are repeatedly reused across many turns. Benchmarks show cost reductions of 53–90% and latency improvements of 31–79% depending on the use case. Most major providers (Anthropic, OpenAI, DeepSeek, GCP Vertex AI, Azure OpenAI) support it, but Amazon Bedrock does not yet have general availability. Cline is highlighted as an agentic coding tool that leverages prompt caching extensively and surfaces its effects to users.

2m read timeFrom smcleod.net
Post cover image
Table of contents
Why it matters #How it works #Who has it and who doesn’t #

Sort: