Heroku is introducing automatic prompt caching for its Managed Inference and Agents service, launching December 18, 2025. The feature speeds up AI inference by caching and reusing processed system prompts and tool definitions, with caches expiring after five minutes of inactivity. Currently enabled by default at no additional cost, it supports multiple models including Claude Sonnet 4.5, Claude Haiku variants, and Amazon Nova models, each with specific token thresholds. Developers can opt out of caching for sensitive workflows using the X-Heroku-Prompt-Caching header. The implementation focuses on system prompts and tool definitions while excluding user messages and conversation history for security.

3m read timeFrom heroku.com
Post cover image
Table of contents
What is prompt caching and how does it speed up AI inference?How prompt caching works on HerokuSupported models and caching detailsEnterprise-grade security and privacyHow to disable prompt caching on sensitive workflows (opt-out)Build faster agents

Sort: