Optimize your AI Agents on Heroku. New automatic prompt caching reuses processed system prompts to deliver significantly faster responses. No config needed.

Heroku offers insights into cloud application development, deployment, and management, providing documentation and best practices for deploying and scaling applications on the Heroku platform. By exploring Heroku's curated content, developers can learn about Heroku's containerized deployment model, continuous delivery workflows, and add-on ecosystem for extending application functionality. Whether you're deploying web applications, APIs, or background workers, Heroku offers resources to streamline your development workflow and focus on building great software.

Heroku

Heroku is introducing automatic prompt caching for its Managed Inference and Agents service, launching December 18, 2025. The feature speeds up AI inference by caching and reusing processed system prompts and tool definitions, with caches expiring after five minutes of inactivity. Currently enabled by default at no additional cost, it supports multiple models including Claude Sonnet 4.5, Claude Haiku variants, and Amazon Nova models, each with specific token thresholds. Developers can opt out of caching for sensitive workflows using the X-Heroku-Prompt-Caching header. The implementation focuses on system prompts and tool definitions while excluding user messages and conversation history for security.

Faster Agents with Automatic Prompt Caching

What is prompt caching and how does it speed up AI inference?

How to disable prompt caching on sensitive workflows (opt-out)