Amazon Bedrock now provides two new CloudWatch metrics for improved inference observability. TimeToFirstToken measures streaming API latency from request to first token received, enabling SLA monitoring and latency alarms without client-side instrumentation. EstimatedTPMQuotaUsage tracks Tokens Per Minute quota consumption including cache write tokens and output burndown multipliers across all inference APIs, allowing proactive quota management before rate limiting occurs. Both metrics are available in all commercial Bedrock regions, updated every minute, with no opt-in or API changes required.

2m read timeFrom aws.amazon.com
Post cover image

Sort: