Datadog's Metrics Summary page suffered 7-second p90 latency due to expensive joins on 82K metrics against 817K configurations in Postgres. The root cause was using a transactional database for search workloads. The solution was Change Data Capture (CDC) using Debezium to stream Postgres WAL changes into Kafka, then into a
Table of contents
Your cache isn’t the problem. How you’re using it is. (Sponsored)The Database Was Simply Doing the Wrong JobWhy Async?The Problem With Schema EvolutionFrom One Pipeline to a PlatformConclusionSort: