Netflix built an interval-aware caching layer for Apache Druid to absorb the heavy, repetitive query load that real-time dashboards generate during high-profile live events. The core insight is that most of the data in a rolling time window has already settled, so only the newest portion needs to be fetched from Druid. The cache has two levels of keys: a hash of the query with its time interval removed, and a bucketed timestamp. Each bucket gets an exponential TTL that grows with the age of its data. On a partial cache hit, only the missing tail of the window is queried from Druid. Stored in Netflix's KVDAL (Cassandra), the cache achieves an 82% partial hit rate, serves 84% of result data from cache, reduces Druid query volume by ~33%, and improves P90 query latency by ~66%. The approach generalizes to any time-series database with overlapping rolling-window queries, and Netflix hopes to contribute it as a native Druid feature.
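
The mechanism described above can be illustrated with a minimal in-memory Python sketch. It is not Netflix's implementation: the bucket width, the TTL constants, the choice to strip a field named `intervals` before hashing, the `fetch` callback, and every function and class name here are hypothetical, and a plain dict stands in for KVDAL. It shows the two-level key (query hash, then bucket start), a TTL that doubles with each bucket of age, and a partial hit that asks the backend only for the tail of the window.

```python
import hashlib
import json
from typing import Callable, Dict, Tuple

# Hypothetical tuning constants; the post's real values are not assumed here.
BUCKET_SECONDS = 60     # width of one time bucket
BASE_TTL_SECONDS = 5    # TTL for the freshest (least settled) bucket
MAX_TTL_SECONDS = 3600  # cap for long-settled buckets

# Hypothetical backend callback: fetch(query, start, end) -> {bucket_start: value} for [start, end)
Fetch = Callable[[dict, int, int], Dict[int, object]]


def query_key(query: dict) -> str:
    """Hash the query with its time interval removed, so rolling windows share a key."""
    stripped = {k: v for k, v in query.items() if k != "intervals"}
    return hashlib.sha256(json.dumps(stripped, sort_keys=True).encode()).hexdigest()


def bucket_start(ts: int) -> int:
    """Align a Unix timestamp (seconds) down to the start of its bucket."""
    return ts - ts % BUCKET_SECONDS


def ttl_for(bucket: int, now: int) -> int:
    """Exponential TTL: doubles with each bucket of age, capped at MAX_TTL_SECONDS."""
    age_buckets = max(0, (now - bucket) // BUCKET_SECONDS)
    exponent = min(age_buckets, 16)  # bound the exponent; the cap is reached long before this
    return min(BASE_TTL_SECONDS * 2 ** exponent, MAX_TTL_SECONDS)


class IntervalCache:
    """Two-level cache: query hash -> {bucket start -> (value, expires_at)}."""

    def __init__(self) -> None:
        self.store: Dict[str, Dict[int, Tuple[object, int]]] = {}

    def get(self, query: dict, start: int, end: int, now: int, fetch: Fetch) -> Dict[int, object]:
        buckets = self.store.setdefault(query_key(query), {})
        result: Dict[int, object] = {}
        b = bucket_start(start)
        # Serve the settled prefix of the window while buckets are cached and unexpired.
        while b < end:
            entry = buckets.get(b)
            if entry is None or entry[1] <= now:
                break
            result[b] = entry[0]
            b += BUCKET_SECONDS
        # Partial hit (or full miss): ask the backend only for the missing tail [b, end).
        if b < end:
            for tb, value in fetch(query, b, end).items():
                buckets[tb] = (value, now + ttl_for(tb, now))
                result[tb] = value
        return result


# Usage with a stand-in for Druid that records which ranges it was asked for.
calls = []


def fake_druid(query: dict, start: int, end: int) -> Dict[int, object]:
    calls.append((start, end))
    return {b: f"rows@{b}" for b in range(start, end, BUCKET_SECONDS)}


cache = IntervalCache()
q1 = {"dataSource": "playback", "metric": "plays", "intervals": "window-1"}  # hypothetical query
q2 = dict(q1, intervals="window-2")  # same query, window shifted by one minute
r1 = cache.get(q1, 0, 600, now=600, fetch=fake_druid)   # cold cache: fetches [0, 600)
r2 = cache.get(q2, 60, 660, now=660, fetch=fake_druid)  # fetches only [420, 660)
```

In the second call the window has slid by one minute. The six oldest buckets are still within their TTLs and come from cache, while the four newest had short TTLs that have already expired, so the backend is asked for [420, 660) rather than the whole window. The post does not say how gaps in the middle of a window are handled; this sketch simply treats the first expired or missing bucket as the start of the tail.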

11 min read · From netflixtechblog.com
Table of contents
The Problem
The Insight
A Deliberate Trade-Off
Exponential TTLs
Bucketing
How It Works
Negative Caching
The Storage Layer
Results
Looking Ahead
Summary
