Stop Answering the Same Question Twice: Interval-Aware Caching for Druid at Netflix Scale

Netflix built an interval-aware caching layer for Apache Druid to absorb the heavy, repetitive query load that real-time dashboards generate during high-profile live events. The core insight is that most of the data in a rolling time window is already settled, so only the newest portion needs to be fetched from Druid. The system uses a two-level cache, keyed by a hash of the query (with the time interval excluded) and by bucketed timestamps, with TTLs that grow exponentially with data age. On a partial cache hit, only the missing tail is queried from Druid. Backed by Netflix's KVDAL (Cassandra), the cache achieves an 82% partial hit rate, serves 84% of result data from cache, reduces Druid query volume by ~33%, and improves P90 query latency by ~66%. The approach generalizes to any time-series database that serves overlapping rolling-window queries, and Netflix hopes to contribute it as a native Druid feature.
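To make the mechanism concrete, here is a minimal in-memory sketch of the idea in Python. It is not Netflix's implementation: the names (`IntervalCache`, `query_key`, `ttl_for_age`), the 60-second bucket size, the TTL constants, the doubling rule, and the `backend` callable standing in for Druid are all assumptions chosen for illustration. It also assumes that `start` and `end` are aligned to bucket boundaries.

```python
import hashlib
import json

# All constants are illustrative assumptions, not values from the article.
BUCKET_SECONDS = 60        # assumed bucket granularity
BASE_TTL_SECONDS = 5       # assumed TTL for the newest, least settled data
MAX_TTL_SECONDS = 3600     # assumed upper bound on any TTL


def query_key(query: dict) -> str:
    """First-level key: hash of the query with its time interval removed."""
    stripped = {k: v for k, v in query.items() if k != "interval"}
    payload = json.dumps(stripped, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def ttl_for_age(age_seconds: int) -> int:
    """TTL doubles for each bucket-width of age, up to a cap (assumed rule)."""
    doublings = min(max(age_seconds, 0) // BUCKET_SECONDS, 32)
    return min(BASE_TTL_SECONDS * 2 ** doublings, MAX_TTL_SECONDS)


class IntervalCache:
    def __init__(self, backend):
        # backend(query, start, end) -> {bucket_start: value}; stands in for Druid.
        self.backend = backend
        # query_key -> {bucket_start: (expires_at, value)}
        self.store = {}

    def fetch(self, query: dict, start: int, end: int, now: int) -> dict:
        buckets = self.store.setdefault(query_key(query), {})
        result = {}
        cursor = start
        # Serve the contiguous prefix of unexpired buckets from the cache.
        while cursor < end:
            entry = buckets.get(cursor)
            if entry is None or entry[0] <= now:
                break
            result[cursor] = entry[1]
            cursor += BUCKET_SECONDS
        # Partial hit or miss: fetch only the missing tail from the backend.
        if cursor < end:
            for bucket, value in self.backend(query, cursor, end).items():
                expires_at = now + ttl_for_age(now - bucket)
                buckets[bucket] = (expires_at, value)
                result[bucket] = value
        return result
```

Because the first-level key ignores the interval, two dashboards that ask the same question over slightly different rolling windows share cached buckets. Older buckets get longer TTLs, so a repeated request mostly re-fetches only the recent, still-changing tail.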

#data-science

#real-time-analytics

#apache-cassandra

Apr 07 • 11m read time • From netflixtechblog.com

Table of contents

The Problem
The Insight
A Deliberate Trade-Off
Exponential TTLs
Bucketing
How It Works
Negative Caching
The Storage Layer
Results
Looking Ahead
Summary
