When “One in a Billion” Happens Every Day: Scaling Redis at Report URI

Report URI processes telemetry at such scale that one-in-a-billion edge cases occur daily. The post details several Redis optimizations made to handle this load: increasing the replication backlog from 64MB to 2GB to prevent full resyncs during failovers, switching from phpredis connect() to pconnect() to reuse persistent connections and cut connection overhead by ~99.97%, replacing blocking hGetAll() calls with incremental hScan() to avoid stalling Redis's single-threaded command loop, and upgrading primary/replica servers to 16GB RAM with 4 vCPUs. The HA Redis Sentinel setup enabled zero-downtime upgrades via controlled failovers. Future considerations include read replicas, though write-heavy volatile data and atomic lock requirements make that complex.
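To make the hGetAll() versus hScan() point concrete, here is a minimal Python sketch of cursor-based hash iteration. It is not the post's PHP code. The `hscan(name, cursor, count)` call shape follows the redis-py client, while `FakeRedis`, the `reports` key, and the batch size of 100 are hypothetical stand-ins so the example runs without a server.

```python
from typing import Dict, Iterator


def iter_hash_in_batches(client, key: str, count: int = 100) -> Iterator[Dict[str, str]]:
    """Yield a hash's fields in batches using cursor-based HSCAN calls."""
    cursor = 0
    while True:
        # Each call returns the next cursor plus one batch of field/value pairs,
        # so no single command has to serialize the whole hash at once.
        cursor, batch = client.hscan(key, cursor=cursor, count=count)
        if batch:
            yield batch
        # A returned cursor of 0 signals that the iteration is complete.
        if cursor == 0:
            break


class FakeRedis:
    """Hypothetical in-memory stand-in for a Redis client (assumption, not a real API)."""

    def __init__(self, hashes):
        self._hashes = hashes

    def hscan(self, name, cursor=0, match=None, count=None):
        # Simplified: uses the cursor as a list offset. Real Redis cursors are opaque.
        items = list(self._hashes.get(name, {}).items())
        step = count or 10
        chunk = dict(items[cursor:cursor + step])
        next_cursor = cursor + step
        return (0 if next_cursor >= len(items) else next_cursor), chunk


fake = FakeRedis({"reports": {f"field{i}": str(i) for i in range(250)}})
batches = list(iter_hash_in_batches(fake, "reports", count=100))
total_fields = sum(len(b) for b in batches)
```

Against a real server, `count` is only a hint and a scan may return a field more than once, so callers should treat batches as idempotent work. The benefit is that other clients' commands can run between batches instead of waiting behind one large reply.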

14m read time · From scotthelme.ghost.io