Best of ByteByteGo: January 2026

  1. Article
    ByteByteGo · 14w

    How Uber Serves over 150 Million Reads per Second from Integrated Cache

    Uber's CacheFront system serves over 150 million database reads per second using Redis while maintaining strong consistency. The system uses a three-layer architecture with Query Engine, Storage Engine, and integrated caching. Initial challenges included cache invalidation delays and stale data from conditional updates. Uber solved this by implementing soft deletes, monotonic timestamps, and synchronous write-path invalidation alongside asynchronous CDC (Flux) and TTL expiration. This triple-defense strategy achieves 99.9%+ cache hit rates with near-zero stale values, even with 24-hour TTLs.
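The interplay of soft deletes, monotonic timestamps, and synchronous write-path invalidation can be sketched in miniature. This is a hypothetical illustration, not Uber's actual API: `CacheFrontSketch`, the in-memory `db`/`cache` dicts, and the counter-based clock are all stand-ins for the real Storage Engine, Redis, and timestamp source.

```python
class CacheEntry:
    """A cached row with a monotonic timestamp and a soft-delete flag."""
    def __init__(self, value, ts, deleted=False):
        self.value = value
        self.ts = ts
        self.deleted = deleted

class CacheFrontSketch:
    """Illustrative sketch (not Uber's code) of synchronous write-path
    invalidation using soft deletes and monotonic timestamps."""
    def __init__(self):
        self.db = {}       # stands in for the storage engine
        self.cache = {}    # stands in for Redis
        self.clock = 0     # stands in for a monotonic timestamp source

    def _tick(self):
        self.clock += 1
        return self.clock

    def write(self, key, value):
        ts = self._tick()
        self.db[key] = (value, ts)
        # Synchronous write-path invalidation: soft-delete (tombstone) the
        # cache entry rather than removing it, so a slower async fill with
        # an older timestamp cannot resurrect stale data.
        self.cache[key] = CacheEntry(None, ts, deleted=True)

    def fill(self, key, value, ts):
        # Async fill (e.g. from CDC): apply only if not older than the
        # timestamp already in the cache.
        cur = self.cache.get(key)
        if cur is None or ts >= cur.ts:
            self.cache[key] = CacheEntry(value, ts)

    def read(self, key):
        cur = self.cache.get(key)
        if cur is not None and not cur.deleted:
            return cur.value          # cache hit
        value, ts = self.db[key]      # miss or tombstone: read through
        self.fill(key, value, ts)
        return value
```

A late-arriving fill carrying a pre-update value is rejected by the timestamp comparison, which is what keeps hit rates high without serving stale rows. (TTL expiration, the third layer of defense, is omitted here for brevity.)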

  2. Article
    ByteByteGo · 12w

    How Google Manages Trillions of Authorizations with Zanzibar

    Zanzibar is Google's global authorization system that handles over 10 million permission checks per second across services like Drive, YouTube, and Maps. It uses a tuple-based data model to represent permissions, employs zookies (tokens) with Google Spanner's TrueTime for consistency guarantees, and runs on 10,000+ servers across 30+ geographic locations. The system achieves 99.999% availability through distributed caching, request deduplication, and client isolation, and serves checks with a median latency of 3 ms. Key architectural decisions include flexible relation tuples, causality-respecting consistency protocols, and optimized serving layers with intelligent caching strategies.
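A minimal sketch of the relation-tuple model and a permission check follows. The tuple encoding and the `check` function here are illustrative assumptions, not Zanzibar's wire format: a tuple is `(object, relation, subject)`, where a subject is either a user string or a userset pointer `(object, relation)` that must be expanded recursively (e.g. group membership).

```python
def check(tuples, obj, relation, user, depth=10):
    """Sketch of a Zanzibar-style check: does `user` have `relation`
    on `obj`, following userset indirections such as group membership?
    `depth` guards against cycles in the tuple graph."""
    if depth == 0:
        return False
    for (o, r, subject) in tuples:
        if o != obj or r != relation:
            continue
        if subject == user:
            return True                       # direct grant
        if isinstance(subject, tuple):        # userset: (object, relation)
            s_obj, s_rel = subject
            if check(tuples, s_obj, s_rel, user, depth - 1):
                return True                   # indirect grant via userset
    return False

# Example tuple store: every member of group:eng may view doc:readme.
tuples = [
    ("doc:readme", "viewer", ("group:eng", "member")),
    ("group:eng", "member", "user:alice"),
]
```

The production system answers these checks from distributed caches with deduplication rather than scanning tuples, but the semantics (direct grants plus userset expansion) are the same.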

  3. Article
    ByteByteGo · 13w

    How Netflix Built a Real-Time Distributed Graph for Internet Scale

    Netflix built a Real-Time Distributed Graph (RDG) to track member interactions across streaming, gaming, and other services. The system processes millions of events per second using Apache Kafka for ingestion, Apache Flink for stream processing, and a custom Key-Value Data Abstraction Layer (KVDAL) built on Cassandra for storage. Netflix rejected traditional graph databases like Neo4j and AWS Neptune due to scalability limitations and operational complexity, instead emulating graph capabilities using key-value storage with adjacency lists. The architecture handles 8 billion nodes, 150 billion edges, 2 million reads/second, and 6 million writes/second across 2,400 EC2 instances, with each node and edge type isolated in separate namespaces for independent scaling.
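Emulating a graph on a key-value store with adjacency lists can be sketched in a few lines. This is a simplified assumption of the approach, not Netflix's KVDAL API: the dict keyed by `(edge_type, source_node)` stands in for Cassandra rows, and keying on edge type mirrors the summary's point that each edge type lives in its own namespace and scales independently.

```python
from collections import defaultdict

class KVGraphSketch:
    """Illustrative sketch of graph emulation over a key-value store:
    each key (edge_type, source_node) maps to an adjacency list."""
    def __init__(self):
        self.kv = defaultdict(list)   # stands in for a Cassandra-backed KV layer

    def add_edge(self, edge_type, src, dst):
        # Appending to an adjacency list is a single KV write.
        self.kv[(edge_type, src)].append(dst)

    def neighbors(self, edge_type, node):
        # One-hop expansion is a single KV point read.
        return self.kv[(edge_type, node)]

    def two_hop(self, et1, et2, node):
        """Two-hop expansion, e.g. titles watched by members that
        `node` follows, without a graph database."""
        out = []
        for mid in self.neighbors(et1, node):
            out.extend(self.neighbors(et2, mid))
        return out
```

The trade-off versus Neo4j or Neptune is that traversals become explicit fan-out reads, but each hop is a cheap, independently scalable point lookup.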

  4. Article
    ByteByteGo · 14w

    How Lyft Built an ML Platform That Serves Millions of Predictions Per Second

    Lyft built LyftLearn Serving, an ML platform handling millions of predictions per second using a microservices architecture. Instead of a shared monolithic system, they generate independent microservices for each team via configuration templates. The platform separates data plane concerns (runtime performance, inference execution) from control plane concerns (deployment, versioning, testing). Key features include automated model self-tests, flexible library support (TensorFlow, PyTorch), and dual interfaces for engineers and data scientists. The architecture uses Flask/Gunicorn for HTTP serving, Kubernetes for orchestration, and Envoy for load balancing. Over 40 teams migrated from the legacy system, achieving team autonomy while maintaining platform consistency.
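The automated model self-test idea can be sketched as a load-time gate. This is a hypothetical illustration of the pattern, not Lyft's code: `ServedModelSketch`, `load_model`, and the `(input, expected)` test-case shape are all assumed names, standing in for whatever LyftLearn Serving does at deploy time.

```python
class ServedModelSketch:
    """Illustrative sketch of a model self-test: a model ships with sample
    inputs and expected outputs that are verified before it serves traffic."""
    def __init__(self, predict_fn, test_cases):
        self.predict_fn = predict_fn
        self.test_cases = test_cases   # list of (input, expected_output)

    def self_test(self):
        # Run every shipped test case; return the failures (empty = healthy).
        failures = []
        for features, expected in self.test_cases:
            got = self.predict_fn(features)
            if got != expected:
                failures.append((features, expected, got))
        return failures

def load_model(predict_fn, test_cases):
    """Refuse to serve a model whose self-test fails, so a broken
    artifact never reaches the inference path."""
    model = ServedModelSketch(predict_fn, test_cases)
    if model.self_test():
        raise RuntimeError("model failed self-test; refusing to serve")
    return model
```

Running this check inside each team's generated microservice keeps the data plane (inference) and control plane (deployment gating) separated in the way the summary describes: a failed self-test is a deployment concern and never affects a model already serving.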