Best of Distributed Systems — March 2026

1
Article
Metadata·11w
Building a Database on S3
A review of a 2008 research paper that proposed building a relational database on Amazon S3, using SQS as a write-ahead log and S3 as a page store. The design pioneered the storage-compute separation philosophy now central to cloud-native databases like Aurora, Snowflake, and Delta Lake. Key challenges included SQS's non-FIFO delivery requiring idempotent log records, an atomicity protocol for all-or-nothing commits, B-link trees for lock-free reads on stale S3 pages, and weak isolation guarantees where last-writer-wins replaces traditional ANSI SQL isolation. The paper's claim that snapshot isolation requires a centralized counter is noted as outdated given modern hybrid logical clocks. Despite its clunky 2008-era protocols, the paper is credited as a conceptual precursor to modern data lake and lakehouse architectures.
48
2
Article
DevOps·11w
Our AWS Bill Spiked 3x Overnight — It Wasn’t Traffic, It Was One Missing Limit
A DevOps engineer shares a postmortem of an AWS bill tripling overnight due to a runaway Auto Scaling Group. A background SQS worker with no concurrency ceiling triggered a self-amplifying feedback loop: slow jobs raised CPU, which launched more EC2 instances, which pulled more messages, which caused more retries. The fix involved setting a hard max on the ASG in Terraform, capping application-level concurrency with p-limit, adding circuit breakers for downstream failures, and setting up AWS Budget alerts and cost anomaly detection.
31
6
3
Article
System Design Codex·10w
How Agoda Load Balanced Kafka
Agoda processes hundreds of terabytes of Kafka data daily for real-time price updates from suppliers. Standard round-robin partitioning caused over-provisioning due to heterogeneous hardware and uneven message workloads. Static solutions like identical pod deployments and weighted load balancing were rejected as impractical. Instead, Agoda built a dynamic lag-aware system with two components: a lag-aware producer that routes fewer messages to high-lag partitions using Same-Queue Length and Outlier Detection algorithms, and lag-aware consumers that proactively unsubscribe to trigger rebalancing when experiencing high lag, leveraging Kafka 2.4's incremental cooperative rebalance protocol.
29
4
Article
Debezium·11w
Hello Debezium Team!
Vincenzo Santonastaso introduces himself as a new core contributor to the Debezium open source project. He shares his background as a Senior Product Engineer at lastminute.com working on distributed systems for flight booking, and prior experience at BMC Software with time-series data and forecasting. His interests center on distributed systems and event-driven architectures, and he expresses enthusiasm for contributing more deeply to Debezium.
17
5
Article
ByteByteGo·8w
How Netflix Live Streams to 100 Million Devices in 60 Seconds
Netflix's Live Origin is a custom-built server bridging cloud live streaming pipelines and the Open Connect CDN. Key architectural decisions include dual redundant regional pipelines for fault tolerance, predictable 2-second segment templates, and intelligent segment selection that picks the best candidate from either pipeline. To optimize CDN performance, Netflix extended nginx with millisecond-grain caching, implemented request holding at the live edge, and uses custom HTTP headers to propagate streaming metadata to millions of devices. Storage evolved from AWS S3 to a Cassandra-backed key-value store with EVCache write-through caching, achieving median latency of 25ms and supporting 200+ Gbps read throughput. The system uses strict publishing isolation, priority-based rate limiting (live edge over DVR), and hierarchical metadata caching to handle 404 storms and traffic surges. During the 2024 Tyson vs. Paul fight, Netflix handled 65 million concurrent streams.
17
1
6
Article
ByteByteGo·9w
How Reddit Migrated Petabyte-Scale Kafka from EC2 to Kubernetes
Reddit's engineering team migrated its entire Apache Kafka fleet — over 500 brokers and more than a petabyte of live data — from Amazon EC2 to Kubernetes using Strimzi, with zero downtime and no client-side changes. The migration was executed in six phases: introducing a DNS abstraction layer to decouple clients from broker addresses, freeing up broker ID space by reshuffling EC2 brokers, running a mixed EC2/Kubernetes cluster via a forked Strimzi operator, gradually shifting partition leadership and data using Cruise Control, migrating the control plane from ZooKeeper to KRaft, and finally handing off to the standard Strimzi operator. Key lessons include using abstraction layers to decouple clients from infrastructure, treating logical state as the primary asset to protect, and designing every migration step to be reversible.
10

See all Distributed Systems archives