Best of Distributed SystemsMarch 2026

  1. 1
    Article
    Avatar of muratbuffaloMetadata·11w

    Building a Database on S3

    A review of a 2008 research paper that proposed building a relational database on Amazon S3, using SQS as a write-ahead log and S3 as a page store. The design pioneered the storage-compute separation philosophy now central to cloud-native databases like Aurora, Snowflake, and Delta Lake. Key challenges included SQS's non-FIFO delivery requiring idempotent log records, an atomicity protocol for all-or-nothing commits, B-link trees for lock-free reads on stale S3 pages, and weak isolation guarantees where last-writer-wins replaces traditional ANSI SQL isolation. The paper's claim that snapshot isolation requires a centralized counter is noted as outdated given modern hybrid logical clocks. Despite its clunky 2008-era protocols, the paper is credited as a conceptual precursor to modern data lake and lakehouse architectures.

  2. 2
    Article
    Avatar of joindevopsDevOps·11w

    Our AWS Bill Spiked 3x Overnight — It Wasn’t Traffic, It Was One Missing Limit

    A DevOps engineer shares a postmortem of an AWS bill tripling overnight due to a runaway Auto Scaling Group. A background SQS worker with no concurrency ceiling triggered a self-amplifying feedback loop: slow jobs raised CPU, which launched more EC2 instances, which pulled more messages, which caused more retries. The fix involved setting a hard max on the ASG in Terraform, capping application-level concurrency with p-limit, adding circuit breakers for downstream failures, and setting up AWS Budget alerts and cost anomaly detection.

  3. 3
    Article
    Avatar of systemdesigncodexSystem Design Codex·10w

    How Agoda Load Balanced Kafka

    Agoda processes hundreds of terabytes of Kafka data daily for real-time price updates from suppliers. Standard round-robin partitioning caused over-provisioning due to heterogeneous hardware and uneven message workloads. Static solutions like identical pod deployments and weighted load balancing were rejected as impractical. Instead, Agoda built a dynamic lag-aware system with two components: a lag-aware producer that routes fewer messages to high-lag partitions using Same-Queue Length and Outlier Detection algorithms, and lag-aware consumers that proactively unsubscribe to trigger rebalancing when experiencing high lag, leveraging Kafka 2.4's incremental cooperative rebalance protocol.

  4. 4
    Article
    Avatar of debeziumDebezium·11w

    Hello Debezium Team!

    Vincenzo Santonastaso introduces himself as a new core contributor to the Debezium open source project. He shares his background as a Senior Product Engineer at lastminute.com working on distributed systems for flight booking, and prior experience at BMC Software with time-series data and forecasting. His interests center on distributed systems and event-driven architectures, and he expresses enthusiasm for contributing more deeply to Debezium.

  5. 5
    Article
    Avatar of bytebytegoByteByteGo·8w

    How Netflix Live Streams to 100 Million Devices in 60 Seconds

    Netflix's Live Origin is a custom-built server bridging cloud live streaming pipelines and the Open Connect CDN. Key architectural decisions include dual redundant regional pipelines for fault tolerance, predictable 2-second segment templates, and intelligent segment selection that picks the best candidate from either pipeline. To optimize CDN performance, Netflix extended nginx with millisecond-grain caching, implemented request holding at the live edge, and uses custom HTTP headers to propagate streaming metadata to millions of devices. Storage evolved from AWS S3 to a Cassandra-backed key-value store with EVCache write-through caching, achieving median latency of 25ms and supporting 200+ Gbps read throughput. The system uses strict publishing isolation, priority-based rate limiting (live edge over DVR), and hierarchical metadata caching to handle 404 storms and traffic surges. During the 2024 Tyson vs. Paul fight, Netflix handled 65 million concurrent streams.

  6. 6
    Article
    Avatar of bytebytegoByteByteGo·9w

    How Reddit Migrated Petabyte-Scale Kafka from EC2 to Kubernetes

    Reddit's engineering team migrated its entire Apache Kafka fleet — over 500 brokers and more than a petabyte of live data — from Amazon EC2 to Kubernetes using Strimzi, with zero downtime and no client-side changes. The migration was executed in six phases: introducing a DNS abstraction layer to decouple clients from broker addresses, freeing up broker ID space by reshuffling EC2 brokers, running a mixed EC2/Kubernetes cluster via a forked Strimzi operator, gradually shifting partition leadership and data using Cruise Control, migrating the control plane from ZooKeeper to KRaft, and finally handing off to the standard Strimzi operator. Key lessons include using abstraction layers to decouple clients from infrastructure, treating logical state as the primary asset to protect, and designing every migration step to be reversible.