Best of Netflix · December 2025

  1.
    Article
    ByteByteGo·20w

    How Netflix Built a Distributed Write Ahead Log For Its Data Platform

    Netflix built a distributed Write-Ahead Log (WAL) system to solve data reliability issues across their platform. The WAL captures every data change before applying it to databases, enabling automatic retries, cross-region replication, and multi-partition consistency. Built on top of their Data Gateway Infrastructure, it uses Kafka and Amazon SQS as pluggable backends, supports multiple use cases through namespaces, and scales independently through sharded deployments. The system provides durability guarantees while allowing teams to configure retry logic, delays, and targets without code changes.
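    The append-then-apply idea with retries can be sketched in a few lines. This is a toy, in-memory illustration only, not Netflix's implementation: `WriteAheadLog`, `WalEntry`, `FlakyTarget`, and the `profiles` namespace are hypothetical names, and a Python list stands in for a durable backend such as Kafka or SQS.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class WalEntry:
    namespace: str
    key: str
    value: str
    attempts: int = 0


@dataclass
class WriteAheadLog:
    """Toy in-memory WAL (hypothetical): record the change first, apply it later."""
    max_attempts: int = 3
    pending: List[WalEntry] = field(default_factory=list)
    dead_letters: List[WalEntry] = field(default_factory=list)

    def append(self, namespace: str, key: str, value: str) -> None:
        # The change is logged before any attempt to apply it to the target.
        self.pending.append(WalEntry(namespace, key, value))

    def drain(self, apply: Callable[[WalEntry], None]) -> None:
        """Try each pending entry once; keep failed entries for a later retry."""
        still_pending: List[WalEntry] = []
        for entry in self.pending:
            entry.attempts += 1
            try:
                apply(entry)
            except Exception:
                if entry.attempts >= self.max_attempts:
                    self.dead_letters.append(entry)  # give up after max_attempts
                else:
                    still_pending.append(entry)
        self.pending = still_pending


class FlakyTarget:
    """Stand-in database that fails a fixed number of times before accepting writes."""
    def __init__(self, failures: int) -> None:
        self.failures_left = failures
        self.data: Dict[str, str] = {}

    def apply(self, entry: WalEntry) -> None:
        if self.failures_left > 0:
            self.failures_left -= 1
            raise ConnectionError("target unavailable")
        self.data[entry.key] = entry.value


target = FlakyTarget(failures=1)
wal = WriteAheadLog()
wal.append("profiles", "user:1", "alice")
wal.drain(target.apply)  # first attempt fails, the entry stays pending
wal.drain(target.apply)  # the retry succeeds and the entry leaves the log
```

    Because the entry stays in the log until the target accepts it, a temporary outage delays the write instead of losing it.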

  2.
    Article
    Baeldung·19w

    Introduction to Netflix Hollow

    Netflix Hollow is a low-latency Java framework for distributing data from a source to multiple targets using a producer-consumer model. The producer fetches data from external systems and publishes snapshots to file systems or object storage, while consumers read and process these snapshots. The framework efficiently manages memory by offloading large datasets to external storage, addressing Java heap space issues. Implementation involves defining entity classes with primary keys, setting up publishers and announcers for producers, generating consumer APIs using HollowAPIGenerator, and configuring announcement watchers and retrievers for consumers. The library handles snapshot versioning, updates, and notifications automatically.
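    The publish, announce, and refresh cycle described above can be modeled conceptually as follows. Hollow itself is a Java library and this sketch does not use its API: `BlobStore`, `Producer`, and `Consumer` are hypothetical stand-ins for the publisher/announcer and retriever/watcher roles, and a dictionary plays the part of file or object storage.

```python
from typing import Dict, List, Optional


class BlobStore:
    """Stands in for the file system or object store that holds snapshots."""
    def __init__(self) -> None:
        self.snapshots: Dict[int, List[dict]] = {}
        self.announced: Optional[int] = None  # latest announced version


class Producer:
    def __init__(self, store: BlobStore) -> None:
        self.store = store
        self.version = 0

    def run_cycle(self, records: List[dict]) -> int:
        self.version += 1
        # Publish: write a full snapshot under a new version number.
        self.store.snapshots[self.version] = [dict(r) for r in records]
        # Announce: tell consumers which version is current.
        self.store.announced = self.version
        return self.version


class Consumer:
    def __init__(self, store: BlobStore) -> None:
        self.store = store
        self.version: Optional[int] = None
        self.by_id: Dict[int, dict] = {}

    def refresh(self) -> bool:
        """Load the announced snapshot if it is newer than the local one."""
        latest = self.store.announced
        if latest is None or latest == self.version:
            return False
        # Index records by primary key, here assumed to be the "id" field.
        self.by_id = {r["id"]: r for r in self.store.snapshots[latest]}
        self.version = latest
        return True


store = BlobStore()
producer = Producer(store)
consumer = Consumer(store)

producer.run_cycle([{"id": 1, "title": "Movie A"}])
first = consumer.refresh()   # picks up version 1
second = consumer.refresh()  # nothing new has been announced
producer.run_cycle([{"id": 1, "title": "Movie A"}, {"id": 2, "title": "Movie B"}])
third = consumer.refresh()   # picks up version 2
```

    The consumer never talks to the producer directly; it only follows the announced version, which is what lets one producer feed many consumers.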

  3.
    Article
    Netflix TechBlog·18w

    How Temporal Powers Reliable Cloud Operations at Netflix

    Netflix reduced transient deployment failures from 4% to 0.0001% by migrating cloud operation orchestration from Spinnaker's homegrown system to Temporal's durable execution platform. The original Clouddriver service suffered from complex internal orchestration, instance-local state, and unreliable retry logic. By implementing cloud operations as Temporal workflows with activities, Netflix eliminated tight coupling between services, removed thousands of lines of custom orchestration code, and gained automatic retries, state persistence, and better observability. The migration used abstraction layers and dynamic configuration to transparently onboard all applications within two quarters.
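    The two properties named above, automatic retries and persisted state, can be illustrated with a toy runner. This is a conceptual sketch and does not use the Temporal SDK: `DurableRunner`, the activity functions, and the `sg-v001` identifier are hypothetical, and a plain dictionary stands in for durable workflow history.

```python
from typing import Callable, Dict, Optional, Tuple


class DurableRunner:
    """Toy runner (hypothetical): retries activities and records their results."""
    def __init__(self, history: Dict[str, object], max_attempts: int = 3) -> None:
        self.history = history  # a real system would keep this in durable storage
        self.max_attempts = max_attempts

    def activity(self, name: str, fn: Callable[[], object]) -> object:
        if name in self.history:
            # Completed in an earlier run: reuse the recorded result.
            return self.history[name]
        last_error: Optional[Exception] = None
        for _ in range(self.max_attempts):
            try:
                result = fn()
            except Exception as exc:
                last_error = exc
                continue  # transient failure: try again
            self.history[name] = result
            return result
        raise RuntimeError(f"activity {name!r} failed") from last_error


calls = {"create": 0, "attach": 0}


def create_server_group() -> str:
    calls["create"] += 1
    return "sg-v001"


def attach_load_balancer() -> str:
    calls["attach"] += 1
    if calls["attach"] == 1:
        raise TimeoutError("transient cloud API error")
    return "attached"


def deploy(runner: DurableRunner) -> Tuple[object, object]:
    # The workflow is ordinary sequential code; the runner handles retries and state.
    group = runner.activity("create", create_server_group)
    status = runner.activity("attach", attach_load_balancer)
    return group, status


history: Dict[str, object] = {}
result = deploy(DurableRunner(history))    # second activity succeeds on its retry
replayed = deploy(DurableRunner(history))  # a restart reuses recorded results
```

    On the second run neither activity is called again, since both results are already in the history. That is the sense in which state no longer has to live on a single instance.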