Netflix built a Real-Time Distributed Graph (RDG) to analyze member interactions across different business verticals like streaming, gaming, and live events. The system processes over 1 million Kafka messages per second using Apache Flink jobs that transform events into graph nodes and edges, writing more than 5 million records per second to storage. The architecture evolved from a monolithic Flink job to a 1:1 mapping between Kafka topics and Flink jobs for better operational stability and tuning. This first part covers the ingestion and processing pipeline, with future posts planned for storage and serving layers.
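The core transformation described above — a Flink job turning a raw member-interaction event into graph nodes and edges before they are written to storage — can be sketched outside any Flink runtime. All type and field names below (`InteractionEvent`, `Node`, `Edge`, the `MEMBER`/`TITLE` labels) are hypothetical illustrations, not the actual RDG schema:

```java
import java.util.List;

public class EventToGraph {
    // Hypothetical raw Kafka event: a member interacting with an entity.
    record InteractionEvent(String memberId, String entityId, String action, long timestampMs) {}

    // Graph records the storage sink would receive.
    record Node(String id, String type) {}
    record Edge(String fromId, String toId, String label, long timestampMs) {}

    // In a real pipeline this logic would live inside a Flink map/flatMap
    // operator; here it is a plain function for illustration. One event can
    // fan out into several records, which is consistent with the write rate
    // (5M records/s) exceeding the ingest rate (1M messages/s).
    static List<Object> toGraphRecords(InteractionEvent e) {
        return List.of(
            new Node(e.memberId(), "MEMBER"),
            new Node(e.entityId(), "TITLE"),
            new Edge(e.memberId(), e.entityId(), e.action(), e.timestampMs()));
    }

    public static void main(String[] args) {
        InteractionEvent e =
            new InteractionEvent("member-42", "title-7", "PLAYBACK_START", 1_700_000_000_000L);
        toGraphRecords(e).forEach(System.out::println);
    }
}
```

A single playback event thus yields two nodes and one edge, illustrating how the per-second record count written to storage can be a multiple of the Kafka message rate.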
Table of contents
- Introduction
- Ingestion and Processing
- Kafka as the Ingestion Backbone
- Processing Data with Apache Flink
- From One Job to Many: Scaling Flink the Hard Way
- Acknowledgements