Best of Apache Flink — October 2025

1
Article
Netflix TechBlog·32w
How and Why Netflix Built a Real-Time Distributed Graph: Part 1 — Ingesting and Processing Data Streams at Internet Scale
Netflix built a Real-Time Distributed Graph (RDG) to analyze member interactions across different business verticals like streaming, gaming, and live events. The system processes over 1 million Kafka messages per second using Apache Flink jobs that transform events into graph nodes and edges, writing more than 5 million records per second to storage. The architecture evolved from a monolithic Flink job to a 1:1 mapping between Kafka topics and Flink jobs for better operational stability and tuning. This first part covers the ingestion and processing pipeline, with future posts planned for storage and serving layers.
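As a rough illustration of the "events into graph nodes and edges" step described above, here is a minimal plain-Python sketch. It is not Netflix's code and does not use the Flink API. The event fields (`member_id`, `vertical`, `item_id`, `action`) and the `Node`/`Edge` types are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Node:
    kind: str  # e.g. "member" or a business vertical such as "game"
    id: str


@dataclass(frozen=True)
class Edge:
    src: Node
    dst: Node
    relation: str  # the interaction that links the two nodes


def event_to_graph(event: Dict[str, str]) -> Tuple[List[Node], List[Edge]]:
    """Map one interaction event to the nodes and edge it implies.

    The field names are assumptions for this sketch, not a real schema.
    """
    member = Node("member", event["member_id"])
    target = Node(event["vertical"], event["item_id"])
    edge = Edge(member, target, event["action"])
    return [member, target], [edge]


nodes, edges = event_to_graph(
    {"member_id": "m1", "vertical": "game", "item_id": "g42", "action": "played"}
)
```

In a streaming job, a function of this shape would typically sit inside a map or flat-map operator, one job per input topic under the 1:1 layout the summary mentions.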
2
Article
ByteByteGo·34w
How OpenAI Uses Kubernetes And Apache Kafka for GenAI
OpenAI built a stream processing platform using Apache Flink (PyFlink) on Kubernetes to handle real-time data for AI model training and experimentation. The architecture addresses three key challenges: providing Python-first APIs for ML practitioners, handling cloud capacity constraints, and managing multi-primary Kafka clusters. The system features a control plane for multi-cluster failover, per-namespace isolation in Kubernetes, watchdog services for Kafka topology monitoring, and decoupled state management using RocksDB with highly available blob storage. Custom Kafka connectors enable reading from multiple primary clusters simultaneously while maintaining resilience during outages.
3
Article
Tinybird·32w
Flink is a 95% problem
The article argues that Apache Flink is marketed as essential for real-time data processing but is overkill for 95% of use cases. By the author's estimate, most real-time problems can be solved with simpler tools: HTTP services backed by Postgres (65%), OLAP databases like ClickHouse (25%), or custom solutions (5%), leaving only about 5% of companies that actually need Flink's complexity. Flink brings substantial operational overhead: new APIs to learn, additional infrastructure (Kafka, ZooKeeper/K8s), 700+ configuration parameters, complex observability requirements, and a JVM dependency. The piece notes that even Flink's creators acknowledge its limitations, and reads recent acquisitions of Flink-based companies as a sign of limited market traction. For most organizations with fewer than 100 developers, it concludes, simpler alternatives such as ClickHouse with SQL, or Kafka consumers written in the team's own programming language, offer a better cost-benefit tradeoff without the engineering complexity.
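To make the "plain consumer in your own language" idea concrete, here is a minimal sketch of a fixed-window count in ordinary Python. It is an illustration only and does not come from the article. The `count_per_window` function is a hypothetical helper, and the list of `(timestamp, key)` pairs stands in for records read from a Kafka consumer loop. It deliberately ignores late data, state recovery, and scaling, which are the concerns a stream processor exists to handle.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_per_window(
    events: Iterable[Tuple[float, str]], window_seconds: int = 60
) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, key) using fixed-size windows.

    Each event is (timestamp in seconds, key). This is an in-memory
    sketch with no fault tolerance or out-of-order handling.
    """
    counts: Counter = Counter()
    for ts, key in events:
        # Floor the timestamp to the start of its window.
        window_start = int(ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical sample data standing in for consumed messages.
sample = [(0.5, "play"), (30.0, "play"), (61.0, "play"), (62.0, "pause")]
result = count_per_window(sample)
```

Whether something this small is enough depends on the workload, which is the tradeoff the article is weighing.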