Best of Apache Flink: October 2025

  1.
    Article
    Netflix TechBlog · 31w

    How and Why Netflix Built a Real-Time Distributed Graph: Part 1 — Ingesting and Processing Data Streams at Internet Scale

    Netflix built a Real-Time Distributed Graph (RDG) to analyze member interactions across different business verticals like streaming, gaming, and live events. The system processes over 1 million Kafka messages per second using Apache Flink jobs that transform events into graph nodes and edges, writing more than 5 million records per second to storage. The architecture evolved from a monolithic Flink job to a 1:1 mapping between Kafka topics and Flink jobs for better operational stability and tuning. This first part covers the ingestion and processing pipeline, with future posts planned for storage and serving layers.
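As a rough illustration of what "transforming events into graph nodes and edges" can mean, here is a minimal sketch in plain Python rather than Flink. The `Event`, `Node`, and `Edge` types, their field names, and `to_graph_records` are all hypothetical and are not taken from the Netflix post.

```python
from dataclasses import dataclass


# Hypothetical event shape; the field names are assumptions, not Netflix's schema.
@dataclass(frozen=True)
class Event:
    member_id: str
    title_id: str
    action: str  # e.g. "played"


@dataclass(frozen=True)
class Node:
    kind: str
    key: str


@dataclass(frozen=True)
class Edge:
    src: Node
    dst: Node
    label: str


def to_graph_records(event: Event) -> tuple[list[Node], list[Edge]]:
    """Map one event to the two nodes and the single edge it implies."""
    member = Node("member", event.member_id)
    title = Node("title", event.title_id)
    return [member, title], [Edge(member, title, event.action)]


nodes, edges = to_graph_records(Event("m1", "t9", "played"))
```

Here one input event produces three output records (two nodes and one edge). That kind of fan-out would be consistent with the write rate exceeding the read rate, although the real ratio depends on Netflix's actual schema.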

  2.
    Article
    ByteByteGo · 32w

    How OpenAI Uses Kubernetes And Apache Kafka for GenAI

    OpenAI built a stream processing platform using Apache Flink (PyFlink) on Kubernetes to handle real-time data for AI model training and experimentation. The architecture addresses three key challenges: providing Python-first APIs for ML practitioners, handling cloud capacity constraints, and managing multi-primary Kafka clusters. The system features a control plane for multi-cluster failover, per-namespace isolation in Kubernetes, watchdog services for Kafka topology monitoring, and decoupled state management using RocksDB with highly available blob storage. Custom Kafka connectors enable reading from multiple primary clusters simultaneously while maintaining resilience during outages.
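The idea of reading from several primary clusters while tolerating an outage can be sketched without any Kafka client at all. In this hypothetical example, each cluster is a plain callable that returns a batch of records or raises `ConnectionError`. `poll_all` and the cluster names are illustrative assumptions, not OpenAI's connector API.

```python
from typing import Callable, Iterable

# Stand-in for a per-cluster consumer: a callable returning the next batch of
# records, or raising ConnectionError when that cluster is unreachable.
Poll = Callable[[], Iterable[str]]


def poll_all(clusters: dict[str, Poll]) -> tuple[list[str], list[str]]:
    """Poll every primary cluster once; return (records, unreachable cluster names)."""
    records: list[str] = []
    down: list[str] = []
    for name, poll in clusters.items():
        try:
            records.extend(poll())
        except ConnectionError:
            down.append(name)  # keep reading from the remaining clusters
    return records, down


def failing_poll() -> Iterable[str]:
    raise ConnectionError("cluster unreachable")


records, down = poll_all({
    "primary-a": lambda: ["a1", "a2"],
    "primary-b": failing_poll,
})
```

A real connector would also need offset tracking, retries, and backoff per cluster. This sketch only shows that a failure in one cluster does not block reads from the others.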

  3.
    Article
    Tinybird · 30w

    Flink is a 95% problem

    Apache Flink is marketed as essential for real-time data processing, but the article argues it is overkill for 95% of use cases. Most real-time problems can be solved with simpler approaches: HTTP services backed by Postgres (65%), OLAP databases like ClickHouse (25%), or custom solutions (5%). Only the remaining 5% or so of companies actually need Flink's complexity. The platform introduces substantial operational overhead: new APIs to learn, additional infrastructure (Kafka, ZooKeeper/K8s), 700+ configuration parameters, complex observability requirements, and a JVM dependency. Even Flink's creators acknowledge its limitations, and recent acquisitions of Flink-based companies suggest limited market traction. For most organizations with fewer than 100 developers, simpler alternatives such as ClickHouse with SQL, or Kafka consumers written in a language the team already uses, offer a better cost-benefit tradeoff without the engineering complexity.