Best of Apache Kafka 2025

  1.
    Article
    Data Engineering · 42w

    Data Engineer Project: From Streaming Orders to Batch Insights — A Coffee Shop Chain’s Data Pipeline

    A comprehensive data engineering project demonstrates building a complete pipeline for a coffee shop chain that processes real-time orders and provides instant product recommendations while supporting batch analytics. The implementation uses modern tools including Kafka for streaming, Spark for processing, Airflow for orchestration, Delta Lake for storage, Redis for caching, and MinIO for object storage. The project showcases Lakehouse architecture, data quality validation, and SCD Type 2 dimension modeling with full documentation and production-ready simulation.

  2.
    Article
    Confluent Blog · 34w

    Why Microservices Need Event-Driven Architectures for Agility and Scale

    Event-driven architectures solve critical problems in microservices by replacing synchronous API calls with asynchronous event communication. This approach eliminates bottlenecks, prevents cascading failures, and enables real-time responsiveness. Apache Kafka serves as the central platform for event streaming, allowing services to publish and subscribe to events independently. The shift from REST-based to event-driven microservices delivers faster innovation cycles, improved system resilience, and better customer experiences across industries like finance, e-commerce, and telecommunications.

  3.
    Article
    System Design Newsletter · 35w

    How Kafka Works

    Apache Kafka is a distributed, fault-tolerant pub/sub messaging system built on a simple log data structure. It uses brokers for horizontal scaling, partitions for data sharding, and replication for durability. The system employs KRaft consensus for leader election and metadata management. Key features include tiered storage for cost optimization, consumer groups for parallel processing, transactions for exactly-once semantics, and ecosystem components like Kafka Streams for stream processing and Kafka Connect for system integration.
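
To make the log-and-partition idea in this summary concrete, here is a minimal in-memory sketch. It is a toy, not Kafka's implementation: the `ToyTopic` class and its methods are hypothetical names, and the CRC32-based key hashing is an assumption chosen for simplicity (Kafka's default partitioner uses a different hash). It only illustrates that records with the same key land in the same partition and that each partition is an append-only log addressed by offset.

```python
import zlib


class ToyTopic:
    """Hypothetical in-memory stand-in for a partitioned topic (not a Kafka API)."""

    def __init__(self, num_partitions):
        # Each partition is an independent append-only list (a "log").
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Assumption: CRC32 modulo partition count, chosen only because it is
        # deterministic and in the standard library.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        # Append the record and return (partition, offset) of the new entry.
        partition = self.partition_for(key)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1

    def read(self, partition, offset, max_records=10):
        # Reading never removes data; a consumer just tracks its own offset.
        return self.partitions[partition][offset:offset + max_records]


topic = ToyTopic(num_partitions=3)
first = topic.append("order-1", "created")
second = topic.append("order-1", "paid")
```

Because both records share the key `order-1`, they go to the same partition with consecutive offsets, which is what preserves per-key ordering in this sketch.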

  4.
    Article
    ByteByteGo · 33w

    How OpenAI Uses Kubernetes And Apache Kafka for GenAI

    OpenAI built a stream processing platform using Apache Flink (PyFlink) on Kubernetes to handle real-time data for AI model training and experimentation. The architecture addresses three key challenges: providing Python-first APIs for ML practitioners, handling cloud capacity constraints, and managing multi-primary Kafka clusters. The system features a control plane for multi-cluster failover, per-namespace isolation in Kubernetes, watchdog services for Kafka topology monitoring, and decoupled state management using RocksDB with highly available blob storage. Custom Kafka connectors enable reading from multiple primary clusters simultaneously while maintaining resilience during outages.

  5.
    Video
    TechWorld with Nana · 34w

    Apache Kafka Complete Course for Beginners

  6.
    Article
    Debezium · 34w

    Debezium 3.3.0.Final Released

    Debezium 3.3.0.Final introduces major enhancements including a new Quarkus extension for PostgreSQL integration, a CockroachDB connector, Apache Kafka 4.1 support, and exactly-once semantics for all core connectors. The release includes OpenLineage support for MongoDB and JDBC sink connectors, improved performance optimizations across Oracle, PostgreSQL, and MySQL connectors, and enhanced Debezium Platform features like smart editor and connection management. Breaking changes include removal of deprecated snapshot modes and updates to JDBC sink data type precision handling.

  7.
    Article
    The New Stack · 37w

    Apache Kafka 4.1: The 3 Big Things Developers Need To Know

    Apache Kafka 4.1 introduces three major developer-focused features: Queues for Kafka (KIP-932) enabling cooperative message consumption with per-message acknowledgment, native JWT-Bearer authentication support eliminating static credentials, and a new Kafka Streams rebalance protocol for better coordination. The release also includes improvements to consumer group protocols, transaction handling, and unified metrics.

  8.
    Article
    The New Stack · 1y

    The New Look and Feel of Apache Kafka 4.0

    Apache Kafka 4.0 introduces significant upgrades, including the replacement of ZooKeeper with KRaft for metadata management, enhancing stability and reducing complexity. The release features Queues for Kafka to allow scaling consumers beyond topic partitions, improved consumer group rebalancing, and new capabilities for code injection and observability. These updates aim to streamline Kafka's operations and improve the developer experience.

  9.
    Article
    Baeldung · 1y

    Kafka Producer and Consumer Message Acknowledgement Options

    The tutorial explains the acknowledgment options available for producers and consumers in Apache Kafka, detailing how the three producer acknowledgment modes (none, leader, and all) impact message reliability and system performance. It also covers essential consumer configuration properties, such as group ID, auto offset reset, enable auto commit, and auto commit interval, which affect consumer message processing and reliability. Understanding these options allows developers to balance performance and reliability for different use cases.
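
As a rough illustration of the options this summary lists, the sketch below builds plain Python dictionaries using the standard Kafka client property names (`acks`, `group.id`, `auto.offset.reset`, `enable.auto.commit`, `auto.commit.interval.ms`). The helper functions, the `localhost:9092` address, and the default values are assumptions for illustration, not taken from the tutorial; no Kafka client library is imported, so the dictionaries would still need to be passed to whichever client you use.

```python
# Mapping from the three acknowledgment modes named above to the "acks" value.
PRODUCER_ACKS = {
    "none": "0",     # fire and forget: fastest, messages can be lost
    "leader": "1",   # the partition leader has written the record
    "all": "all",    # all in-sync replicas have the record: slowest, safest
}


def producer_config(mode, bootstrap="localhost:9092"):
    """Hypothetical helper: build a producer config for one acks mode."""
    if mode not in PRODUCER_ACKS:
        raise ValueError(f"unknown acks mode: {mode!r}")
    return {"bootstrap.servers": bootstrap, "acks": PRODUCER_ACKS[mode]}


def consumer_config(group_id, auto_commit=True, commit_interval_ms=5000,
                    offset_reset="earliest", bootstrap="localhost:9092"):
    """Hypothetical helper: build a consumer config from the listed properties."""
    config = {
        "bootstrap.servers": bootstrap,
        "group.id": group_id,
        "auto.offset.reset": offset_reset,   # where to start with no committed offset
        "enable.auto.commit": auto_commit,
    }
    if auto_commit:
        # Only meaningful when offsets are committed automatically.
        config["auto.commit.interval.ms"] = commit_interval_ms
    return config


reliable_producer = producer_config("all")
manual_consumer = consumer_config("billing-service", auto_commit=False)
```

Disabling auto commit, as in `manual_consumer`, leaves offset commits to the application, which is the usual choice when a message must be fully processed before it is marked as consumed.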

  10.
    Article
    The New Stack · 1y

    A2A, MCP, Kafka and Flink: The New Stack for AI Agents

    The post discusses the need for a new infrastructure stack to enable AI agents to collaborate effectively. This stack includes four open components: Google’s Agent2Agent (A2A) protocol for agent communication, Anthropic’s Model Context Protocol (MCP) for tool access, Apache Kafka for event-driven communication, and Apache Flink for real-time data processing. By integrating these technologies, AI agents can operate beyond isolated silos, scaling to complex ecosystems that facilitate collaboration, observability, and resilience.

  11.
    Article
    Swiggy Bytes · 52w

    Enabling Real-Time Business Monitoring with Klaxon

    Swiggy leverages Klaxon, an in-house real-time business monitoring and alerting system, to enhance proactive problem-solving and operational efficiency. Klaxon supports diverse alerting channels, flexible alert types, and personalized notifications, using technologies like Apache Kafka and Snowflake. Recent improvements have streamlined operations and lowered costs, with Klaxon playing a vital role in maintaining smooth operations and improving decision-making across Swiggy's teams.

  12.
    Article
    Collections · 1y

    Understanding Apache Kafka: Basics and Key Features

    Apache Kafka is a distributed event-streaming platform designed for real-time data processing. It manages data flow efficiently in event-driven systems with components like topics, partitions, producers, consumers, and brokers. Kafka ensures high availability through data replication and a leader-follower model. Its architecture supports data persistence and parallel processing via consumer groups. The recent introduction of Kafka Raft (KRaft) aims to simplify cluster management.

  13.
    Article
    Confluent Blog · 49w

    Build an AI Personalization Engine with Confluent & Databricks

    Confluent and Databricks can be combined to build real-time AI applications by bridging operational and analytical data systems. The tutorial demonstrates creating an AI-powered marketing personalization engine using Tableflow to convert Kafka topics into Delta Lake tables, Apache Flink for stream processing, and Oracle CDC connectors for real-time data ingestion. The example implementation helps a fictional hotel brand identify low-booking properties and generate targeted promotional campaigns using AI-generated content.