Best of Apache Kafka — 2025

1
Article
Data Engineering·42w
Data Engineer Project: From Streaming Orders to Batch Insights — A Coffee Shop Chain’s Data Pipeline
A comprehensive data engineering project demonstrates building a complete pipeline for a coffee shop chain that processes real-time orders and provides instant product recommendations while supporting batch analytics. The implementation uses modern tools including Kafka for streaming, Spark for processing, Airflow for orchestration, Delta Lake for storage, Redis for caching, and MinIO for object storage. The project showcases Lakehouse architecture, data quality validation, and SCD Type 2 dimension modeling with full documentation and production-ready simulation.
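The summary mentions SCD Type 2 dimension modeling. The following is a rough illustration of the idea, not the project's own code. When a tracked attribute changes, the current row is closed instead of overwritten, and a new versioned row is appended. The row layout (product_id, price, valid_from, valid_to, is_current) and the function name apply_scd2 are hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DimRow:
    product_id: str
    price: float
    valid_from: date
    valid_to: Optional[date]  # None means "still valid"
    is_current: bool


def apply_scd2(dim: List[DimRow], product_id: str, price: float, as_of: date) -> List[DimRow]:
    """Return a new dimension table with one change applied as SCD Type 2 (hypothetical helper)."""
    out: List[DimRow] = []
    needs_new_row = True  # insert a new version unless the current row already matches
    for row in dim:
        if row.product_id == product_id and row.is_current:
            if row.price == price:
                needs_new_row = False  # no attribute change: keep the row unchanged
                out.append(row)
            else:
                # Close the old version so its history is preserved
                out.append(replace(row, valid_to=as_of, is_current=False))
        else:
            out.append(row)
    if needs_new_row:
        out.append(DimRow(product_id, price, as_of, None, True))
    return out


dim = apply_scd2([], "latte", 4.50, date(2025, 1, 1))
dim = apply_scd2(dim, "latte", 4.75, date(2025, 3, 1))
# dim now holds a closed 4.50 row and a current 4.75 row
```

In a Delta Lake setup like the one described, this close-and-insert step would typically be expressed as a MERGE. The list-based version only makes the row logic visible.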
2
Article
Confluent Blog·34w
Why Microservices Need Event-Driven Architectures for Agility and Scale
Event-driven architectures address critical problems in microservices by replacing synchronous API calls with asynchronous event communication. This approach reduces bottlenecks, limits cascading failures, and enables real-time responsiveness. Apache Kafka serves as the central platform for event streaming, allowing services to publish and subscribe to events independently. The shift from REST-based to event-driven microservices delivers faster innovation cycles, improved system resilience, and better customer experiences across industries like finance, e-commerce, and telecommunications.
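The contrast with synchronous calls can be made concrete with a minimal in-process publish/subscribe sketch. Here the order service publishes an event without knowing who consumes it, and a failing subscriber does not break the publisher or the other subscribers. The EventBus class and the topic name are hypothetical. This toy delivers events synchronously in memory. In the architecture the post describes, Kafka fills this role across process boundaries and adds durable storage and asynchronous delivery.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class EventBus:
    """Toy in-memory pub/sub bus (hypothetical stand-in for a Kafka topic)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # The publisher does not know its consumers. Each handler's failure
        # is contained so it cannot cascade to the publisher or to other handlers.
        for handler in self._subscribers[topic]:
            try:
                handler(event)
            except Exception as exc:
                print(f"handler failed on {topic}: {exc}")


bus = EventBus()
shipped: List[int] = []
bus.subscribe("orders.created", lambda e: shipped.append(e["order_id"]))


def flaky_inventory(event: dict) -> None:
    raise RuntimeError("inventory service down")


bus.subscribe("orders.created", flaky_inventory)
bus.publish("orders.created", {"order_id": 42})
```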
3
Article
System Design Newsletter·35w
How Kafka Works
Apache Kafka is a distributed, fault-tolerant pub/sub messaging system built on a simple log data structure. It uses brokers for horizontal scaling, partitions for data sharding, and replication for durability. The system uses the KRaft consensus protocol to elect the active controller and manage cluster metadata, including partition leadership. Key features include tiered storage for cost optimization, consumer groups for parallel processing, transactions for exactly-once semantics, and ecosystem components such as Kafka Streams for stream processing and Kafka Connect for system integration.
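The log-plus-partitions model can be sketched in a few lines. This toy keeps each partition as an append-only list, routes records to partitions by hashing the key, and leaves consumers to track their own offsets. Kafka's default partitioner hashes keys with murmur2. Here zlib.crc32 is a stand-in. The class and method names are hypothetical, and brokers, replication, and KRaft are omitted.

```python
import zlib
from typing import List, Tuple

Record = Tuple[str, str]  # (key, value)


class PartitionedLog:
    """Toy topic: N append-only partitions, records routed by key hash."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[Record]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Same key -> same partition, which preserves per-key ordering
        return zlib.crc32(key.encode()) % len(self.partitions)

    def append(self, key: str, value: str) -> Tuple[int, int]:
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def read(self, partition: int, offset: int, max_records: int = 10) -> List[Record]:
        # Reads never delete data; each consumer simply advances its own offset
        return self.partitions[partition][offset:offset + max_records]


log = PartitionedLog(3)
p1, o1 = log.append("store-7", "latte")
p2, o2 = log.append("store-7", "mocha")
# Both records share a partition, and their offsets are consecutive
```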
4
Article
ByteByteGo·33w
How OpenAI Uses Kubernetes And Apache Kafka for GenAI
OpenAI built a stream processing platform using Apache Flink (PyFlink) on Kubernetes to handle real-time data for AI model training and experimentation. The architecture addresses three key challenges: providing Python-first APIs for ML practitioners, handling cloud capacity constraints, and managing multi-primary Kafka clusters. The system features a control plane for multi-cluster failover, per-namespace isolation in Kubernetes, watchdog services for Kafka topology monitoring, and decoupled state management using RocksDB with highly available blob storage. Custom Kafka connectors enable reading from multiple primary clusters simultaneously while maintaining resilience during outages.
5
Video
TechWorld with Nana·34w
Apache Kafka Complete Course for Beginners
6
Article
Debezium·34w
Debezium 3.3.0.Final Released
Debezium 3.3.0.Final introduces major enhancements including a new Quarkus extension for PostgreSQL integration, a CockroachDB connector, Apache Kafka 4.1 support, and exactly-once semantics for all core connectors. The release includes OpenLineage support for MongoDB and JDBC sink connectors, improved performance optimizations across Oracle, PostgreSQL, and MySQL connectors, and enhanced Debezium Platform features like smart editor and connection management. Breaking changes include removal of deprecated snapshot modes and updates to JDBC sink data type precision handling.
7
Article
The New Stack·37w
Apache Kafka 4.1: The 3 Big Things Developers Need To Know
Apache Kafka 4.1 introduces three major developer-focused features: Queues for Kafka (KIP-932) enabling cooperative message consumption with per-message acknowledgment, native JWT-Bearer authentication support eliminating static credentials, and a new Kafka Streams rebalance protocol for better coordination. The release also includes improvements to consumer group protocols, transaction handling, and unified metrics.
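Per-message acknowledgment differs from offset-based consumption in a way a toy model can illustrate. In this share-group style queue, records are leased to whichever consumer polls, acknowledged one by one, and redelivered if released. The three outcomes loosely follow the accept, release, and reject acknowledgement types described in KIP-932. This is a simplified simulation, not the KIP-932 protocol or the KafkaShareConsumer API. The class and method names are hypothetical, and lock timeouts and delivery-count limits are omitted.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Set, Tuple


class ToyShareQueue:
    """Hypothetical model: any free consumer gets the next record; acks are per record."""

    def __init__(self, records: List[str]) -> None:
        self.records = records
        self.available: Deque[int] = deque(range(len(records)))
        self.in_flight: Dict[int, str] = {}  # offset -> consumer currently holding it
        self.done: Set[int] = set()
        self.rejected: Set[int] = set()

    def poll(self, consumer: str) -> Optional[Tuple[int, str]]:
        if not self.available:
            return None
        offset = self.available.popleft()
        self.in_flight[offset] = consumer
        return offset, self.records[offset]

    def acknowledge(self, offset: int, outcome: str) -> None:
        self.in_flight.pop(offset)
        if outcome == "accept":
            self.done.add(offset)               # processed; never redelivered
        elif outcome == "release":
            self.available.appendleft(offset)   # handed out again on the next poll
        elif outcome == "reject":
            self.rejected.add(offset)           # unprocessable; not redelivered
        else:
            raise ValueError(f"unknown outcome: {outcome}")


q = ToyShareQueue(["a", "b", "c"])
first = q.poll("c1")   # two consumers share one partition's records
second = q.poll("c2")
q.acknowledge(1, "accept")
q.acknowledge(0, "release")  # record 0 becomes available again
```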
8
Article
The New Stack·1y
The New Look and Feel of Apache Kafka 4.0
Apache Kafka 4.0 introduces significant upgrades, including the replacement of ZooKeeper with KRaft for metadata management, enhancing stability and reducing complexity. The release features Queues for Kafka to allow scaling consumers beyond topic partitions, improved consumer group rebalancing, and new capabilities for code injection and observability. These updates aim to streamline Kafka's operations and improve the developer experience.
9
Article
Baeldung·1y
Kafka Producer and Consumer Message Acknowledgement Options
The tutorial explains the acknowledgment options available to producers and consumers in Apache Kafka, detailing how the three producer acknowledgment modes (none, leader, and all, set with acks=0, acks=1, and acks=all) affect message reliability and system performance. It also covers the essential consumer configuration properties group.id, auto.offset.reset, enable.auto.commit, and auto.commit.interval.ms, which affect how reliably consumers process messages. Understanding these options lets developers balance performance and reliability for different use cases.
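These settings map to standard client configuration keys. The dictionaries below use property names accepted by both the Java client and librdkafka-based clients such as confluent-kafka-python. The broker address, group name, and the helper producer_config are placeholders. Passing these to a real client (for example confluent_kafka.Producer(conf)) requires a running broker, so this sketch only builds the configuration.

```python
from typing import Dict

BOOTSTRAP = "localhost:9092"  # placeholder broker address


def producer_config(mode: str) -> Dict[str, str]:
    """Map a reliability choice to the producer 'acks' setting (hypothetical helper)."""
    acks = {
        "none": "0",    # fire-and-forget: fastest, but lost messages go unnoticed
        "leader": "1",  # leader has written it; lost if the leader fails before followers copy it
        "all": "all",   # all in-sync replicas have it; safest, highest latency
    }[mode]
    return {"bootstrap.servers": BOOTSTRAP, "acks": acks}


consumer_config: Dict[str, str] = {
    "bootstrap.servers": BOOTSTRAP,
    "group.id": "orders-service",       # consumers sharing this id split the partitions
    "auto.offset.reset": "earliest",    # where to start when no committed offset exists
    "enable.auto.commit": "false",      # commit manually, after processing succeeds
    "auto.commit.interval.ms": "5000",  # only takes effect when auto commit is enabled
}
```

Disabling auto commit trades some convenience for control. Offsets are committed only after the application has actually processed the records, which avoids marking unprocessed messages as consumed.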
10
Article
The New Stack·1y
A2A, MCP, Kafka and Flink: The New Stack for AI Agents
The post argues that AI agents need a new infrastructure stack to collaborate effectively. The stack has four open components: Google’s Agent2Agent (A2A) protocol for agent communication, Anthropic’s Model Context Protocol (MCP) for tool access, Apache Kafka for event-driven communication, and Apache Flink for real-time data processing. Together, these technologies let agents move beyond isolated silos into larger ecosystems that support collaboration, observability, and resilience.
11
Article
Swiggy Bytes·52w
Enabling Real-Time Business Monitoring with Klaxon
Swiggy uses Klaxon, an in-house real-time business monitoring and alerting system, to solve problems proactively and improve operational efficiency. Klaxon supports diverse alerting channels, flexible alert types, and personalized notifications, and is built on technologies such as Apache Kafka and Snowflake. Recent improvements have streamlined its operation and lowered costs, and Klaxon plays a key role in keeping operations running smoothly and improving decision-making across Swiggy's teams.
12
Article
Collections·1y
Understanding Apache Kafka: Basics and Key Features
Apache Kafka is a distributed event-streaming platform designed for real-time data processing. It manages data flow in event-driven systems through topics, partitions, producers, consumers, and brokers. Kafka provides high availability through data replication and a leader-follower model, and its architecture supports data persistence and parallel processing via consumer groups. KRaft (Kafka Raft), which replaces ZooKeeper for metadata management, aims to simplify cluster management.
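Parallelism through consumer groups comes from dividing a topic's partitions among the group's members, so that each partition is read by exactly one member. The function below is a simplified round-robin assignment for illustration, and its name is hypothetical. Kafka ships several assignment strategies (for example range, round-robin, and cooperative-sticky) that also handle multiple topics, rebalances, and stickiness, none of which this sketch models.

```python
from typing import Dict, List


def assign_round_robin(partitions: List[int], members: List[str]) -> Dict[str, List[int]]:
    """Give each partition to exactly one group member, spreading them evenly."""
    if not members:
        return {}
    ordered = sorted(members)  # deterministic order so every member computes the same result
    assignment: Dict[str, List[int]] = {m: [] for m in ordered}
    for i, p in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(p)
    return assignment


six_partitions = assign_round_robin([0, 1, 2, 3, 4, 5], ["c2", "c1", "c3"])
two_partitions = assign_round_robin([0, 1], ["c1", "c2", "c3"])  # c3 gets nothing
```

The second call shows why a classic consumer group cannot usefully grow beyond the partition count: extra members sit idle. This is the limitation that Queues for Kafka, mentioned in the 4.0 and 4.1 entries, is meant to address.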
13
Article
Confluent Blog·49w
Build an AI Personalization Engine with Confluent & Databricks
Confluent and Databricks can be combined to build real-time AI applications by bridging operational and analytical data systems. The tutorial demonstrates creating an AI-powered marketing personalization engine using Tableflow to convert Kafka topics into Delta Lake tables, Apache Flink for stream processing, and Oracle CDC connectors for real-time data ingestion. The example implementation helps a fictional hotel brand identify low-booking properties and generate targeted promotional campaigns using AI-generated content.