Best of Kafka — December 2024

1
Video
YouTube·1y
Apache Kafka in 15 minutes
Apache Kafka, originally developed at LinkedIn and open-sourced in 2011, is a distributed messaging system known for its scalability and reliability. It is used primarily for event streaming and message queuing, which suits systems that need high throughput and fault tolerance. Its core components are producers, brokers, and consumers. Partitions spread load across brokers, replicas protect against broker failure, and the at-least-once delivery guarantee means a message may be redelivered but is not silently lost. Kafka also raises throughput with zero-copy transfer, which avoids extra copies of message data on the way from disk to the network.
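The at-least-once guarantee mentioned above can be illustrated with a toy model. This is a minimal sketch in plain Python, not Kafka client code: the `log`, `committed_offset`, and `consume` names are hypothetical, and the crash is simulated by returning early before the offset commit.

```python
# Toy model of at-least-once delivery: the offset is committed only after
# processing, so a crash between the two steps causes a redelivery.
log = ["order-1", "order-2", "order-3"]   # messages in one partition
committed_offset = 0                        # next offset the consumer will read
processed = []                              # side effects seen downstream


def consume(crash_before_commit_at=None):
    """Process from the committed offset; optionally crash before one commit."""
    global committed_offset
    offset = committed_offset
    while offset < len(log):
        processed.append(log[offset])       # 1. process the message
        if offset == crash_before_commit_at:
            return                          # simulated crash: commit never happens
        committed_offset = offset + 1       # 2. commit only after processing
        offset += 1


consume(crash_before_commit_at=1)  # crashes after processing order-2
consume()                          # restart resumes from the committed offset
print(processed)
```

Because `order-2` was processed but never committed, the restart handles it a second time. Downstream processing therefore needs to tolerate duplicates, for example by being idempotent.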
2
Video
ByteByteGo·1y
Apache Kafka Fundamentals You Should Know
Apache Kafka is a distributed event store and real-time streaming platform, originally developed at LinkedIn, that powers many large data pipelines. Kafka stores data as messages, which are grouped into topics; each topic is divided into partitions for scalability. It handles multiple producers and consumer groups efficiently, which allows high throughput, fault tolerance, and scalable data processing. Retention policies control how long messages are kept, and typical applications include log aggregation, real-time event streaming, change data capture, and system monitoring across various industries.
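As a rough illustration of how topics, partitions, and consumer groups fit together, the sketch below maps message keys to partitions and spreads partitions across one group's members. It is an assumption-laden toy: CRC32 is a stand-in hash rather than the hash Kafka's clients use, and `partition_for`, `assign_partitions`, and the round-robin strategy are hypothetical names and choices made for this example.

```python
import zlib

NUM_PARTITIONS = 3  # hypothetical topic with three partitions


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition (CRC32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(consumers, num_partitions=NUM_PARTITIONS):
    """Spread partitions round-robin across the members of one consumer group."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Messages with the same key always land in the same partition,
# which preserves ordering per key.
same_partition = partition_for("user-42") == partition_for("user-42")
group = assign_partitions(["consumer-a", "consumer-b"])
print(same_partition, group)
```

Each partition is owned by exactly one consumer in the group, so adding partitions is what lets a group scale out its processing.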
3
Article
ByteByteGo·1y
How LinkedIn Customizes Its 7 Trillion Message Kafka Ecosystem
LinkedIn uses Apache Kafka to handle over 7 trillion messages daily, running more than 100 Kafka clusters on more than 4,000 servers. Its Kafka infrastructure includes custom features and enhancements for scalability and operability, delivered through dedicated LinkedIn Kafka release branches. LinkedIn maintains its own patches to tune performance and resource utilization for its specific needs, while also contributing improvements back to the open-source project.
4
Article
Last9·1y
Kafka with OpenTelemetry: Distributed Tracing Guide
Integrating Apache Kafka with OpenTelemetry enhances system observability and performance by enabling end-to-end distributed tracing and capturing essential metrics like message throughput and consumer lag. This integration helps track how messages flow through Kafka, identify bottlenecks, improve error detection, and optimize performance, particularly in cloud-native and microservices architectures.
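End-to-end tracing across Kafka generally relies on carrying trace context in message headers. The sketch below shows the idea using the W3C `traceparent` header format (`version-traceid-spanid-flags`). It deliberately does not use the OpenTelemetry API: `new_traceparent`, `inject`, and `extract` are hypothetical helpers, and headers are modelled as a plain list of `(key, bytes)` pairs.

```python
import secrets


def new_traceparent() -> str:
    """Build a W3C traceparent value: version-traceid-spanid-flags."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    return f"00-{trace_id}-{span_id}-01"


def inject(headers: list, traceparent: str) -> list:
    """Producer side: attach the trace context as a message header."""
    return headers + [("traceparent", traceparent.encode("ascii"))]


def extract(headers: list):
    """Consumer side: recover the trace id and parent span id, if present."""
    for key, value in headers:
        if key == "traceparent":
            _version, trace_id, span_id, _flags = value.decode("ascii").split("-")
            return trace_id, span_id
    return None


sent = new_traceparent()
message_headers = inject([], sent)
trace_id, parent_span_id = extract(message_headers)
print(trace_id, parent_span_id)
```

The consumer starts its own span under the extracted trace id, which is what links the producer and consumer sides into one trace.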
5
Article
Metadata·1y
Stream Processing
Batch processes can delay business operations, so stream processing handles events as soon as they occur. Producers notify consumers of new events, often through message brokers like RabbitMQ or log-based brokers like Kafka. Writing the same change to two systems separately (dual writes) can leave them inconsistent, so Change Data Capture (CDC) instead replicates changes consistently from one system to the others. Event sourcing records all changes immutably, which aids auditability, recovery, and analytics. Stream processing is used in applications such as fraud detection, trading systems, and manufacturing, and relies on techniques like microbatching and checkpointing for fault tolerance.
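Event sourcing and checkpointing can be sketched together in a few lines. This is an illustrative toy, assuming a hypothetical account with `deposited` and `withdrawn` events: state is never stored directly but rebuilt by replaying the append-only log, and a checkpoint of `(next_offset, balance)` lets recovery skip events already applied.

```python
# Append-only event log: events are never edited, only added.
events = [
    {"type": "deposited", "amount": 100},
    {"type": "withdrawn", "amount": 30},
    {"type": "deposited", "amount": 5},
]


def apply(balance: int, event: dict) -> int:
    """Fold one event into the current state."""
    if event["type"] == "deposited":
        return balance + event["amount"]
    if event["type"] == "withdrawn":
        return balance - event["amount"]
    return balance  # unknown event types are ignored in this sketch


def replay(log, checkpoint=(0, 0)):
    """Rebuild state from a checkpoint of (next_offset, balance)."""
    offset, balance = checkpoint
    for event in log[offset:]:
        balance = apply(balance, event)
        offset += 1
    return offset, balance


full = replay(events)                  # replay from the start
checkpoint = replay(events[:2])        # state saved after two events
resumed = replay(events, checkpoint)   # recovery resumes from the checkpoint
print(full, checkpoint, resumed)
```

Replaying from the checkpoint yields the same state as replaying the full log, which is the property that makes checkpointing safe for recovery.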
