Best of Kafka — September 2024

1
Article
Towards Dev·2y
Kafka 101: A Beginner’s Guide to Understanding Kafka
Apache Kafka is a distributed streaming platform used for building real-time data pipelines and applications. It was developed to handle high throughput, scalability, and durability needs, making it popular in industries such as finance, retail, and telecom. Kafka operates on key concepts like topics, partitions, producers, consumers, brokers, offsets, and consumer groups, which together provide a robust system for event-driven architectures and stream processing. Recent developments include the transition from ZooKeeper to KRaft for improved scalability and simplified operations. Kafka supports delivery semantics like At-Most-Once, At-Least-Once, and Exactly-Once, ensuring reliable message delivery across various use cases.
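The difference between At-Most-Once and At-Least-Once comes down to whether the consumer commits its offset before or after processing a record. The sketch below is a toy in-memory model, not Kafka client code: `consume`, `run`, and `CrashError` are hypothetical names, and the "log" is a plain Python list. Exactly-Once is not modeled here, since in Kafka it relies on idempotent producers and transactions.

```python
from typing import Callable, List


class CrashError(Exception):
    """Simulated consumer crash (hypothetical, for illustration only)."""


def consume(log: List[str], committed: int,
            handler: Callable[[str], None], commit_first: bool) -> int:
    """Process records starting at `committed`; return the surviving committed offset.

    commit_first=True  models at-most-once  (commit, then process).
    commit_first=False models at-least-once (process, then commit).
    """
    offset = committed
    try:
        while offset < len(log):
            if commit_first:
                committed = offset + 1
                handler(log[offset])
            else:
                handler(log[offset])
                committed = offset + 1
            offset += 1
    except CrashError:
        pass  # the consumer dies; only the committed offset survives
    return committed


def run(commit_first: bool, crash_after_effect: bool) -> List[str]:
    """Consume a three-record log, crash once on "b", restart, and return the side effects."""
    log = ["a", "b", "c"]
    processed: List[str] = []
    state = {"crashed": False}

    def handler(record: str) -> None:
        will_crash = record == "b" and not state["crashed"]
        if will_crash and not crash_after_effect:
            state["crashed"] = True
            raise CrashError()
        processed.append(record)  # the side effect of handling the record
        if will_crash and crash_after_effect:
            state["crashed"] = True
            raise CrashError()

    committed = consume(log, 0, handler, commit_first)  # first run crashes on "b"
    consume(log, committed, handler, commit_first)      # restart from the committed offset
    return processed


print(run(commit_first=True, crash_after_effect=False))   # ['a', 'c']: "b" is lost
print(run(commit_first=False, crash_after_effect=True))   # ['a', 'b', 'b', 'c']: "b" is duplicated
```

Committing first can lose a record on a crash, while committing last can replay one, which is why at-least-once consumers are usually paired with idempotent processing.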
2
Article
System Design Codex·2y
Introduction to Kafka
Kafka is a distributed event store and streaming platform initially developed by LinkedIn and now widely used by companies like Netflix and Uber for data pipelines. It is favored for its reliability and scalability. Kafka writes messages in batches for efficiency and organizes them into topics, which are split into partitions. Producers send messages to Kafka brokers, and consumers read them. Brokers usually run as part of a cluster, which allows messages to be replicated for redundancy. Kafka also has drawbacks: it exposes a large number of configuration options, and its client libraries outside Java and C are less mature.
3
Video
YouTube·2y
When to Use Kafka or RabbitMQ | System Design
Kafka and RabbitMQ serve different purposes in distributed systems. Kafka is designed for high-throughput stream processing, fanning out messages to multiple consumers, and handling uniform, short processing tasks. RabbitMQ is a traditional message queue system better suited for complex routing, long-running tasks, and handling sporadic data flow with acknowledgments for message processing. Choose Kafka for scenarios requiring high-speed, real-time data distribution and RabbitMQ for more controlled message queuing and processing.
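The contrast between fanning out a stream and working through a queue can be sketched with two toy in-memory classes. This is a conceptual model only, assuming a single process: `Log` and `WorkQueue` are hypothetical names and do not mirror the Kafka or RabbitMQ client APIs.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class Log:
    """Kafka-style model: records are retained, and each group tracks its own offset."""

    def __init__(self) -> None:
        self.records: List[str] = []
        self.offsets: Dict[str, int] = {}

    def append(self, record: str) -> None:
        self.records.append(record)

    def poll(self, group: str) -> List[str]:
        position = self.offsets.get(group, 0)
        batch = self.records[position:]
        self.offsets[group] = len(self.records)  # advance only this group's offset
        return batch


class WorkQueue:
    """Queue-style model: each message goes to one consumer and is gone once acknowledged."""

    def __init__(self) -> None:
        self.pending: Deque[str] = deque()
        self.unacked: Dict[int, str] = {}
        self._next_tag = 0

    def publish(self, message: str) -> None:
        self.pending.append(message)

    def get(self) -> Optional[Tuple[int, str]]:
        if not self.pending:
            return None
        tag = self._next_tag
        self._next_tag += 1
        self.unacked[tag] = self.pending.popleft()
        return tag, self.unacked[tag]

    def ack(self, tag: int) -> None:
        del self.unacked[tag]  # processing finished; the message is discarded

    def nack(self, tag: int) -> None:
        self.pending.appendleft(self.unacked.pop(tag))  # hand it back for redelivery
```

In this model, two groups polling the same `Log` both see every record, while two workers calling `WorkQueue.get()` each receive a different message and must acknowledge it before it is dropped.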
4
Article
Community Picks·2y
Building Microservices Architecture with CQRS Pattern Using Kafka and NestJS: A Step-by-Step Guide
This guide explains how to build a sample blog application using microservices architecture with CQRS pattern, Kafka, and NestJS. It covers setting up the project structure, configuring Kafka with Docker Compose, and developing microservices for user authentication, blog article creation, and event sourcing for reliable communication between services.
5
Article
Trendyol Tech·2y
Optimizing Kafka Performance Through Data Compression
Data compression in Kafka improves system efficiency and performance by reducing message size, which lowers network and storage needs and reduces disk I/O. The study benchmarks Gzip, Zstd, LZ4, and Snappy, highlighting their trade-offs in compression ratio, speed, and resource consumption. Zstd at level 3 offered the best balance between compression efficiency and resource usage. Choosing the right compression strategy can significantly improve Kafka's handling of large datasets, reduce costs, and maintain high performance under heavy loads.
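A rough way to see why compression pays off on batches of similar messages is to measure it directly. The sketch below uses `zlib` from the Python standard library (the DEFLATE algorithm behind Gzip) as a stand-in, because Zstd, LZ4, and Snappy are not in the standard library; the sample messages and the `make_messages` and `compressed_sizes` helpers are hypothetical, and the numbers say nothing about the article's own benchmark results.

```python
import json
import zlib
from typing import List, Tuple


def make_messages(count: int) -> List[bytes]:
    """Build small, repetitive JSON events (made-up sample data)."""
    return [
        json.dumps({"user_id": i, "event": "page_view",
                    "path": "/products/%d" % (i % 10)}).encode("utf-8")
        for i in range(count)
    ]


def compressed_sizes(messages: List[bytes], level: int = 6) -> Tuple[int, int, int]:
    """Return (raw bytes, bytes if each message is compressed alone, bytes if compressed as one batch)."""
    raw = sum(len(m) for m in messages)
    per_message = sum(len(zlib.compress(m, level)) for m in messages)
    batch = len(zlib.compress(b"\n".join(messages), level))
    return raw, per_message, batch


raw, per_message, batch = compressed_sizes(make_messages(200))
print("raw=%d per_message=%d batch=%d" % (raw, per_message, batch))
```

Compressing the whole batch lets the codec exploit repetition across messages, which matches how Kafka producers compress record batches rather than individual records. Varying `level` gives a feel for the ratio versus CPU trade-off the summary describes.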
6
Article
Towards Dev·2y
Transmitting Large Kafka Payloads: Best Practices and Strategies
Transmitting large payloads in Apache Kafka can be challenging due to its default 1 MB message size limit. To handle larger messages efficiently, you can increase the message size limits, use compression codecs like LZ4, optimize batching with settings such as `linger.ms` and `batch.size`, split messages into smaller chunks, or offload large data to external stores while using Kafka for metadata. These strategies help maintain high throughput and low latency without straining Kafka's resources.
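The chunking strategy can be sketched without any Kafka client at all. The example below is a minimal illustration under stated assumptions: the `(message_id, index, total, data)` tuple layout and the `split_payload` and `reassemble` helpers are hypothetical, and a real pipeline would carry this metadata in record headers or keys and handle chunks that never arrive.

```python
import uuid
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical chunk layout: (message_id, chunk_index, total_chunks, data)
Chunk = Tuple[str, int, int, bytes]


def split_payload(payload: bytes, max_chunk_bytes: int) -> List[Chunk]:
    """Split a payload into chunks no larger than max_chunk_bytes."""
    if max_chunk_bytes <= 0:
        raise ValueError("max_chunk_bytes must be positive")
    message_id = uuid.uuid4().hex
    # Ceiling division; an empty payload still yields one (empty) chunk.
    total = max(1, -(-len(payload) // max_chunk_bytes))
    return [
        (message_id, i, total,
         payload[i * max_chunk_bytes:(i + 1) * max_chunk_bytes])
        for i in range(total)
    ]


def reassemble(chunks: Iterable[Chunk]) -> bytes:
    """Rebuild the payload from chunks received in any order."""
    parts: Dict[int, bytes] = {}
    message_id: Optional[str] = None
    total: Optional[int] = None
    for chunk_id, index, count, data in chunks:
        if message_id is None:
            message_id, total = chunk_id, count
        elif chunk_id != message_id or count != total:
            raise ValueError("chunks belong to different messages")
        parts[index] = data
    if total is None or sorted(parts) != list(range(total)):
        raise ValueError("missing or unexpected chunks")
    return b"".join(parts[i] for i in range(total))


payload = bytes(range(256)) * 10           # 2560 bytes of sample data
chunks = split_payload(payload, 1000)      # three chunks: 1000, 1000, 560 bytes
assert reassemble(reversed(chunks)) == payload
```

Using the message ID as the record key is one way to keep all chunks of a payload on the same partition, so they arrive in order for a single consumer.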
7
Article
Stack Overflow Blog·2y
Best practices for cost-efficient Kafka clusters
Kafka facilitates real-time data processing across distributed systems, but managing costs while maintaining performance requires careful planning. Key cost drivers include computing infrastructure, data transfer, and storage. Each deployment type (serverless, hosted, or self-hosted) affects costs differently. Cost efficiency requires continuous optimization, such as removing inactive resources, enabling client-level compression, tuning settings rather than relying on defaults, and adopting dynamic sizing to match workloads. Following these best practices can help keep Kafka clusters cost-efficient.
