Best of Kafka — 2024

1
Article
gitconnected·2y
Message Queues in System Design
Message queues are durable components that support asynchronous communication, helping to decouple events and handle tasks without immediate processing. This allows better scalability and durability, especially under high traffic. Different types of queues like FIFO and priority queues, as well as different models like push-based and pull-based queues, provide versatile solutions for various needs. Examples of message queues include RabbitMQ for versatility, Kafka for high throughput, and Amazon SQS for managed cloud-based services.
1.3K
29
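To make the FIFO versus priority distinction from the summary above concrete, here is a minimal in-memory sketch. It only uses the Python standard library and does not model a real broker; the function names and sample tasks are made up for illustration.

```python
import heapq
from collections import deque


def drain_fifo(messages):
    """Return messages in arrival order (first in, first out)."""
    queue = deque(messages)
    out = []
    while queue:
        out.append(queue.popleft())
    return out


def drain_priority(messages):
    """Return payloads by priority (lower number first), ties in arrival order."""
    heap = []
    for seq, (priority, payload) in enumerate(messages):
        # seq breaks ties so equal priorities keep their arrival order.
        heapq.heappush(heap, (priority, seq, payload))
    out = []
    while heap:
        _, _, payload = heapq.heappop(heap)
        out.append(payload)
    return out


# Hypothetical workload: (priority, payload) pairs in arrival order.
incoming = [
    (2, "send newsletter"),
    (0, "charge card"),
    (1, "resize image"),
    (0, "refund order"),
]

fifo_order = drain_fifo([payload for _, payload in incoming])
priority_order = drain_priority(incoming)
print(fifo_order)
print(priority_order)
```

The FIFO queue hands tasks back exactly as they arrived, while the priority queue moves the two priority-0 tasks to the front.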
2
Article
Baeldung·2y
Apache Kafka Tutorial Series
This tutorial series covers essential topics about Apache Kafka, including its basics, how to integrate it with Spring Boot, configuring Kafka SSL, and setting up Kafka using Docker. It also explores Kafka consumer groups, retry mechanisms, and message ordering strategies, providing practical guidance for both newcomers and experienced developers.
339
3
Article
Towards Dev·2y
Kafka 101: A Beginner’s Guide to Understanding Kafka
Apache Kafka is a distributed streaming platform used for building real-time data pipelines and applications. It was developed to handle high throughput, scalability, and durability needs, making it popular in industries such as finance, retail, and telecom. Kafka operates on key concepts like topics, partitions, producers, consumers, brokers, offsets, and consumer groups, which together provide a robust system for event-driven architectures and stream processing. Recent developments include the transition from ZooKeeper to KRaft for improved scalability and simplified operations. Kafka supports delivery semantics like At-Most-Once, At-Least-Once, and Exactly-Once, ensuring reliable message delivery across various use cases.
278
3
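The difference between At-Most-Once and At-Least-Once in the summary above largely comes down to whether a consumer commits its offset before or after processing a record. The sketch below simulates that with a plain Python list standing in for a partition; it is not a Kafka client, and `consume`, `run_with_restart`, and the crash model are assumptions made for illustration.

```python
class Crash(Exception):
    """Simulated consumer failure between the commit and process steps."""


def consume(log, start_offset, commit_first, crash_at=None):
    """Read records from start_offset; return (processed, committed_offset).

    commit_first=True  commits before processing (at-most-once).
    commit_first=False commits after processing (at-least-once).
    crash_at is the offset at which the consumer dies between the two steps.
    """
    processed = []
    committed = start_offset
    try:
        for offset in range(start_offset, len(log)):
            if commit_first:
                committed = offset + 1
                if offset == crash_at:
                    raise Crash()
                processed.append(log[offset])
            else:
                processed.append(log[offset])
                if offset == crash_at:
                    raise Crash()
                committed = offset + 1
    except Crash:
        pass
    return processed, committed


def run_with_restart(log, commit_first, crash_at):
    """Run once with a crash, then restart from the last committed offset."""
    first, committed = consume(log, 0, commit_first, crash_at)
    second, _ = consume(log, committed, commit_first)
    return first + second


log = ["a", "b", "c"]
at_most_once = run_with_restart(log, commit_first=True, crash_at=1)
at_least_once = run_with_restart(log, commit_first=False, crash_at=1)
print(at_most_once)
print(at_least_once)
```

Committing first loses record "b" after the crash; committing last processes "b" twice. Exactly-Once needs additional machinery (such as idempotent writes or transactions) that this sketch does not attempt to model.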
4
Article
ByteByteGo·2y
EP126: The Ultimate Kafka 101 You Cannot Miss
This edition of the ByteByteGo newsletter covers several key topics, including a guide to understanding Apache Kafka, tips for efficient API design, an overview of AWS Services, and an advertisement for QA Wolf, an automated testing solution. Kafka is detailed with its core concepts like messages, topics, partitions, producers, consumers, clusters, and use cases. The AWS Services cheat sheet simplifies the exploration of AWS's expansive offerings. Additionally, the newsletter includes 8 practical tips for better API design.
193
5
Article
Hacker News·2y
The Architecture Behind A One-Person Tech Startup
The post discusses the architecture and tools used in a one-person tech startup, including Kubernetes on AWS, automatic DNS and SSL setup, load balancing, automated rollouts and rollbacks, horizontal autoscaling, caching, app administration, scheduled jobs, logging and monitoring, and more.
193
3
6
Article
ByteByteGo·2y
The Trillion Message Kafka Setup at Walmart
Walmart's Apache Kafka setup processes trillions of messages daily with a 99.99% availability rate, supporting critical data movement, event-driven microservices, and streaming analytics. The team addressed challenges like consumer rebalancing, poison pill messages, and costs by designing a Message Proxy Service (MPS). This service decouples Kafka message consumption from its partition-based model, allowing consumer applications to scale independently and handling consumer failures effectively.
190
2
7
Article
System Design Codex·2y
Introduction to Kafka
Kafka is a distributed event store and streaming platform initially developed by LinkedIn and now widely used by companies like Netflix and Uber for data pipelines. It is favored for its reliability and scalability. Kafka messages are written in batches to enhance efficiency, and these messages are categorized into topics and partitions. Producers send messages to Kafka brokers, while consumers read these messages. Kafka brokers usually function within a cluster, allowing for message replication and redundancy. Despite its benefits, Kafka has several complexities, including a plethora of configuration options and underdeveloped client libraries outside Java and C.
184
2
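As a rough illustration of how keyed messages map onto partitions in the summary above, the sketch below hashes each key and takes the result modulo the partition count. Kafka's Java client uses a murmur2 hash for this; `zlib.crc32` is swapped in here only because it ships with Python, so the concrete partition numbers will differ from a real cluster. The keys and the partition count are invented for the example.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Map a key to a partition index (crc32 is a stand-in hash, see above)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 3  # assumed topic size for the example
partitions = defaultdict(list)

# Hypothetical (key, value) events; the key is a user id.
events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "add-to-cart"),
    ("user-1", "checkout"),
    ("user-2", "logout"),
]

for key, value in events:
    partitions[partition_for(key, NUM_PARTITIONS)].append((key, value))

# All events for one key sit in one partition, in the order they were sent.
user1_partition = partition_for("user-1", NUM_PARTITIONS)
user1_events = [v for k, v in partitions[user1_partition] if k == "user-1"]
print(user1_events)
```

Because every event with the same key lands in the same partition, per-key ordering is preserved even though the topic as a whole has no global order.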
8
Article
Medium·2y
How Did LinkedIn Handle 7 Trillion Messages Daily With Apache Kafka?
LinkedIn uses Apache Kafka to manage and process up to 7 trillion messages daily. They achieve reliability and scalability through a multi-tiered Kafka deployment across multiple data centers, leveraging local and aggregate clusters. LinkedIn ensures message completeness with an internal auditing tool that tracks sent and consumed messages. They maintain a close relationship with the open-source Kafka community by regularly integrating features and patches from their internal branches into the upstream Kafka branch.
175
4
9
Video
YouTube·2y
When to Use Kafka or RabbitMQ | System Design
Kafka and RabbitMQ serve different purposes in distributed systems. Kafka is designed for high-throughput stream processing, fanning out messages to multiple consumers, and handling uniform, short processing tasks. RabbitMQ is a traditional message queue system better suited for complex routing, long-running tasks, and handling sporadic data flow with acknowledgments for message processing. Choose Kafka for scenarios requiring high-speed, real-time data distribution and RabbitMQ for more controlled message queuing and processing.
102
1
10
Article
System Design Codex·2y
3 Kafka Messaging Strategies
Exploring three Kafka messaging strategies: Fire and Forget, Synchronous Send, and Asynchronous Send. Discusses how Kafka Producers work, the trade-offs between each strategy, and recommendations for when to use each approach.
85
1
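The three strategies named in the summary above differ mainly in what the caller does with the result of a send. The sketch below shows the shape of each using a hypothetical `FakeProducer` whose `send()` returns a standard-library `Future`; it is not the API of any real Kafka client, and the in-memory list stands in for a broker.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeProducer:
    """Hypothetical stand-in for a Kafka producer; send() returns a Future."""

    def __init__(self):
        # One worker keeps the simulated "broker" strictly ordered.
        self._pool = ThreadPoolExecutor(max_workers=1)
        self.log = []

    def _append(self, value):
        if value is None:
            raise ValueError("empty record")
        self.log.append(value)
        return len(self.log) - 1  # offset of the stored record

    def send(self, value):
        return self._pool.submit(self._append, value)

    def close(self):
        # Wait for in-flight sends to finish.
        self._pool.shutdown(wait=True)


producer = FakeProducer()

# 1. Fire and forget: ignore the returned future, so failures go unnoticed.
producer.send("page-view")

# 2. Synchronous send: block until the result (or an exception) comes back.
sync_offset = producer.send("payment").result(timeout=5)

# 3. Asynchronous send: attach a callback and keep going.
results = []


def on_done(future):
    err = future.exception()
    results.append(("error", str(err)) if err else ("ok", future.result()))


producer.send("click").add_done_callback(on_done)
producer.send(None).add_done_callback(on_done)  # simulated failure
producer.close()

print(sync_offset, results)
```

Fire and forget is the cheapest but cannot report the failed record; the synchronous call pays a round trip per message; the callback version keeps throughput while still surfacing the error.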
11
Article
Community Picks·2y
Building Microservices Architecture with CQRS Pattern Using Kafka and NestJS: A Step-by-Step Guide
This guide explains how to build a sample blog application using microservices architecture with CQRS pattern, Kafka, and NestJS. It covers setting up the project structure, configuring Kafka with Docker Compose, and developing microservices for user authentication, blog article creation, and event sourcing for reliable communication between services.
63
1
12
Article
ByteByteGo·2y
Cloudflare’s Trillion-Message Kafka Infrastructure: A Deep Dive
Cloudflare's Kafka infrastructure has processed 1 trillion messages, demonstrating significant scaling and resilience. The engineering team faced challenges with coupling, unstructured communication, and common usage patterns. Lessons learned include the importance of balance between configuration and simplicity, visibility in distributed systems, well-defined contracts, and knowledge sharing.
62
13
Video
YouTube·1y
Apache Kafka in 15 minutes
Apache Kafka, developed by LinkedIn in 2011, is an open-source messaging system known for its scalability and reliability. Kafka is primarily used for event streaming and message queuing, making it ideal for systems requiring high throughput and fault tolerance. It includes key components like producers, brokers, and consumers, and ensures data consistency with features like partitions, replicas, and the at-least-once delivery guarantee. Kafka also optimizes performance with techniques like zero-copy, which significantly enhance message throughput.
60
1
14
Video
ByteByteGo·1y
Apache Kafka Fundamentals You Should Know
Apache Kafka is a distributed event store and real-time streaming platform, originally developed at LinkedIn, that powers many large data pipelines. Kafka organizes data into messages, which are further categorized into topics and divided into partitions for scalability. It efficiently handles multiple producers and consumer groups, allowing for high throughput, fault tolerance, and scalable data processing. Kafka's retention policies ensure data persistence, and its applications include log aggregation, real-time event streaming, change data capture, and system monitoring across various industries.
60
15
Article
Dev Genius·2y
Kafka Demo Project: Part 1. Producers
This post introduces a step-by-step guide to a Kafka demo project using Java and Spring Boot. It covers setting up a Kafka cluster on Confluent Cloud, configuring the Confluent Schema Registry, and creating a Producer application to send messages to Kafka topics. Essential configurations for the 'application.properties' and 'application.yml' files, as well as Maven dependencies, are also discussed. The post concludes with an example of sending a POST request to the producer and exploring cluster configurations using Kafka's Admin API.
59
16
Article
ByteByteGo·2y
How PayPal Scaled Kafka to 1.3 Trillion Daily Messages
PayPal scaled Kafka to handle an enormous volume of 1.3 trillion messages per day. They use Kafka for various use cases, such as tracking, database synchronization, and risk detection. PayPal implemented improvements in cluster management to reduce operational overhead.
57
1
17
Article
System Design Codex·2y
Kafka Load Balancing at Agoda for Terabytes of Data
Agoda uses Kafka to manage hundreds of terabytes of data across various supply systems, including hotels and restaurants, ensuring real-time price updates. They faced challenges with the traditional round-robin partitioning and consumer assignment due to heterogeneous hardware and uneven workloads, resulting in over-provisioning. Agoda addressed these issues by implementing dynamic, lag-aware strategies, including a lag-aware producer and consumer, to optimize message distribution and reduce latency.
56
1
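To give a feel for what "lag-aware" can mean in the summary above, the sketch below routes each new message to the partition with the smallest current backlog instead of cycling round-robin. This is a deliberately simple greedy rule chosen for illustration, not Agoda's actual algorithm; the lag figures and function names are assumptions.

```python
def pick_least_lagged(lag_by_partition):
    """Return the partition with the smallest lag (lowest id wins ties)."""
    return min(sorted(lag_by_partition), key=lambda p: lag_by_partition[p])


def distribute(messages, lag_by_partition):
    """Assign each message to the least-lagged partition at that moment."""
    lag = dict(lag_by_partition)  # work on a copy, leave the input untouched
    assignment = {}
    for message in messages:
        partition = pick_least_lagged(lag)
        assignment[message] = partition
        lag[partition] += 1  # the new message adds to that partition's backlog
    return assignment, lag


# Hypothetical snapshot: partition 0 is served by a slow consumer.
current_lag = {0: 5, 1: 0, 2: 2}
assignment, final_lag = distribute(["m1", "m2", "m3", "m4"], current_lag)
print(assignment)
print(final_lag)
```

Round-robin would have sent a share of the messages to partition 0 despite its backlog; here it receives nothing until the other partitions catch up to it.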
18
Article
DEV·2y
Introducing AutoMQ: a cloud-native replacement of Apache Kafka
AutoMQ is a cloud-native replacement for Apache Kafka, designed to address the evolving needs of modern data architectures with a focus on efficiency, scalability, and cost-effectiveness. Originating from a team of open-source pioneers, it offers a unique architecture that decouples storage and computation, leveraging cloud storage to provide significant cost savings and operational efficiency. AutoMQ maintains full compatibility with Kafka, supports multi-cloud environments, and aims to integrate stream data into data lakes to enhance data access and break down silos. The growing community and successful funding highlight its potential impact on the stream storage industry.
55
2
19
Article
Community Picks·2y
Apache Kafka — Important Designs. Filesystem, Zero-copy, and Batching
Apache Kafka leverages the OS filesystem for data storage, using the page cache to improve performance without adding memory overhead from Java objects. Kafka employs sequential access patterns to optimize read/write operations, benefiting from zero-copy optimization by reducing context switches and data transfers between user and kernel spaces. Batching messages enhances network and disk operation efficiency, ensuring high performance.
46
20
Article
ByteByteGo·1y
How LinkedIn Customizes Its 7 Trillion Message Kafka Ecosystem
LinkedIn utilizes Apache Kafka to handle over 7 trillion messages daily, managing this massive scale with over 100 Kafka clusters and more than 4,000 servers. Its Kafka infrastructure includes custom features and enhancements for scalability and operability, tailored through specialized LinkedIn Kafka release branches. LinkedIn maintains unique patches and contributions to the open-source project, ensuring optimal performance and resource utilization for their specific needs, while also sharing improvements with the community.
43
1
21
Article
Community Picks·2y
Event driven Microservices using Kafka and Rust
Learn how to build event-driven microservices using Kafka and Rust. Discover the advantages of event-driven architecture and how to handle challenges like out-of-order messages.
43
6
22
Video
codeHeim·2y
#44 Golang - Mastering Kafka with Golang: A Beginner's Guide
Learn how to use Apache Kafka with Golang by building a coffee order and brewing system. The guide covers setting up a Kafka producer to send coffee orders and a consumer to process these orders. It uses the Sarama library for Kafka integration, demonstrating how to handle HTTP requests, serialize data to JSON, and manage Kafka messages in a Golang application.
42
23
Article
ITNEXT·2y
The streaming bridges — A Kafka, RabbitMQ, MQTT, and CoAP example
This post provides an in-depth overview of various data streaming protocols including Kafka, RabbitMQ, MQTT, and CoAP, detailing their history, implementation, and use cases. The discussion highlights the differences between push and pull mechanisms, particularly in the context of Kafka and RabbitMQ. Detailed scenarios and examples, including IoT applications, are used to illustrate the practicality and utility of these protocols. Additionally, a practical example using Docker, Apache Spark, and various other tools is provided to demonstrate a comprehensive streaming architecture.
40
24
Article
Architecture Weekly·1y
Deduplication in Distributed Systems: Myths, Realities, and Practical Solutions
Duplication in distributed systems is a common issue due to retries, processing failures, and fault tolerance mechanisms. Deduplication aims to identify and eliminate duplicate messages, but it comes with challenges that impact scalability, performance, and reliability. The post explores how deduplication is implemented in technologies like Kafka and RabbitMQ, and discusses the trade-offs and complexities involved. It also highlights the concept of exactly-once processing as a more realistic goal than exactly-once delivery, emphasizing patterns like idempotency and transactional outboxes to achieve robust message handling.
39
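The idempotency pattern mentioned in the summary above can be sketched as a handler that remembers which message ids it has already applied, so a redelivered message changes nothing. The class below is a minimal in-memory illustration with made-up names; in a real system the set of seen ids and the state change would need to be persisted together, which this sketch does not cover.

```python
class IdempotentHandler:
    """Applies each message id at most once, even if it is delivered twice."""

    def __init__(self):
        self.seen = set()
        self.balance = 0

    def handle(self, message_id, amount):
        """Return True if the message was applied, False if it was a duplicate."""
        if message_id in self.seen:
            return False
        self.balance += amount
        self.seen.add(message_id)
        return True


handler = IdempotentHandler()

# Hypothetical at-least-once delivery: "m1" arrives twice after a retry.
deliveries = [("m1", 10), ("m2", 20), ("m1", 10)]
applied = [handler.handle(message_id, amount) for message_id, amount in deliveries]

print(applied, handler.balance)
```

Delivery still happens three times, but the effect is applied only twice, which is the sense in which exactly-once processing is achievable on top of at-least-once delivery.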
25
Article
Cerbos·2y
How to pick the right inter-service communication pattern for your microservices
Efficient inter-service communication is essential for a successful microservices architecture. Different communication patterns, such as synchronous, asynchronous, and event-driven, offer various benefits and challenges. Strategies like retries, circuit breakers, timeouts, and bulkheads can enhance fault tolerance and resilience. Spotify's adoption of Apache Kafka for event-driven communication illustrates a scalable and decoupled microservices environment, supporting independent service evolution and robust failure management.
36
12
