Best of Distributed Systems — 2024

1
Article
gitconnected·2y
Message Queues in System Design
Message queues are durable components that support asynchronous communication, decoupling producers from consumers so that tasks do not have to be processed immediately. This improves scalability and durability, especially under high traffic. Queue types such as FIFO and priority queues, and delivery models such as push-based and pull-based queues, cover a range of needs. Examples of message queues include RabbitMQ for versatility, Kafka for high throughput, and Amazon SQS for managed cloud-based services.
1.3K
29
2
Article
System Design Codex·2y
8 Strategies for Reducing Latency
High latency can render an application unusable, frustrating users and negatively impacting business outcomes. Developers need to understand low-latency strategies such as caching, using Content Delivery Networks (CDNs), load balancing, asynchronous processing, database indexing, data compression, pre-caching, and utilizing keep-alive connections to mitigate these issues and improve performance.
805
21
3
Article
Javarevisited·2y
How to Design Twitter (X) in a System Design Interview?
Designing a system like Twitter (X) in a system design interview involves outlining core functionalities such as composing and sharing tweets, following users, and favoriting tweets. Non-functional requirements like scalability, high availability, and stability are crucial for handling large-scale operations. Key aspects include capacity estimation, API design, database design, and understanding queries per second (QPS). Employing a structured approach and utilizing tools like Redis for caching, MySQL for data consistency, and Amazon S3 for media storage are essential. Detailed component design includes load balancers, CDNs, and handling failure scenarios to ensure robust system performance.
613
10
4
Article
Medium·2y
40 Must-Read White Papers to Learn System Design and Software Architecture
This post lists 40 essential white papers for learning system design and software architecture. It is geared towards those preparing for system design interviews or aiming to understand complex system architectures. Each white paper provides in-depth technical insights from industry leaders like Google and AWS, covering topics from distributed file systems to data processing models and consensus algorithms.
518
7
5
Article
Javarevisited·2y
Most-Used Distributed System Design Patterns
Distributed system design patterns offer architectural solutions and best practices for developing distributed applications. This post discusses widely used patterns: Ambassador for proxy tasks, Circuit Breaker to prevent cascading failures, CQRS for separating the read and write sides of a system, Event Sourcing for recording events, Sidecar for managing cross-cutting concerns, Leader Election for choosing a single leader node, Publisher/Subscriber for asynchronous communication, Sharding for data distribution, Bulkhead to isolate system components, and Cache-Aside for optimized caching. Examples of tools and implementations are provided for each pattern to illustrate its applications and benefits.
444
5
6
Article
Medium·2y
System Design: Load Balancer
Load balancers distribute workloads across multiple servers in distributed applications. They can operate at different layers of the network stack (for example, the transport or application layer) and employ static or dynamic algorithms to manage requests. Static algorithms depend on predefined parameters, while dynamic ones use real-time data about system state. Popular load balancing strategies include Round Robin (and its variations), Least Connections, Least Response Time, IP Hashing, and URL Hashing. The right strategy depends on the system's specific needs and configuration.
311
2
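As a rough illustration of the static versus dynamic distinction in the load balancer summary above, here is a minimal Python sketch of Round Robin (static) and Least Connections (dynamic). The class and method names are hypothetical and not taken from the linked article; a real balancer would also need health checks and thread safety.

```python
import itertools


class RoundRobin:
    """Static strategy: cycle through servers in a fixed order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Dynamic strategy: pick the server with the fewest active connections."""

    def __init__(self, servers):
        # Hypothetical bookkeeping: active connection count per server.
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Ties are broken by insertion order of the servers.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1
```

Round Robin ignores server load entirely, whereas Least Connections needs the caller to report when a connection ends via `release`, which is the extra state a dynamic algorithm depends on.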
7
Article
Javarevisited·2y
System Design Basics — Rate Limiter
A rate limiter is a mechanism used in software systems and network communications to control the rate at which requests or operations are performed. It helps maintain system stability, prevent resource overuse, and ensure fair usage among users. Rate limiters are critical in high-traffic, distributed architectures. Common rate limiting algorithms include Token Bucket, Leaky Bucket, and Sliding Window. Understanding rate limiting is important for system design interviews, where it is often discussed alongside concepts like API gateways and load balancers.
293
2
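The Token Bucket algorithm named in the rate limiter summary above can be sketched in a few lines of Python. This is a simplified, single-process sketch under stated assumptions: the class name, parameters, and the choice to pass the current time explicitly are illustrative and not taken from the linked article.

```python
class TokenBucket:
    """Token bucket: tokens refill at a fixed rate up to a maximum capacity."""

    def __init__(self, capacity: float, refill_per_sec: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity  # start with a full bucket
        self.last = now         # timestamp of the last refill

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Return True and spend `cost` tokens if the request is allowed at time `now`."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

In practice the caller would pass something like `time.monotonic()` as `now`; taking the time as an argument keeps the sketch deterministic. A distributed deployment would need shared state (for example, in a cache), which this sketch does not cover.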
8
Article
System Design Codex·2y
Message Queues & Message Brokers
Message queues enable asynchronous communication between producers and consumers by storing messages in FIFO order. They are useful for processing background tasks, distributing tasks, email services, buffering, and payment retries. Message brokers manage these queues and provide additional features like message routing, transformation, protocol translation, and support for the publish-subscribe pattern. This allows seamless integration and communication between different services in applications, such as in an e-commerce platform where order, inventory, shipping, and notification services interact efficiently.
242
4
9
Article
Community Picks·2y
9 Software Architecture Patterns for Distributed Systems
In modern software development, distributed systems require efficient design to manage data and communication between components. Key architectural patterns like Peer-to-Peer, API Gateway, Pub-Sub, Request-Response, Event Sourcing, ETL, Batching, Streaming Processing, and Orchestration offer solutions for reliability, scalability, and maintainability. These patterns are essential not only for system robustness but also for system design interviews, providing a deep understanding of their strengths and trade-offs.
229
10
Article
Hacker News·2y
taubyte/tau: Open source distributed Platform as a Service (PaaS). A self-hosted Vercel / Netlify / Cloudflare alternative.
Tau is an open-source, distributed Platform as a Service (PaaS) designed to compete with major providers like Vercel, Netlify, and Cloudflare. It's a developer-friendly framework focused on minimal configuration, auto-discovery, and peer-to-peer networking. Using Git for infrastructure management, Tau emphasizes local development and seamless production deployment. Features include WebAssembly support, content-addressed storage, and a plugin system for extensibility.
222
2
11
Article
Community Picks·2y
Things I Wished More Developers Knew About Databases
Effective database management involves understanding numerous critical concepts such as ACID properties, network reliability, transaction isolation levels, optimistic locking, and the impact of auto-incrementing IDs. Appreciating the complexity of database design helps in predicting potential issues like dirty reads, data loss, and write skews. Additionally, proper handling of latency, sharding, and clock skews can prevent operational surprises. It's essential to evaluate performance requirements per transaction, avoid nested transactions, and understand query planners for optimized database performance. Navigating online migrations smoothly ensures minimal downtime and accurate data migration.
194
4
12
Article
ByteByteGo·2y
EP126: The Ultimate Kafka 101 You Cannot Miss
This edition of the ByteByteGo newsletter covers several key topics, including a guide to understanding Apache Kafka, tips for efficient API design, an overview of AWS Services, and an advertisement for QA Wolf, an automated testing solution. Kafka is detailed with its core concepts like messages, topics, partitions, producers, consumers, clusters, and use cases. The AWS Services cheat sheet simplifies the exploration of AWS's expansive offerings. Additionally, the newsletter includes 8 practical tips for better API design.
193
13
Article
System Design Codex·2y
3 Interview Questions on Event-Driven Patterns
System Design interviews often test candidates on event-driven patterns such as Competing Consumer, Retry Messages, and Async Request-Response. The Competing Consumer Pattern allows multiple instances to process messages concurrently. The Retry Messages Pattern manages transient errors by retrying failed transactions with mechanisms like exponential backoff. The Async Request-Response Pattern involves using correlation IDs to relate requests and responses across multiple instances, ensuring asynchronous communication between services.
182
1
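The Retry Messages pattern with exponential backoff, mentioned in the event-driven patterns summary above, might look roughly like the following Python sketch. `TransientError`, `backoff_delays`, and `retry` are hypothetical names chosen for illustration, and the jitter step is a common addition rather than something the summary states.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for errors that are worth retrying."""


def backoff_delays(base: float, factor: float, max_delay: float, count: int):
    """Delays grow geometrically (base, base*factor, ...) and are capped at max_delay."""
    return [min(max_delay, base * factor ** i) for i in range(count)]


def retry(operation, attempts=5, base=0.1, factor=2.0, max_delay=5.0,
          sleep=time.sleep, jitter=True):
    """Call `operation` until it succeeds or `attempts` calls have failed."""
    delays = backoff_delays(base, factor, max_delay, attempts - 1)
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = delays[attempt]
            if jitter:
                # Randomize the wait so many clients do not retry in lockstep.
                delay = random.uniform(0, delay)
            sleep(delay)
```

Only errors marked as transient are retried; anything else propagates immediately. In a message-queue setting, a message that exhausts its attempts would typically be routed to a dead-letter queue instead of being raised to the caller.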
14
Article
Medium·2y
How Did LinkedIn Handle 7 Trillion Messages Daily With Apache Kafka?
LinkedIn uses Apache Kafka to manage and process up to 7 trillion messages daily. They achieve reliability and scalability through a multi-tiered Kafka deployment across multiple data centers, leveraging local and aggregate clusters. LinkedIn ensures message completeness with an internal auditing tool that tracks sent and consumed messages. They maintain a close relationship with the open-source Kafka community by regularly integrating features and patches from their internal branches into the upstream Kafka branch.
175
4
15
Article
Tinybird·2y
How to choose the right type of database
Covers the different types of databases, the factors to consider when choosing one, and the implications of the CAP theorem for database selection.
159
12
16
Article
Tech World With Milan·2y
What are the main Cloud Design Patterns?
Cloud Design Patterns offer practical solutions to address the common fallacies associated with distributed computing, such as network reliability, latency, and security concerns. These patterns are essential for building dependable, scalable, and secure cloud systems. Key groups of patterns include data management, design and implementation, messaging, security, and reliability. Learn about implementing these patterns to improve system performance and resilience, as well as the various load-balancing options available in Azure for enhanced system availability and performance.
134
1
17
Article
Community Picks·2y
How Does Facebook Manage to Serve Billions of Users Daily?
Facebook serves billions of users daily in large part through its caching systems, particularly Memcache. A cache stores data in anticipation of future requests, enabling quicker retrieval than a database. Facebook's Memcache optimizes performance through techniques like parallel requests organized with a DAG, request batching, and leases to prevent stale data and manage heavy loads. These strategies allow efficient handling of massive request volumes while maintaining data integrity.
133
5
18
Article
Community Picks·2y
gRPC
gRPC is a high-performance, open-source RPC framework that connects services across data centers and to backend services from devices, mobile apps, and browsers. It uses Protocol Buffers for service definitions, supports quick scaling, works across various languages and platforms, and offers bi-directional streaming with integrated authentication.
120
1
19
Article
ByteByteGo·1y
How Tinder Recommends To 75 Million Users with Geosharding
Tinder has improved its recommendation engine for over 75 million users by implementing geosharding, where user data is divided into geographically bound shards. This approach enhances performance, reduces latency, and improves scalability. The system leverages tools like Google's S2 Library and Apache Kafka, and addresses consistency challenges and traffic imbalances by using smart load balancing and dynamic adjustments. As a result, Tinder can manage 20 times more computations efficiently while maintaining low latency.
116
20
Article
Community Picks·2y
Microservices Architecture, The Hard Parts : Trap of Distributed Monolith
Even seasoned senior software engineers often encounter significant challenges when implementing a microservices architecture. Initial enthusiasm can give way to difficulties, particularly when releasing new features or managing performance and latency across interdependent services. Identifying and addressing issues such as poorly drawn service boundaries, excessive synchronous communication, overly fine-grained services, service coupling, and shared code without versioning is critical to avoiding a distributed monolith.
112
21
Article
freeCodeCamp·2y
How to Build Resilient Microservice Systems – SOLID Principles for Microservices
Learn about the SOLID principles and best practices for building efficient microservices.
103
2
22
Video
YouTube·2y
When to Use Kafka or RabbitMQ | System Design
Kafka and RabbitMQ serve different purposes in distributed systems. Kafka is designed for high-throughput stream processing, fanning out messages to multiple consumers, and handling uniform, short processing tasks. RabbitMQ is a traditional message queue system better suited for complex routing, long-running tasks, and handling sporadic data flow with acknowledgments for message processing. Choose Kafka for scenarios requiring high-speed, real-time data distribution and RabbitMQ for more controlled message queuing and processing.
102
1
23
Article
DEV·2y
How to implement a Distributed Lock using Redis
Running multiple instances of an application can create issues with concurrent database writes, potentially leading to inconsistent states. Distributed locking, particularly using Redis, provides a solution by ensuring only one instance can perform critical operations at a time. The Redlock algorithm is an effective method for implementing distributed locks across multiple Redis instances, ensuring consistency even if some instances fail.
98
1
24
Video
Community Picks·2y
7 Must-know Strategies to Scale Your Database
Understanding when and why to scale your database is essential to maintaining performance as your application grows. Key strategies include indexing for quick data retrieval, using materialized views for pre-computed snapshots of data, and denormalizing to simplify complex queries. Vertical scaling (adding resources to a single server) and caching frequently accessed data in a fast storage layer can improve responsiveness. Replication bolsters availability and fault tolerance by keeping database copies on multiple servers. Sharding, which splits a database into smaller sections, enables horizontal scaling and handles large data loads efficiently.
89
25
Article
Hacker News·2y
exo-explore/exo: Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
Run an AI cluster at home using exo, software that unifies everyday devices into one powerful GPU. It supports LLaMA and other popular models, and connects devices peer-to-peer rather than through a master-worker architecture. Install it from source with Python>=3.12.0 and access models via a ChatGPT-compatible API endpoint.
87
5
