Best of Distributed Systems, December 2024

  1. Article · ByteByteGo

    How Tinder Recommends To 75 Million Users with Geosharding

    Tinder scaled its recommendation engine to more than 75 million users with geosharding. User data is split into geographically bounded shards, so a recommendation query only needs to search the shards near the user. This improves performance, reduces latency, and makes the system easier to scale. The system uses Google's S2 geometry library and Apache Kafka, and it handles consistency challenges and traffic imbalances between shards with load balancing and dynamic adjustments. As a result, Tinder can handle 20 times more computations while keeping latency low.
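    The following minimal Python sketch shows the routing idea. It uses a fixed 10-degree latitude/longitude grid as a stand-in for S2 cells. The cell-to-shard map and the shard names are hypothetical. The sketch only shows how a location picks a shard and why a nearby search may need to fan out to neighboring shards.

```python
import math

# Hypothetical cell size: a fixed 10-degree lat/lng grid stands in for S2 cells.
CELL_DEG = 10.0


def cell_of(lat: float, lng: float) -> tuple[int, int]:
    """Map a coordinate to a (row, col) grid cell. Longitude wrap-around is ignored."""
    row = math.floor((lat + 90.0) / CELL_DEG)
    col = math.floor((lng + 180.0) / CELL_DEG)
    return row, col


class GeoShardRouter:
    """Route reads and writes to the shard that owns a user's cell."""

    def __init__(self, cell_to_shard: dict[tuple[int, int], str], default: str):
        self.cell_to_shard = cell_to_shard  # hypothetical mapping, e.g. built offline
        self.default = default              # shard for cells with no explicit owner

    def shard_for(self, lat: float, lng: float) -> str:
        return self.cell_to_shard.get(cell_of(lat, lng), self.default)

    def shards_near(self, lat: float, lng: float) -> set[str]:
        """A search radius can cross cell borders, so include the 8 neighboring cells."""
        row, col = cell_of(lat, lng)
        return {
            self.cell_to_shard.get((row + dr, col + dc), self.default)
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
        }


router = GeoShardRouter({(13, 10): "us-east", (14, 17): "eu-west"}, default="global")
print(router.shard_for(40.7, -74.0))  # us-east
```

    Real S2 cells are hierarchical and ordered along a Hilbert curve, so nearby cells tend to have nearby IDs. The fixed grid here is only the simplest substitute.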

  2. Article · System Design Codex

    Must-Know Resiliency Patterns for Distributed Systems

    Distributed systems offer scalability and high availability, but they also bring complexity and new ways to fail. Resiliency patterns fall into two groups. Downstream patterns protect a service from failing dependencies: timeouts, circuit breakers, and retries with exponential backoff let it handle those failures gracefully. Upstream patterns protect a service from its own clients: load shedding, rate limiting, bulkheads, and health checks combined with load balancers keep it stable under overload. Applying these patterns makes distributed systems more robust and reliable.
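    This minimal Python sketch covers two of the downstream patterns: retries with exponential backoff and a circuit breaker. The parameter names and defaults (`attempts`, `base`, `cap`, `threshold`, `reset_after`) are illustrative assumptions. The backoff uses "full jitter", and the breaker is not thread-safe.

```python
import random
import time


def retry_with_backoff(op, attempts=5, base=0.1, cap=2.0, sleep=time.sleep):
    """Call op(); after each failure wait a random time up to base * 2**n seconds
    (capped at `cap`, i.e. "full jitter"), and re-raise after the last attempt."""
    for n in range(attempts):
        try:
            return op()
        except Exception:
            if n == attempts - 1:
                raise
            sleep(random.uniform(0, min(cap, base * 2 ** n)))


class CircuitBreaker:
    """Fail fast after `threshold` consecutive failures; once `reset_after`
    seconds have passed, let a trial call through (the "half-open" state)."""

    def __init__(self, threshold=3, reset_after=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, op):
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_after:
            raise RuntimeError("circuit open: failing fast")
        try:
            result = op()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        self.failures = 0
        self.opened_at = None  # a success closes the circuit
        return result
```

    Jitter spreads the retries from many clients over time, so a recovering service is not hit by synchronized waves of retries.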

  3. Article · Hacker News

    Thinking in Actors - Part 1

    This post explores the benefits of the Actor Model for managing state in software systems. It highlights the drawbacks of traditional approaches, such as anemic data models and misaligned business logic, and advocates for a richer domain-driven approach. Additionally, it discusses how virtual actors, as implemented in frameworks like Microsoft Orleans, can address challenges of concurrency, scalability, and fault tolerance in distributed systems.
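    This minimal Python sketch illustrates the actor idea with `asyncio`. The actor owns its state and processes messages from a mailbox one at a time, so concurrent senders never race on that state. It is a classic actor, not an Orleans-style virtual actor. Orleans also activates actors on demand by identity and places them across a cluster, which this sketch does not attempt. The `CounterActor` class and its message names are hypothetical.

```python
import asyncio


class CounterActor:
    """A minimal actor: private state, a mailbox, and a single task that
    handles one message at a time, so the state needs no locks."""

    def __init__(self) -> None:
        self._count = 0  # only ever touched by self._run
        self._mailbox: asyncio.Queue = asyncio.Queue()
        self._task = asyncio.create_task(self._run())  # must be created inside a running loop

    async def _run(self) -> None:
        while True:
            msg, reply = await self._mailbox.get()
            if msg == "inc":
                self._count += 1
                reply.set_result(self._count)
            elif msg == "get":
                reply.set_result(self._count)
            elif msg == "stop":
                reply.set_result(None)
                return
            else:
                reply.set_exception(ValueError(f"unknown message {msg!r}"))

    async def ask(self, msg: str):
        """Send a message and wait for the actor's reply."""
        reply = asyncio.get_running_loop().create_future()
        await self._mailbox.put((msg, reply))
        return await reply


async def main() -> int:
    actor = CounterActor()
    # 100 concurrent senders; the mailbox serializes their updates.
    await asyncio.gather(*(actor.ask("inc") for _ in range(100)))
    total = await actor.ask("get")
    await actor.ask("stop")
    return total


print(asyncio.run(main()))  # 100
```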

  4. Article · Mercari Engineering

    The Race Condition in multiple DB transactions and the solutions

    Race conditions can occur when using multiple database transactions in a single API request, especially in systems with high concurrency. This post outlines the challenges faced by the Merpay Balance team, who encountered race conditions while processing debt repayments. They evaluated solutions such as rollback, lock mechanisms, and merging transactions, ultimately choosing a lock mechanism to ensure only one request processes repayments at a time. The implementation details, including challenges and considerations, are shared for developers facing similar issues.
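    This minimal Python sketch shows the lock-based approach. The `Account` class and the `repay` function are hypothetical, and the two steps stand in for the two database transactions. The post's actual lock mechanism is not reproduced here. A process-local `threading.Lock` replaces it, so this version only protects requests handled by a single process.

```python
import threading
import time


class Account:
    """Hypothetical account whose debt is repaid from its balance."""

    def __init__(self, balance: int, debt: int) -> None:
        self.balance = balance
        self.debt = debt
        # Stand-in for the team's lock; a process-local lock only works within one process.
        self.lock = threading.Lock()


def repay(account: Account) -> int:
    """Repay as much debt as the balance allows; returns the amount repaid."""
    with account.lock:  # only one request processes a repayment at a time
        amount = min(account.balance, account.debt)  # step 1: read
        time.sleep(0.001)  # widen the window; without the lock, other threads would read stale values here
        account.balance -= amount  # step 2: write
        account.debt -= amount
        return amount


def run_concurrently(account: Account, requests: int = 10) -> list[int]:
    results: list[int] = []
    threads = [threading.Thread(target=lambda: results.append(repay(account))) for _ in range(requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


account = Account(balance=100, debt=60)
print(sorted(run_concurrently(account)), account.balance, account.debt)
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 60] 40 0
```

    Without the lock, several threads could read `amount = 60` before any of them writes, and the debt would be repaid more than once.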

  5. Article · Community Picks

    How concurrency works: A visual guide

    Concurrent programming is complex, but visualizing it can make it easier to understand. Breaking down large concurrent systems into smaller models helps to grasp the state space and transitions. The post explores sequential and concurrent program visualization and emphasizes the importance of model checking to ensure program correctness. Using tools like SPIN and understanding state spaces and properties, you can create more reliable and robust programs.
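    The post uses SPIN. As a tool-free illustration of the same idea, here is a small explicit-state model checker in Python. It models two threads that each perform the non-atomic increment `tmp = x; x = tmp + 1`. It explores every interleaving breadth-first and reports the possible final values of `x`.

```python
from collections import deque

# Two threads each run the non-atomic increment "tmp = x; x = tmp + 1".
# A state is (program counters, per-thread tmp values, shared x); pc 2 means done.
State = tuple[tuple[int, int], tuple[int, int], int]


def _set(t: tuple[int, int], i: int, v: int) -> tuple[int, int]:
    return t[:i] + (v,) + t[i + 1:]


def successors(state: State):
    pcs, tmps, x = state
    for i in (0, 1):
        if pcs[i] == 0:  # tmp_i = x
            yield (_set(pcs, i, 1), _set(tmps, i, x), x)
        elif pcs[i] == 1:  # x = tmp_i + 1
            yield (_set(pcs, i, 2), tmps, tmps[i] + 1)


def explore(initial: State):
    """Breadth-first search over every interleaving; returns (reachable states, final x values)."""
    seen, frontier, finals = {initial}, deque([initial]), set()
    while frontier:
        state = frontier.popleft()
        nexts = list(successors(state))
        if not nexts:  # both threads finished
            finals.add(state[2])
        for nxt in nexts:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen, finals


states, finals = explore(((0, 0), (0, 0), 0))
print(sorted(finals))  # [1, 2]: the property "x == 2 at the end" is violated
```

    The final value 1 is the classic lost update: both threads read `x == 0` before either one writes. A model checker finds it by exhaustive search rather than by luck during testing.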

  6. Video · YouTube

    Apache Kafka in 15 minutes

    Apache Kafka, originally developed at LinkedIn and open-sourced in 2011, is a messaging system known for its scalability and reliability. It is used mainly for event streaming and message queuing, which makes it a good fit for systems that need high throughput and fault tolerance. Its key components are producers, brokers, and consumers. Topics are split into partitions for parallelism and replicated across brokers for durability. Consumers get at-least-once delivery by committing their offsets only after processing. Kafka also uses zero-copy transfer, which sends data from disk to the network without copying it through user space and significantly increases message throughput.
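    This minimal Python model shows what the at-least-once guarantee means in practice. It is not the Kafka client API, and `Partition` and `AtLeastOnceConsumer` are hypothetical names. The consumer commits an offset only after processing the record, so a crash in between causes redelivery rather than loss.

```python
class Partition:
    """Append-only log: a record's offset is its position in the list."""

    def __init__(self) -> None:
        self.records: list[str] = []

    def append(self, value: str) -> int:
        self.records.append(value)
        return len(self.records) - 1


class AtLeastOnceConsumer:
    """Process each record first, commit its offset second. A crash between the
    two means the record is delivered again after restart: duplicates, no loss."""

    def __init__(self, partition: Partition, committed: int = 0) -> None:
        self.partition = partition
        self.committed = committed  # next offset to read, as Kafka would store it for the group

    def poll_and_process(self, handle, crash_before_commit: bool = False) -> None:
        for offset in range(self.committed, len(self.partition.records)):
            handle(self.partition.records[offset])
            if crash_before_commit:
                raise RuntimeError("consumer crashed before committing")
            self.committed = offset + 1


log = Partition()
for v in ("a", "b", "c"):
    log.append(v)
seen: list[str] = []
first = AtLeastOnceConsumer(log)
try:
    first.poll_and_process(seen.append, crash_before_commit=True)
except RuntimeError:
    pass
restarted = AtLeastOnceConsumer(log, committed=first.committed)
restarted.poll_and_process(seen.append)
print(seen)  # ['a', 'a', 'b', 'c']
```

    Committing before processing flips the trade-off to at-most-once: nothing is duplicated, but a crash can lose the record. For this reason, consumers that rely on at-least-once delivery usually make their handlers idempotent.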

  7. Video · ByteByteGo

    Apache Kafka Fundamentals You Should Know

    Apache Kafka is a distributed event store and real-time streaming platform, originally developed at LinkedIn, that powers many large data pipelines. Kafka organizes data as messages grouped into topics, and each topic is split into partitions for scalability. It serves multiple producers and consumer groups efficiently, which provides high throughput, fault tolerance, and scalable data processing. Configurable retention policies keep messages on disk for a set time or size, whether or not they have been consumed. Common applications include log aggregation, real-time event streaming, change data capture, and system monitoring across many industries.
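    This minimal Python sketch shows how keys map to partitions and how a consumer group splits partitions among its members. It makes two assumptions. First, `zlib.crc32` stands in for the murmur2 hash that Kafka's default partitioner applies to keys. Second, the assignment is a simple round-robin, similar in spirit to Kafka's `RoundRobinAssignor` but not its implementation.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Same key -> same partition (while the partition count is unchanged), preserving per-key order.
    Kafka's default partitioner hashes keys with murmur2; crc32 is only a stand-in."""
    return zlib.crc32(key) % num_partitions


def assign_round_robin(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Give every partition to exactly one consumer in the group, dealing them out in turn.
    Consumers beyond the partition count receive nothing and sit idle."""
    members = sorted(consumers)
    assignment: dict[str, list[int]] = {c: [] for c in members}
    for i, p in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(p)
    return assignment


print(assign_round_robin(list(range(6)), ["c2", "c1"]))  # {'c1': [0, 2, 4], 'c2': [1, 3, 5]}
```

    Each partition belongs to exactly one member, so adding consumers beyond the partition count adds no parallelism. The extra members stay idle.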

  8. Article · Data Engineer Things

    Apache Flink Overview

    Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. It is built for real-time processing, with a stream-centric model and a runtime made up of components such as the Dispatcher, JobManager, ResourceManager, and TaskManager. Flink distinguishes event time (when an event happened) from processing time (when it is processed), so it can compute correct results even when events arrive late or out of order. It also provides robust state management and checkpointing for fault tolerance. Its architecture supports scalable, high-throughput, low-latency processing, which makes it suitable for applications that handle complex event data.
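    This simplified Python model illustrates event-time processing. It uses tumbling windows and a watermark that trails the largest timestamp seen by a fixed bound (`max_out_of_orderness`). It is not Flink's API. It only shows why an out-of-order event within the bound is still counted, while an event that arrives after its window has fired is dropped.

```python
from collections import defaultdict


def tumbling_event_time_sums(events, size, max_out_of_orderness):
    """Sum values per tumbling event-time window, closing windows by watermark.

    events: (timestamp, value) pairs in arrival order, which may differ from event-time order.
    Returns (window_start, sum) pairs in the order the windows fire.
    """
    open_windows: dict[int, int] = defaultdict(int)
    fired: list[tuple[int, int]] = []
    watermark = float("-inf")  # claim: no events older than this are still expected
    for ts, value in events:
        start = ts - ts % size
        if start + size <= watermark:
            continue  # late: the watermark has passed this window's end, so drop the event
        open_windows[start] += value
        watermark = max(watermark, ts - max_out_of_orderness)
        for w in sorted(s for s in open_windows if s + size <= watermark):
            fired.append((w, open_windows.pop(w)))
    for w in sorted(open_windows):  # a bounded stream has ended: flush what is left
        fired.append((w, open_windows.pop(w)))
    return fired


events = [(1, 1), (12, 1), (8, 1), (15, 1), (27, 1), (3, 1), (31, 1)]
print(tumbling_event_time_sums(events, size=10, max_out_of_orderness=5))
# [(0, 2), (10, 2), (20, 1), (30, 1)]
```

    In the example, the event at 8 arrives after the event at 12 but still lands in window 0. The event at 3 arrives after window 0 has fired and is dropped. Flink itself can keep such late events through allowed lateness or a side output instead of discarding them.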

  9. Article · Baeldung

    Introduction to Apache Accumulo

    Apache Accumulo is a distributed key-value store designed for massive datasets with fine-grained security. Originally developed by the NSA and based on Google's Bigtable design, it offers strong scalability and performance for data ingestion, retrieval, and processing. Accumulo supports cell-level security, in which every key-value pair carries a visibility label that is checked against the reader's authorizations. It also supports server-side programming through iterators and offers a flexible data model. This makes it a good fit for applications that need strict access controls and large-scale data management.
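    This simplified Python sketch checks a cell-level visibility label against a user's authorizations. It handles labels built from terms, `&`, `|`, and parentheses. Accumulo also allows quoted terms, which are omitted here. The sketch is not Accumulo's own `ColumnVisibility` evaluator.

```python
import re

# Simplified Accumulo-style labels: terms joined by & or |, grouped with parentheses.
_TOKEN = re.compile(r"\s*([A-Za-z0-9_\-:./]+|[&|()])")


def _tokenize(expr: str) -> list[str]:
    tokens, pos, expr = [], 0, expr.strip()
    while pos < len(expr):
        m = _TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos} in {expr!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def visible(expr: str, auths: set[str]) -> bool:
    """True if a cell labelled `expr` may be read by a user holding `auths`."""
    tokens = _tokenize(expr)
    if not tokens:
        return True  # an empty label is visible to everyone
    pos = 0

    def parse_expr() -> bool:
        nonlocal pos
        result, op = parse_term(), None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is not None and tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            op = tokens[pos]
            pos += 1
            rhs = parse_term()
            result = (result and rhs) if op == "&" else (result or rhs)
        return result

    def parse_term() -> bool:
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] in ("&", "|", ")"):
            raise ValueError(f"expected a term in {expr!r}")
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok in auths  # a plain term: does the user hold this authorization?
        value = parse_expr()
        if pos >= len(tokens) or tokens[pos] != ")":
            raise ValueError(f"missing ')' in {expr!r}")
        pos += 1
        return value

    value = parse_expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r} in {expr!r}")
    return value


print(visible("A&(B|C)", {"A", "C"}), visible("A&(B|C)", {"B", "C"}))  # True False
```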

  10. Article · Platformatic

    Reimagining caching invalidation for a faster & more scalable Node.js app

    Caching is crucial for application performance, especially during traffic peaks such as Black Friday sales. In a distributed microservices environment, however, cache invalidation is hard, and a missed invalidation means serving stale data. Platformatic's solution simplifies caching by building on standard client-side HTTP caching and by synchronizing invalidations automatically, so cached data stays consistent and up to date across all instances. The goal is to reduce complexity while improving scalability and reliability.
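    The client-side HTTP caching that the summary mentions can be sketched in Python as a cache that honors `max-age` and revalidates stale entries with an ETag. The ETag is sent as `If-None-Match` and answered with `304 Not Modified`. The `HttpCache` class and the `fetch` callback are hypothetical and do not reflect Platformatic's API.

```python
import time


class HttpCache:
    """Client-side cache applying two HTTP caching rules: serve an entry without
    contacting the origin until its max-age expires, then revalidate it with its ETag."""

    def __init__(self, fetch, clock=time.monotonic):
        # fetch(url, etag) -> (status, body, etag, max_age); status 304 means "not modified".
        self.fetch = fetch
        self.clock = clock
        self.entries: dict[str, tuple[str, str, float]] = {}  # url -> (body, etag, expires_at)

    def get(self, url: str) -> str:
        entry = self.entries.get(url)
        now = self.clock()
        if entry is not None and now < entry[2]:
            return entry[0]  # fresh hit: no request at all
        status, body, etag, max_age = self.fetch(url, entry[1] if entry else None)
        if status == 304 and entry is not None:
            body = entry[0]  # not modified: keep the cached body, only refresh its lifetime
        self.entries[url] = (body, etag, now + max_age)
        return body
```

    `max-age` bounds how stale a cached answer can be. According to the summary, Platformatic also synchronizes invalidations across instances, which this single-client sketch does not model.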

  11. Article · Lobsters

    Building a distributed log using S3 (under 150 lines of Go)

    The post shows how to implement a durable, distributed, and highly available log on AWS S3 in fewer than 150 lines of Go. It covers the log interface, the Append and Read operations, how S3 conditional writes handle concurrent writers, and failover and crash recovery. The open-source project includes code and tests, plus several open issues for further improvements.
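    The post's implementation is in Go. For consistency with the other examples, here is a minimal Python sketch of the core idea. An in-memory `FakeObjectStore` stands in for S3. Its `put_if_absent` mimics a `PutObject` sent with `If-None-Match: *`, which S3 rejects with 412 Precondition Failed if the key already exists. Each record gets a zero-padded offset key, so two writers can never both claim the same offset. All class and method names here are hypothetical.

```python
class PreconditionFailed(Exception):
    """Raised like S3's 412 response when a create-only put hits an existing key."""


class FakeObjectStore:
    """In-memory stand-in for an S3 bucket that supports create-only puts."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put_if_absent(self, key: str, data: bytes) -> None:
        if key in self.objects:
            raise PreconditionFailed(key)
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        return self.objects[key]


class S3Log:
    """Each record is its own object, keyed by a zero-padded offset."""

    def __init__(self, store: FakeObjectStore, prefix: str = "log/") -> None:
        self.store = store
        self.prefix = prefix
        self.next_offset = 0  # after a crash this would have to be recovered, e.g. by listing keys

    def _key(self, offset: int) -> str:
        return f"{self.prefix}{offset:020d}"  # zero padding keeps lexicographic order == offset order

    def append(self, data: bytes) -> int:
        while True:
            offset = self.next_offset
            try:
                self.store.put_if_absent(self._key(offset), data)
            except PreconditionFailed:
                self.next_offset = offset + 1  # another writer owns this offset; try the next
                continue
            self.next_offset = offset + 1
            return offset

    def read(self, offset: int) -> bytes:
        return self.store.get(self._key(offset))


store = FakeObjectStore()
w1, w2 = S3Log(store), S3Log(store)
print(w1.append(b"a"), w2.append(b"b"), w1.append(b"c"))  # 0 1 2
```

    Retrying on the next offset is only one possible policy. A writer could instead treat the conflict as a sign that it has lost leadership and stop.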

  12. Article · Cerbos

    CRDTs and collaborative playground

    Cerbos simplifies authorization logic for developers with tools such as the Playground, its collaborative IDE. The Playground uses CRDTs (Conflict-Free Replicated Data Types) for real-time collaboration. Replicas that have received the same updates converge to the same state, so no complex conflict-resolution logic is needed. Libraries such as Yjs and Automerge provide this functionality, letting developers build and test access control systems efficiently. The architecture uses a Node.js backend for communication and IndexedDB for in-browser storage, giving a robust and scalable collaborative experience.
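    The conflict-free merge property can be illustrated with a grow-only counter (G-Counter), one of the simplest state-based CRDTs. Yjs and Automerge use much richer sequence and document CRDTs. This Python sketch only shows why replicas can merge in any order and still agree.

```python
class GCounter:
    """Grow-only counter: each replica increments only its own slot, and merge
    takes the per-replica maximum."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: dict[str, int] = {}

    def increment(self, n: int = 1) -> None:
        if n < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other: "GCounter") -> None:
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())


alice, bob = GCounter("alice"), GCounter("bob")
alice.increment(2)  # concurrent, independent edits
bob.increment(3)
alice.merge(bob)
bob.merge(alice)
print(alice.value(), bob.value())  # 5 5
```

    Merge takes a per-replica maximum, which makes it commutative, associative, and idempotent. Repeated or reordered merges therefore cannot change the result.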

  13. Article · The Polymathic Engineer

    Year-end wrap

    The Polymathic Engineer newsletter reflects on 2024, highlighting growth and consistency. The author shares personal end-year reflections, a recap of the newsletter's evolution, favorite articles on algorithms, system design, and software engineering, and thoughts on social media use. Despite a busy year, the newsletter doubled its subscribers and increased paid readership. The author plans to write more high-quality articles and use Bluesky more frequently.