Best of Distributed SystemsFebruary 2025

  1. 1
    Article
    Avatar of bytebytegoByteByteGo·1y

    How Netflix Built a Distributed Counter for Billions of User Interactions

    Netflix uses a Distributed Counter Abstraction to efficiently track billions of user interactions. This system addresses the need for low latency, high throughput, and cost efficiency by utilizing different counting techniques tailored to various use cases. The architecture employs a hybrid approach combining event logging, background aggregation, and caching. Key benefits include scalability, reliability, and balancing trade-offs between immediacy and consistency.

  2. 2
    Article
    Avatar of hnHacker News·1y

    deepseek-ai/3FS: A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

    The Fire-Flyer File System (3FS) is a high-performance distributed file system tailored for AI training and inference workloads. It utilizes modern SSDs and RDMA networks to offer a shared storage layer that simplifies the development of distributed applications. Key features include strong consistency, stateless metadata services, efficient data preparation, random access to training samples, high-throughput checkpointing, and cost-effective KVCache for inference. Performance tests show impressive read throughput and sorting efficiency, making 3FS an optimal choice for large-scale AI tasks.

  3. 3
    Article
    Avatar of systemdesigncodexSystem Design Codex·1y

    4 Challenges of Distributed Systems - And Possible Solutions

    Distributed systems are fundamental in powering modern applications, enabling scalability, fault tolerance, and enhanced performance. However, they pose challenges such as unreliable communication, coordination difficulties, scalability decisions, and resiliency to failures. Solutions include using TCP and TLS for secure communication, consensus algorithms for coordination, microservices and load balancers for scalability, and various techniques like timeouts and circuit breakers for resiliency.

  4. 4
    Article
    Avatar of bytebytegoByteByteGo·1y

    How Instagram Scaled Its Infrastructure To Support a Billion Users

    Instagram has evolved significantly since its launch in 2010, overcoming early scalability challenges and transforming to support over a billion users. Key strategies included migrating from AWS to Facebook’s data centers, implementing distributed databases, and optimizing caching mechanisms. The platform uses Django for its web framework, RabbitMQ for message brokering, and Celery for asynchronous task processing. Their backend architecture and deployment models incorporate continuous integration and automated monitoring, ensuring high performance and reliability.

  5. 5
    Article
    Avatar of awegoAwesome Go·1y

    Let's Implement Consistent Hashing From Scratch

    Consistent hashing is a key distribution technique used in distributed systems to efficiently map keys to servers, minimizing data movement when nodes are added or removed. This post explains the concept, its importance, and demonstrates an implementation in Golang using a ConsistentHashRing. The implementation covers hashing functions, adding and removing nodes, storing and retrieving data, and handling node removal.

  6. 6
    Video
    Avatar of anthonyggAnthony GG·1y

    How I Scaled My App with Distributed Actors & WASM

    The post discusses the development and scaling of the Market Monkey trading terminal using Golang for both the backend and frontend. The application operates as both a desktop and web app via WebAssembly. The backend utilizes distributed actors for service communication and TimescaleDB for time-series data storage. Optimizations for WebAssembly, handling of websocket communications, and future scripting capabilities are also highlighted.

  7. 7
    Article
    Avatar of milanjovanovicMilan Jovanović·1y

    Introduction to Dapr for .NET Developers

    Dapr (Distributed Application Runtime) simplifies building microservices for .NET developers by abstracting infrastructure complexity behind consistent APIs. It offers essential building blocks like service invocation, state management, pub/sub, and secrets management, and uses the sidecar pattern to handle cross-cutting concerns like security and observability. This allows developers to focus on business logic and avoid vendor lock-in. Dapr seamlessly integrates with .NET and ASP.NET Core, providing a flexible and production-ready solution for distributed systems.

  8. 8
    Article
    Avatar of bytebytegoByteByteGo·1y

    How Google Spanner Powers Trillions of Rows with 5 Nines Availability

    Google Spanner is a globally distributed, strongly consistent, and highly available database system developed by Google. It combines SQL-based queries with the scalability of NoSQL, ideal for modern high-demand applications. Key features include synchronous replication for data durability, the use of TrueTime for accurate global timestamps, and dynamic sharding for load balancing. Spanner leverages Paxos for managing replication across multiple zones and ensures availability and consistency even during failures. The TrueTime API, using atomic and GPS clocks, provides precise time synchronization crucial for maintaining global consistency.