Best of Netflix: November 2024

  1.
    Article
    Data Engineer Things · 1y

    I spent 4 hours learning how Netflix operates Apache Iceberg at scale.

    Netflix has developed a sophisticated data platform to handle extensive data pipelines and analytics, using Apache Iceberg to overcome the limitations of their previous Hive-based system. Key components include Polaris, a custom metastore for Iceberg, and Janitors, a cleanup service. They also implemented Autotune for optimizing data layout and Autolift for localizing data files. Moreover, secure access controls were established for Iceberg tables. Netflix's migration tool for transitioning from Hive to Iceberg minimizes data movement and business interruptions.

  2.
    Article
    Netflix TechBlog · 1y

    Netflix’s Distributed Counter Abstraction

    Netflix's Distributed Counter Abstraction is a high-performance, scalable counting service built on top of their TimeSeries Abstraction. It supports two primary counting modes—Best-Effort and Eventually Consistent—to cater to different use cases and trade-offs involving accuracy, latency, and infrastructure costs. The service aims to handle high throughput and availability by leveraging a combination of caching, durable queuing, and periodic aggregation mechanisms. Additionally, it incorporates various approaches to manage potential data loss, idempotency, and contention issues inherent in distributed systems.
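The Eventually Consistent mode described above can be illustrated with a small in-memory sketch. This is a toy model, not Netflix's implementation: the class name, the `token` parameter, and the `rollup` method are all hypothetical, and a plain list stands in for the durable event store. It only shows how idempotency tokens and periodic aggregation fit together.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class EventuallyConsistentCounter:
    """Toy counter: events are logged first and aggregated later (hypothetical API)."""

    _events: List[Tuple[str, int]] = field(default_factory=list)  # pending (token, delta)
    _seen_tokens: Set[str] = field(default_factory=set)           # tokens already accepted
    _rolled_up: int = 0                                           # last aggregated value

    def add(self, delta: int, token: str) -> None:
        # A retried request carries the same token, so it is ignored.
        if token in self._seen_tokens:
            return
        self._seen_tokens.add(token)
        self._events.append((token, delta))

    def rollup(self) -> None:
        # Periodic aggregation: fold pending events into the stored count.
        self._rolled_up += sum(delta for _, delta in self._events)
        self._events.clear()

    def get(self) -> int:
        # Reads return the last aggregated value, which may lag recent writes.
        return self._rolled_up


counter = EventuallyConsistentCounter()
counter.add(1, token="req-a")
counter.add(1, token="req-a")  # duplicate retry, not counted twice
counter.add(2, token="req-b")
stale = counter.get()          # aggregation has not run yet
counter.rollup()
fresh = counter.get()
```

A read before `rollup` returns a stale value, which is the accuracy-for-latency trade-off the summary mentions. A Best-Effort mode would skip the event log and tokens entirely and accept possible over- or under-counting.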

  3.
    Article
    Data Engineer Things · 1y

    How does Netflix ensure the data quality for thousands of Apache Iceberg tables?

    Netflix employs the Write-Audit-Publish (WAP) pattern using Apache Iceberg to maintain high data quality across thousands of tables. The WAP pattern involves writing data to a hidden snapshot, auditing it, and publishing it only if it passes quality checks. This approach is analogous to CI/CD workflows, ensuring validated data is exposed to downstream consumers. Apache Iceberg's structure, including manifest files, metadata files, and catalog, supports efficient snapshot management and branching, similar to version control in Git.
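The write, audit, and publish steps can be sketched as a toy model in plain Python. This does not use the Iceberg API: `ToyTable`, `write_staged`, `publish`, and `write_audit_publish` are hypothetical names, and a dictionary of row lists stands in for real snapshots. It only illustrates that staged data stays invisible to readers until the audits pass.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Audit = Callable[[List[Row]], bool]


@dataclass
class ToyTable:
    """Toy stand-in for a snapshot-based table; not the Iceberg API."""

    snapshots: Dict[int, List[Row]] = field(default_factory=dict)
    current_snapshot_id: Optional[int] = None  # what readers see
    _next_id: int = 1

    def read(self) -> List[Row]:
        # Readers only ever see the published (current) snapshot.
        if self.current_snapshot_id is None:
            return []
        return list(self.snapshots[self.current_snapshot_id])

    def write_staged(self, rows: List[Row]) -> int:
        # Write: create a new snapshot without moving the current pointer.
        snapshot_id = self._next_id
        self._next_id += 1
        self.snapshots[snapshot_id] = self.read() + list(rows)
        return snapshot_id

    def publish(self, snapshot_id: int) -> None:
        # Publish: make the staged snapshot the one readers see.
        if snapshot_id not in self.snapshots:
            raise KeyError(f"unknown snapshot {snapshot_id}")
        self.current_snapshot_id = snapshot_id


def write_audit_publish(table: ToyTable, rows: List[Row], audits: List[Audit]) -> bool:
    snapshot_id = table.write_staged(rows)
    staged = table.snapshots[snapshot_id]
    # Audit: every check must pass before the data is exposed.
    if all(audit(staged) for audit in audits):
        table.publish(snapshot_id)
        return True
    return False


def no_null_ids(rows: List[Row]) -> bool:
    return all(row.get("id") is not None for row in rows)


table = ToyTable()
good = write_audit_publish(table, [{"id": 1}], [no_null_ids])
bad = write_audit_publish(table, [{"id": None}], [no_null_ids])
visible = table.read()
```

The failed batch still exists as an unpublished snapshot, so it can be inspected or discarded without downstream consumers ever reading it.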

  4.
    Article
    The New Stack · 1y

    Netflix Engineers Rethink Mock Testing for GraphQL

    Netflix engineers are reevaluating mock testing strategies for GraphQL to improve production reliability. Creating effective mocks for Netflix's complex infrastructure is a significant challenge. Traditional UI testing is not comprehensive enough for distributed environments, while canary releases and integration testing offer more reliability. An ideal testing solution should realistically model all traffic without disrupting development workflows. Netflix's new approach, which is still in development, uses its DGS framework for customizable, user-friendly mock testing. Collaboration and an understanding of diverse team needs are key to its success.