Best of Data Streaming: August 2024

  1. Article
    Quastor Daily · 2y

    How Canva Collects 25 Billion Events Per Day

    Canva processes more than 25 billion events per day using AWS Kinesis, taking advantage of its real-time data analysis and cost-saving features. Its data pipeline batches, compresses, and enriches events before routing them to Snowflake for further analysis. Switching from AWS SQS to Kinesis reduced costs by 85%.

  2. Article
    DEV · 2y

    Introducing AutoMQ: a cloud-native replacement of Apache Kafka

    AutoMQ is a cloud-native replacement for Apache Kafka, designed to address the evolving needs of modern data architectures with a focus on efficiency, scalability, and cost-effectiveness. Originating from a team of open-source pioneers, it offers a unique architecture that decouples storage and computation, leveraging cloud storage to provide significant cost savings and operational efficiency. AutoMQ maintains full compatibility with Kafka, supports multi-cloud environments, and aims to integrate stream data into data lakes to enhance data access and break down silos. The growing community and successful funding highlight its potential impact on the stream storage industry.

  3. Article
    The New Stack · 2y

    Kafka 3.8 Brings Faster Startups to Java Developers

    Kafka 3.8, now packaged with GraalVM, promises faster startups and streamlined testing for Java developers. This update improves control over compression schemes, enhancing performance by up to 156%, and introduces support for tiered storage. The Consumer Rebalance Protocol has also been optimized to reduce computational overhead on consumers. Confluent, a major contributor, continues to support Kafka with enterprise and cloud-based services.

  4. Article
    Community Picks · 2y

    Append-only tables and incremental reads — Jack Vanlightly

    The post discusses support for append-only tables and incremental reads in table formats such as Apache Iceberg, Delta Lake, Apache Hudi, and Apache Paimon. It explains how incremental reads let compute engines return only the new records or changes since the last query. Each table format supports these features differently: Iceberg and Delta Lake add new data files without performing data conflict checks, whereas Hudi uses file groups and Paimon uses row-level operations. The post also touches on the performance implications and the potential for data conflicts with multiple writers.
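
The incremental-read idea in the last summary can be illustrated with a minimal sketch. It is a toy in-memory model, not the API of Iceberg, Delta Lake, Hudi, or Paimon: the `AppendOnlyTable` class, its `append` and `incremental_read` methods, and the integer snapshot IDs are all hypothetical names chosen for illustration. It assumes a single writer, so it does not model the conflict checks the post discusses.

```python
from dataclasses import dataclass, field


@dataclass
class AppendOnlyTable:
    """Toy append-only table (hypothetical): each commit adds one immutable batch of rows."""

    snapshots: list = field(default_factory=list)

    def append(self, rows):
        """Commit a batch of new rows and return the new snapshot ID (IDs start at 1)."""
        self.snapshots.append(list(rows))
        return len(self.snapshots)

    def incremental_read(self, since_snapshot=0):
        """Return the rows committed after `since_snapshot` and the latest snapshot ID."""
        new_rows = [row for batch in self.snapshots[since_snapshot:] for row in batch]
        return new_rows, len(self.snapshots)


table = AppendOnlyTable()
table.append([{"id": 1}, {"id": 2}])

# First read: no checkpoint yet, so every committed row is returned.
first_rows, checkpoint = table.incremental_read()

table.append([{"id": 3}])

# Second read: only rows committed after the saved checkpoint are returned.
second_rows, checkpoint = table.incremental_read(since_snapshot=checkpoint)
```

The reader keeps only the last snapshot ID it saw, which is what allows each query to return new records without rescanning the whole table.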