Best of Data Processing, May 2024

  1.
    Article
    Medium · 2y

    My First Billion (of Rows) in DuckDB

    The post describes the author's experience with DuckDB, a database for processing large volumes of data locally. It covers the problem of processing logs of Brazilian electronic ballot boxes and the challenges involved. The post explains the features and advantages of DuckDB and provides a step-by-step implementation of data processing. It concludes with the author's evaluation of DuckDB's performance and usability.

  2.
    Article
    Medium · 2y

    How We Solve Load Balancing Challenges in Apache Kafka

    This post discusses the challenges of load balancing in Apache Kafka and presents solutions such as lag-aware producers and consumers.

  3.
    Article
    Community Picks · 2y

    How To Set Up a Multi-Node Kafka Cluster using KRaft

    Learn how to set up a multi-node Kafka cluster using the KRaft consensus protocol. Configure nodes to be part of the cluster, observe topic partition assignments, and assign topics to specific brokers. Explore how to connect to the cluster, create and consume messages, and handle node unavailability. Finally, discover how to migrate topics between nodes in the cluster.

  4.
    Article
    Hacker News · 2y

    Data Science at the Command Line, 2e

    A revised guide on using the command line for data science, providing tools and techniques to improve efficiency and productivity. Ideal for data scientists, analysts, engineers, and researchers.