Best of Data Processing: January 2025

  1. Pyper - Concurrent Python Made Simple

    Source: Open Source

    Pyper is a flexible, pure-Python framework for concurrent and parallel data processing. Its intuitive API, built on functional programming principles, unifies threaded, multiprocess, and asynchronous work behind one interface. Pyper aims to make concurrency safe by managing task execution and resource clean-up for you, and it stays efficient through lazy execution: data moves between workers through queues and generators rather than being materialized all at once.

  2. Understanding Apache Kafka: Basics and Key Features

    Source: Collections

    Apache Kafka is a distributed event-streaming platform designed for real-time data processing. It moves data efficiently through event-driven systems using a small set of components: topics, partitions, producers, consumers, and brokers. Kafka provides high availability by replicating each partition across brokers, with one leader replica and a set of followers. Its architecture persists data to disk and supports parallel processing through consumer groups. The recent introduction of KRaft (Kafka Raft), which replaces ZooKeeper for managing cluster metadata, aims to simplify cluster management.

  3. DuckDB: Query Processing Is King

    Source: The New Stack

    DuckDB is an in-process database that concentrates on query processing rather than on data persistence. It offers client APIs for multiple programming languages and works well for testing and for on-the-fly data transformations. DuckDB is especially useful when you want SQL query support without running a full database server.
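The Pyper entry above describes lazy execution through queues, workers, and generators. The sketch below illustrates that general pattern using only Python's standard library. It is not Pyper's API, and the names `threaded_map`, `_put`, `_get`, and `_Failure` are hypothetical. Worker threads pull items from a bounded input queue and push results to a bounded output queue, while a generator hands results to the caller one at a time, so no thread starts until iteration begins. A `finally` block signals the threads to stop and joins them, which mirrors the kind of clean-up the entry says Pyper manages for you.

```python
import queue
import threading
from collections.abc import Callable, Iterable, Iterator

_DONE = object()  # end-of-stream sentinel passed between threads


class _Failure:
    """Wraps an exception from a worker so the consumer can re-raise it."""

    def __init__(self, exc: BaseException) -> None:
        self.exc = exc


def _put(q: queue.Queue, item: object, stop: threading.Event) -> bool:
    """Enqueue item, polling so the call gives up (returns False) once stop is set."""
    while not stop.is_set():
        try:
            q.put(item, timeout=0.05)
            return True
        except queue.Full:
            pass
    return False


def _get(q: queue.Queue, stop: threading.Event) -> object:
    """Dequeue an item, returning _DONE instead if stop is set while waiting."""
    while not stop.is_set():
        try:
            return q.get(timeout=0.05)
        except queue.Empty:
            pass
    return _DONE


def threaded_map(func: Callable, inputs: Iterable, workers: int = 4,
                 maxsize: int = 8) -> Iterator:
    """Lazily apply func to each input on a pool of worker threads (hypothetical helper)."""
    in_q: queue.Queue = queue.Queue(maxsize)   # bounded queues apply backpressure
    out_q: queue.Queue = queue.Queue(maxsize)
    stop = threading.Event()

    def feed() -> None:
        try:
            for item in inputs:
                if not _put(in_q, item, stop):
                    return
        except Exception as exc:  # forward errors raised by the input iterable
            _put(out_q, _Failure(exc), stop)
        finally:
            for _ in range(workers):  # one sentinel per worker
                _put(in_q, _DONE, stop)

    def work() -> None:
        while True:
            item = _get(in_q, stop)
            if item is _DONE:
                _put(out_q, _DONE, stop)
                return
            try:
                result = func(item)
            except Exception as exc:
                result = _Failure(exc)
            if not _put(out_q, result, stop):
                return

    # Threads start only when the caller first iterates this generator.
    threads = [threading.Thread(target=feed, daemon=True)]
    threads += [threading.Thread(target=work, daemon=True) for _ in range(workers)]
    for t in threads:
        t.start()
    try:
        finished = 0
        while finished < workers:  # stop once every worker has sent _DONE
            result = out_q.get()
            if result is _DONE:
                finished += 1
            elif isinstance(result, _Failure):
                raise result.exc
            else:
                yield result
    finally:
        stop.set()  # wake any thread blocked on a queue, then wait for all of them
        for t in threads:
            t.join()


squares = sorted(threaded_map(lambda x: x * x, range(10), workers=3))
print(squares)
```

Results arrive in completion order, not input order, so sort or tag them when order matters. If you stop iterating early, call `close()` on the generator (or wrap it in `contextlib.closing`) so the `finally` block runs promptly and the threads shut down.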
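The Kafka entry above names partitions and consumer groups as the basis for parallel processing. The toy model below is a hypothetical sketch, not Kafka client code, and shows two ideas. First, records with the same key always map to the same partition, so their relative order is kept within that partition. Second, each partition of a topic is assigned to exactly one consumer within a group, which is what lets a group split the work. Two simplifications are assumptions: `zlib.crc32` stands in for the murmur2 hash that Kafka's default partitioner applies to record keys, and `range_assign` is only loosely modeled on Kafka's range assignor.

```python
import zlib
from collections import defaultdict


def partition_for(key: bytes, num_partitions: int) -> int:
    """Pick a partition for a keyed record; the same key always gets the same partition."""
    # Assumption: crc32 replaces the murmur2 hash Kafka's default partitioner uses.
    return zlib.crc32(key) % num_partitions


def range_assign(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Give each group member a contiguous range of partitions.

    Every partition goes to exactly one member; if there are more members
    than partitions, the extra members receive nothing and sit idle.
    """
    members = sorted(consumers)
    per_member, extra = divmod(num_partitions, len(members))
    assignment: dict[str, list[int]] = {}
    start = 0
    for i, member in enumerate(members):
        count = per_member + (1 if i < extra else 0)  # first `extra` members get one more
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


NUM_PARTITIONS = 6
records = [(b"user-1", "login"), (b"user-2", "click"), (b"user-1", "logout")]

# A producer appends each record to the partition chosen by its key.
partitions: dict[int, list[tuple[bytes, str]]] = defaultdict(list)
for key, value in records:
    partitions[partition_for(key, NUM_PARTITIONS)].append((key, value))

print(range_assign(NUM_PARTITIONS, ["consumer-b", "consumer-a", "consumer-c"]))
# {'consumer-a': [0, 1], 'consumer-b': [2, 3], 'consumer-c': [4, 5]}
```

Because the partition depends on `num_partitions`, adding partitions to a topic changes where existing keys land, which is one reason partition counts are usually chosen up front.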