Best of DuckDB: November 2024

  1.
    Article
    A Java geek (frankel) · 2y

    DuckDB in Action

    DuckDB in Action by Mark Needham, Michael Hunger, and Michael Simons offers a detailed, step-by-step guide to DuckDB. The book covers DuckDB basics, advanced SQL queries, and integration with ecosystems like Python’s Pandas and Apache Spark. Although the book is informative, it struggles with focus, alternating between teaching DuckDB and teaching SQL in general.

  2.
    Article
    MotherDuck · 2y

    The Data Warehouse powered by DuckDB SQL

    MotherDuck combines DuckDB SQL with cloud services to offer a flexible data warehousing solution. It covers data ingestion, transformation, and analysis through SQL, with native Python APIs for more complex tasks. Its built-in AI features are aimed at improving usability for business users, data scientists, and developers. MotherDuck supports a wide range of file formats and storage solutions, and offers advanced analytical functions, including machine learning algorithms, for solving complex business problems.

  3.
    Article
    DuckDB · 1y

    Analyzing Open Government Data with duckplyr

    duckplyr is a high-performance, drop-in replacement for dplyr in R, powered by DuckDB. This post demonstrates how to use duckplyr to clean and analyze an open data set from New Zealand's government, showcasing the library's capabilities for efficient data wrangling and analysis. With enhanced CSV parsing and holistic optimization, duckplyr ensures faster and more ergonomic handling of large datasets compared to dplyr.

  4.
    Article
    DuckDB · 1y

    DuckDB Tricks – Part 3

    This blog post covers advanced features and performance optimization techniques for DuckDB, focusing on convenient table operations and faster processing of Parquet and CSV files. Using the Dutch railway services dataset, it demonstrates column renaming with pattern matching, data loading with globbing, reordering Parquet files, and Hive partitioning to speed up queries significantly.
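
To make the Hive partitioning idea from the last item concrete, here is a minimal sketch in standard-library Python. It is not DuckDB code and not taken from the linked post: it only illustrates the directory convention (`key=value` folders) and why a reader that filters on the partition key can skip whole files. The function names, the `year`/`station`/`delay_min` columns, and the sample rows are all made up for illustration.

```python
import csv
import tempfile
from pathlib import Path


def write_partitioned(root: Path, rows: list[dict], key: str) -> None:
    """Write rows into root/<key>=<value>/data.csv, one directory per key value."""
    by_value: dict[str, list[dict]] = {}
    for row in rows:
        by_value.setdefault(str(row[key]), []).append(row)
    for value, group in by_value.items():
        part_dir = root / f"{key}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        # The partition column is encoded in the path, so it is left out of the file.
        fields = [name for name in group[0] if name != key]
        with open(part_dir / "data.csv", "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fields, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(group)


def read_partition(root: Path, key: str, wanted: str) -> tuple[list[dict], int]:
    """Read only the files under <key>=<wanted>; return the rows and the number of files opened."""
    rows: list[dict] = []
    opened = 0
    # Globbing finds every partition; the path alone tells us which ones to skip.
    for path in sorted(root.glob(f"{key}=*/data.csv")):
        value = path.parent.name.split("=", 1)[1]
        if value != wanted:
            continue  # pruned: this file is never opened
        opened += 1
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                row[key] = value  # restore the partition column from the path
                rows.append(row)
    return rows, opened


def demo() -> tuple[list[dict], int]:
    # Made-up sample rows, not data from the Dutch railway dataset.
    services = [
        {"year": 2023, "station": "Utrecht", "delay_min": 3},
        {"year": 2023, "station": "Amsterdam", "delay_min": 0},
        {"year": 2024, "station": "Utrecht", "delay_min": 5},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        write_partitioned(root, services, "year")
        return read_partition(root, "year", "2024")
```

Calling `demo()` writes two partitions and reads back only the `year=2024` one, so a single file is opened. A query engine that understands this layout can apply the same pruning to a filter on the partition column, which is the kind of speedup the summary refers to.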