Best of DuckDB 2024

  1.
    Article
    Hacker News · 2y

    Postgres is eating the database world

    PostgreSQL is an advanced, open-source, and extensible data management framework that is capable of competing with Oracle and MySQL. It offers a wide range of analysis-related extensions and has the potential to become a unified, super-converged database. The emergence of PostgreSQL has shifted the paradigms in the database domain, and it is now considered a mainstream best practice. In the future, the focus will be on database extensions and services, and PostgreSQL ecosystem extensions are expected to lead the way towards a monopoly.

  2.
    Article
    MotherDuck · 2y

    What Happens When You Put a Database in Your Browser?

    WebAssembly has revolutionized browser capabilities, enabling high-performance applications like DuckDB directly in the browser. DuckDB Wasm is particularly useful for in-browser analytics, ad-hoc queries, and educational tools. A practical example includes a Firefox extension for displaying Parquet file schemas in GCP Cloud Storage. MotherDuck leverages DuckDB Wasm for responsive querying and offers an SDK for developers to create data-driven applications.

  3.
    Article
    Community Picks · 2y

    DuckDB vs Polars — Which One Is Faster?

    This post benchmarks DuckDB, an in-process analytical database, against Polars, a DataFrame library. Polars is faster at reading CSV files and executing window functions, while DuckDB is faster at joins. Overall, the two tools perform similarly.

  4.
    Article
    A Java geek · 2y

    DuckDB in Action

    DuckDB in Action by Mark Needham, Michael Hunger, and Michael Simons offers a detailed guide to DuckDB with a step-by-step approach. The book covers DuckDB basics, advanced SQL queries, and its integration with ecosystems like Python’s Pandas and Apache Spark. Despite being informative, the book struggles with focus, fluctuating between teaching DuckDB and general SQL learning.

  5.
    Article
    MotherDuck · 2y

    Getting started with modern GIS using DuckDB

    DuckDB and MotherDuck simplify geospatial analysis by supporting various spatial formats and offering easy integration with Python libraries for visualization. The post guides you through building a heatmap of EV charging spots in France using DuckDB with a Python visualization tool called Lonboard. It also discusses leveraging cloud computing with MotherDuck for sharing and exporting geospatial data.

  6.
    Article
    Simple Thread · 2y

    Migrating a small web application from SQL using DuckDB

    Greg Kontos reduced data storage costs by over 99% by migrating his hobby recipe tracking site from a GCP Cloud SQL instance to using DuckDB. Initially faced with a $68/month cost, with $67 of that for the SQL database, Greg explored various alternatives including serverless databases, dataframes, and local databases. He ultimately chose DuckDB for its SQL-like interface, simplicity, and low cost. Despite some minor issues, the transition was successful, lowering monthly costs to just $0.25 and maintaining functionality.
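
The "over 99%" figure can be checked with quick arithmetic. This is a minimal sketch that assumes the $0.25 monthly cost is compared against the $67 database line item quoted in the summary; the variable names are illustrative only.

```python
# Figures taken from the summary above (USD per month).
before_db_cost = 67.00   # Cloud SQL share of the original $68 bill
after_cost = 0.25        # monthly cost after the move to DuckDB

# Relative reduction; assumes the two figures are directly comparable.
reduction = (before_db_cost - after_cost) / before_db_cost
print(f"Reduction: {reduction:.1%}")
```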

  7.
    Article
    Medium · 2y

    My First Billion (of Rows) in DuckDB

    The post describes the author's experience with DuckDB, a database for processing large volumes of data locally. It covers the problem of processing logs of Brazilian electronic ballot boxes and the challenges involved. The post explains the features and advantages of DuckDB and provides a step-by-step implementation of data processing. It concludes with the author's evaluation of DuckDB's performance and usability.

  8.
    Article
    MotherDuck · 2y

    The Data Warehouse powered by DuckDB SQL

    MotherDuck combines the power of DuckDB SQL with cloud services to offer a flexible and powerful data warehousing solution. It includes robust capabilities for data ingestion, transformation, and analysis, leveraging SQL and additional native Python APIs for complex tasks. Its built-in AI features enhance usability for business users, data scientists, and developers. MotherDuck supports a wide range of file formats and storage solutions, and offers advanced analytical functions, including Machine Learning algorithms, to solve complex business problems efficiently.

  9.
    Article
    Community Picks · 2y

    DuckDB Meets Apache Arrow

    This post discusses the combination of DuckDB and Apache Arrow technologies in building an analytics platform. It highlights the features and benefits of DuckDB and Apache Arrow, as well as how they are used together. The post also mentions future steps such as pre-aggregations and data federation.

  10.
    Article
    MotherDuck · 2y

    Performant dbt pipelines with MotherDuck

    This post recaps learnings from the dbt+MotherDuck workshop and delves into building performant data pipelines using DuckDB and MotherDuck. Key steps include utilizing the read_blob() function, leveraging pre_hooks and variables in DuckDB, implementing incremental models with read_csv(), and handling data de-duplication using unnest() and arg_max(). These techniques aim to optimize data workflows and enhance data transformation and analysis efficiency.

  11.
    Article
    Community Picks · 2y

    How to directly access 150k+ Hugging Face Datasets with DuckDB and query using GPT-4o

    Learn how to access 150k+ Hugging Face Datasets with DuckDB and query them using GPT-4o. DuckDB is a fast analytical database that allows easy access to remote data sources. Hugging Face Datasets provide curated and standardized datasets for AI. WrenAI integrates with GPT-4o to allow users to query the datasets and get answers to business questions.

  12.
    Article
    Lobsters · 2y

    DuckDB as the New jq

    DuckDB is a database project similar to SQLite that has native support for reading and parsing JSON data without extra dependencies. It provides a simpler and more familiar SQL syntax compared to the powerful but complex syntax of jq.

  13.
    Article
    MotherDuck · 2y

    5 Hidden gems in DuckDB 1.1

    DuckDB 1.1 introduces several new features including custom HTTP headers for API calls, the VARINT data type for optimized memory usage, Pyodide support for running DuckDB in the browser, improvements in query execution speed with ORDER BY and LIMIT clauses, and enhanced debugging capabilities through an HTML EXPLAIN output.

  14.
    Article
    MotherDuck · 2y

    Developing a RAG Knowledge Base with DuckDB

    Learn how to build an AI-powered knowledge base using DuckDB, and use it to answer end users' questions by running embedding and language models.

  15.
    Article
    Hacker News · 2y

    incentius-foss/WhatTheDuck

    WhatTheDuck is an open-source web application built on DuckDB. It allows users to upload CSV files, store them in tables, and perform SQL queries on the data.

  16.
    Article
    MotherDuck · 2y

    DuckDB & Python : end-to-end data engineering project

    This post discusses an end-to-end data engineering project using DuckDB and Python. It explores the architecture, ingestion pipeline, and sinking data using DuckDB. The post also provides instructions on how to fetch data from PyPI using Python and DuckDB.

  17.
    Article
    DuckDB · 1y

    Analyzing Open Government Data with duckplyr

    duckplyr is a high-performance, drop-in replacement for dplyr in R, powered by DuckDB. This post demonstrates how to use duckplyr to clean and analyze an open data set from New Zealand's government, showcasing the library's capabilities for efficient data wrangling and analysis. With enhanced CSV parsing and holistic optimization, duckplyr ensures faster and more ergonomic handling of large datasets compared to dplyr.

  18.
    Article
    MotherDuck · 2y

    DuckDB & dbt | End-To-End Data Engineering Project

    This post explores how to improve the development experience in data engineering projects by using DuckDB and dbt. It covers topics such as streamlining architecture, accelerating pipelines, and writing unit tests. It also provides best practices for AWS S3 authentication and managing incremental pipelines.

  19.
    Article
    DuckDB · 1y

    DuckDB Tricks – Part 3

    This blog post delves into various advanced features and performance optimization techniques for DuckDB, particularly focusing on convenient methods for handling table operations and improving the processing speed of Parquet and CSV files. It includes practical examples using the Dutch railway services dataset, demonstrating column renaming with pattern matching, data loading with globbing, reordering Parquet files, and employing Hive partitioning to speed up queries significantly.

  20.
    Article
    Materialized View · 2y

    DuckDB Is Not a Data Warehouse

    DuckDB is a highly portable and fast tool for handling columnar data, often used by analytics and data engineers for various creative purposes. However, it is not considered a viable solution for large enterprise data warehousing due to its deployment model and limited scalability. MotherDuck aims to address these issues by building a centralized deployment model but faces tough competition from established cloud data warehouses like Snowflake and BigQuery, as well as PostgreSQL extensions.