Best of DuckDB, June 2025

  1.
    Article
    DuckDB · 48w

    Faster Dashboards with Multi-Column Approximate Sorting

    Advanced multi-column sorting techniques using space-filling curves (Morton and Hilbert encodings) and truncated timestamps can significantly improve query performance on columnar data formats. These methods sort approximately across multiple columns at once, so diverse dashboard queries can benefit from min-max indexes and row group pruning. Experiments on flight data show that Hilbert encoding provides the most consistent performance across different query patterns, while sorting by truncated timestamps (year-level granularity) combined with Hilbert encoding works best for time-filtered queries.
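
To make the idea concrete, here is a minimal Python sketch of Morton (Z-order) encoding and its effect on min-max pruning. It is not the code from the article: the helper names `morton_encode` and `row_groups_scanned`, the 8x8 grid of toy rows, and the row group size of 16 are all assumptions chosen for illustration. It counts how many row groups a filter would have to read under a plain single-column sort versus a Morton sort.

```python
def morton_encode(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y (hypothetical helper).

    Bit i of x lands at position 2*i, bit i of y at position 2*i + 1.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def row_groups_scanned(rows, group_size, col, value):
    """Count row groups whose [min, max] range on column `col` could hold `value`.

    This mimics min-max index pruning: a group is skipped only when the
    value falls outside the group's min-max range.
    """
    scanned = 0
    for start in range(0, len(rows), group_size):
        group = [row[col] for row in rows[start:start + group_size]]
        if min(group) <= value <= max(group):
            scanned += 1
    return scanned


# Toy data (assumption): every (x, y) pair on an 8x8 grid, 16 rows per group.
rows = [(x, y) for x in range(8) for y in range(8)]
by_x = sorted(rows)                                           # sort on x, then y
by_morton = sorted(rows, key=lambda r: morton_encode(r[0], r[1]))

x_sorted_filter_x = row_groups_scanned(by_x, 16, 0, 3)        # 1 of 4 groups
x_sorted_filter_y = row_groups_scanned(by_x, 16, 1, 3)        # 4 of 4 groups
morton_filter_x = row_groups_scanned(by_morton, 16, 0, 3)     # 2 of 4 groups
morton_filter_y = row_groups_scanned(by_morton, 16, 1, 3)     # 2 of 4 groups
```

In this toy setup, sorting on `x` alone is ideal for filters on `x` but prunes nothing for filters on `y`, while the Morton order prunes half the row groups for either column. That is the trade-off the summary describes: slightly worse for the leading column, much better for the others.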

  2.
    Article
    Hacker News · 45w

    sirius-db/sirius

    Sirius is a GPU-native SQL engine that integrates with existing databases like DuckDB through the Substrait query format. It delivers approximately 10x performance improvements over CPU-based query engines on TPC-H benchmarks at the same hardware cost. The system supports NVIDIA GPUs with compute capability 7.0+ and CUDA 11.2+, and can be deployed through AWS AMIs, Docker images, or manual installation. Sirius handles common SQL operations including filtering, joins, aggregations, and ordering, though it currently has limits on data size and row count and only partial support for NULL columns.

  3.
    Article
    dltHub · 47w

    Building Engine-Agnostic Data Stacks

    Modern data teams often use multiple engines such as Spark, DuckDB, and Snowflake, but struggle with data portability and code reusability across platforms. Apache Iceberg addresses the storage problem by enabling safe data sharing between engines through ACID transactions and multi-engine coordination. Tools like Ibis complement this by providing engine-agnostic analytical code that runs on any supported backend without modification. Together, these technologies create portable data stacks in which both data and business logic are decoupled from specific compute engines, reducing vendor lock-in and integration overhead.

  4.
    Article
    DuckDB · 45w

    Discovering DuckDB Use Cases via GitHub

    The DuckDB team demonstrates how to discover and analyze DuckDB usage across GitHub repositories by querying the GitHub API with DuckDB itself. The approach uses DuckDB's HTTP capabilities to fetch repository data, processes the JSON responses with SQL, and automates the workflow with GitHub Actions to generate daily reports in Markdown format. The solution includes pagination handling, data filtering, and visualization of historical trends through Git commit analysis.