DuckDB is an in-process analytical database optimized for OLAP workloads that can query local Parquet and Delta Lake files directly without loading them fully into memory. Unlike pandas, DuckDB uses columnar vectorized execution and predicate pushdown to read only the necessary row groups and columns, making it significantly
5 min read · From bartwullems.blogspot.com
Table of contents
What is DuckDB?
Installation
Querying local Parquet files
Only read what you need
Querying Delta Lake tables
Using a persistent connection for Multiple Queries
Mixing DuckDB with Pandas
When to reach for DuckDB?
Wrapping up
More information