DuckLake introduces data inlining, a technique that stores small inserts, deletes, and updates directly in the catalog database instead of writing Parquet files to object storage. This eliminates the classic "small files problem" that plagues streaming workloads in traditional lakehouse formats such as Apache Iceberg. Benchmarks show 926× faster aggregation queries and 105× faster ingestion compared to Iceberg with Apache Polaris. Inlined data supports full time travel via snapshot tracking and can be flushed to consolidated Parquet files on demand via a checkpoint command. The feature ships with DuckLake v1.0 in April but is available now via nightly builds.
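The workflow described above can be sketched in SQL. This is a rough illustration, not authoritative: the `DATA_INLINING_ROW_LIMIT` attach option and the `ducklake_flush_inlined_data` function reflect my reading of the DuckLake documentation at the time of writing, and the catalog path and table are made up for the example; verify names against the current docs before use.

```sql
-- Attach a DuckLake catalog with data inlining enabled.
-- Inserts at or below the row limit are stored in the catalog
-- database instead of being written out as tiny Parquet files.
INSTALL ducklake;
ATTACH 'ducklake:metadata.ducklake' AS lake (DATA_INLINING_ROW_LIMIT 10);

CREATE TABLE lake.sensor_readings (
    sensor_id INTEGER,
    reading   DOUBLE,
    ts        TIMESTAMP
);

-- A small streaming-style insert: inlined in the catalog,
-- no Parquet file is created on object storage.
INSERT INTO lake.sensor_readings VALUES (1, 21.7, now());

-- Later, consolidate all inlined rows into Parquet files on demand.
CALL ducklake_flush_inlined_data();
```

Because inlined rows are tracked in snapshots like any other data, queries and time travel see them transparently; the flush step only changes where the bytes live.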

14m read time · From duckdb.org
Table of contents

- Example: Streaming Sensor Data
- Streaming Benchmark
- How Inlining Works
- Conclusion
