DuckLake introduces data inlining, a technique that stores small inserts, deletes, and updates directly in the catalog database instead of writing Parquet files to object storage. This eliminates the classic "small files problem" that plagues streaming workloads in lakehouse table formats like Iceberg. Benchmarks show 926× faster aggregation queries and 105× faster ingestion compared to Iceberg with Apache Polaris. Inlined data supports full time travel via snapshot tracking and can be flushed to consolidated Parquet files on demand with a checkpoint command. The feature ships with DuckLake v1.0 in April but is available now via nightly builds.
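To give a feel for how this is used in practice, here is a minimal sketch in Python via the duckdb package. It is not the project's own example: the ATTACH option DATA_INLINING_ROW_LIMIT and the ducklake_flush_inlined_data() call are assumptions based on my reading of the DuckLake documentation rather than details given above, so verify both against the current release.

```python
# Minimal sketch: exercising DuckLake data inlining from Python.
# Assumes the DuckLake DuckDB extension; option and function names below
# are taken from the DuckLake docs as understood here, not guaranteed.
import duckdb

con = duckdb.connect()
con.execute("INSTALL ducklake")
con.execute("LOAD ducklake")

# Attach a DuckLake catalog; inserts at or below the row limit are kept
# inline in the catalog database instead of producing tiny Parquet files.
# (DATA_INLINING_ROW_LIMIT is an assumed option name.)
con.execute("""
    ATTACH 'ducklake:metadata.ducklake' AS lake
        (DATA_INLINING_ROW_LIMIT 1000)
""")

con.execute("CREATE TABLE IF NOT EXISTS lake.events (id INTEGER, payload VARCHAR)")

# A small streaming-style insert: stored inline, yet still queryable and
# visible to time travel because it is tracked by a snapshot.
con.execute("INSERT INTO lake.events VALUES (1, 'click'), (2, 'view')")

# On demand, flush accumulated inlined rows into consolidated Parquet files
# (function name assumed; the article refers to a checkpoint command).
con.execute("CALL ducklake_flush_inlined_data('lake')")

print(con.execute("SELECT count(*) FROM lake.events").fetchall())
```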