DuckLake introduces data inlining, a technique that stores small inserts, deletes, and updates directly in the catalog database instead of writing Parquet files to object storage. This eliminates the classic "small files problem" that plagues streaming workloads in traditional lakehouse formats such as Apache Iceberg. Benchmarks show 926× faster aggregation queries and 105× faster ingestion compared to Iceberg with Apache Polaris. Inlined data supports full time travel via snapshot tracking and can be flushed to consolidated Parquet files on demand via a checkpoint command. The feature ships with DuckLake v1.0 in April but is available now via nightly builds.
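The workflow described above can be sketched in SQL. This is a rough illustration, not authoritative: the `DATA_INLINING_ROW_LIMIT` attach option and the `ducklake_flush_inlined_data` function reflect my reading of the DuckLake documentation at the time of writing, and the catalog path and table are made up for the example; verify names against the current docs before use.

```sql
-- Attach a DuckLake catalog with data inlining enabled.
-- Inserts at or below the row limit are stored in the catalog
-- database instead of being written out as tiny Parquet files.
INSTALL ducklake;
ATTACH 'ducklake:metadata.ducklake' AS lake (DATA_INLINING_ROW_LIMIT 10);

CREATE TABLE lake.sensor_readings (
    sensor_id INTEGER,
    reading   DOUBLE,
    ts        TIMESTAMP
);

-- A small streaming-style insert: inlined in the catalog,
-- no Parquet file is created on object storage.
INSERT INTO lake.sensor_readings VALUES (1, 21.7, now());

-- Later, consolidate all inlined rows into Parquet files on demand.
CALL ducklake_flush_inlined_data();
```

Because inlined rows are tracked in snapshots like any other data, queries and time travel see them transparently; the flush step only changes where the bytes live.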

14m read time · From duckdb.org
Table of contents

- Example: Streaming Sensor Data
- Streaming Benchmark
- How Inlining Works
- Conclusion
