Best of Big Data, March 2026

  1.
    Article
    DuckDB · 11w

    Announcing DuckDB 1.5.0

    DuckDB 1.5.0 ("Variegata") is now available with a major CLI overhaul featuring color schemes, dynamic prompts, a pager, and last-result access via `_`. Key new features include native VARIANT type support (binary semi-structured data inspired by Snowflake/Parquet), a `read_duckdb` table function with glob support, Azure Blob/ADLSv2 write support, and an ODBC scanner extension. The GEOMETRY type moves into DuckDB core, enabling cross-extension geospatial interoperability, WKB storage, shredding compression (~3x size reduction), geometry statistics for query optimization, and CRS type-system support. An experimental PEG parser ships for better error messages and tab-completion suggestions. Lakehouse updates cover DuckLake spec v0.4, Delta Lake Unity Catalog write improvements, and Iceberg table properties. The httpfs backend switches from httplib to curl. Non-blocking checkpointing improves concurrent read-write throughput by 17%. DuckDB 2.0 is planned for summer 2026. The v1.4 LTS line continues until September 2026.

  2.
    Article
    Snowflake Community · 12w

    Why Apache Iceberg Is Not “Just Another Table Format”

    Apache Iceberg is a table format specification that sits between file formats (Parquet, ORC) and compute engines, solving critical failures of the Hive era: slow query planning via directory listing, silent schema corruption from position-based column tracking, and lack of safe concurrent writes. Iceberg replaces directory-based tracking with a versioned metadata tree, enabling file-level statistics for query pruning, ACID transactions via immutable snapshots with atomic commits, and metadata-only schema evolution using permanent column IDs. Unlike a traditional warehouse, Iceberg delivers warehouse-grade guarantees on open, decoupled object storage, but it shifts operational responsibility (compaction, snapshot expiration) to the user. Major adoption signals include Snowflake achieving full parity with native tables, AWS S3 Tables with built-in Iceberg support, and companies like LinkedIn and Airbnb reporting significant performance and cost gains.
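
The three mechanisms in this summary (permanent column IDs, file-level statistics, and atomic snapshot commits) can be illustrated with a toy in-memory model. This is only a sketch under simplifying assumptions: it does not use Iceberg's real metadata layout or any Iceberg library, and every name in it (`DataFile`, `Snapshot`, `make_file`, `scan`, `commit`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    rows: tuple   # each row is a dict keyed by column ID, not by name or position
    stats: dict   # column ID -> (min, max), recorded when the file is written


@dataclass(frozen=True)
class Snapshot:
    schema: dict  # column ID -> current column name
    files: tuple  # DataFile entries visible in this snapshot


def make_file(path, rows):
    """Build a DataFile and record per-column min/max statistics."""
    ids = {cid for row in rows for cid in row}
    stats = {}
    for cid in ids:
        values = [row[cid] for row in rows if cid in row]
        stats[cid] = (min(values), max(values))
    return DataFile(path, tuple(rows), stats)


def scan(snapshot, column, lo, hi):
    """Return (matching values, paths of files opened) for lo <= column <= hi."""
    col_id = next(cid for cid, name in snapshot.schema.items() if name == column)
    hits, opened = [], []
    for f in snapshot.files:
        bounds = f.stats.get(col_id)
        if bounds is None or bounds[1] < lo or bounds[0] > hi:
            continue  # pruned from metadata alone; the file is never read
        opened.append(f.path)
        hits.extend(
            row[col_id] for row in f.rows
            if col_id in row and lo <= row[col_id] <= hi
        )
    return hits, opened


def commit(table, expected, new_snapshot):
    """Swap the current-snapshot pointer only if nobody committed in between."""
    if table["current"] is not expected:
        return False  # the writer worked from a stale snapshot and must retry
    table["current"] = new_snapshot
    return True


f1 = make_file("a.parquet", [{1: 10, 2: 5}, {1: 20, 2: 7}])
f2 = make_file("b.parquet", [{1: 30, 2: 50}, {1: 40, 2: 90}])
s1 = Snapshot({1: "id", 2: "amount"}, (f1, f2))
table = {"current": s1}

# Renaming a column touches only the schema mapping; data files are reused as-is.
s2 = Snapshot({1: "id", 2: "total"}, s1.files)
ok = commit(table, s1, s2)

# A second writer that still holds s1 is rejected instead of overwriting s2.
stale = commit(table, s1, Snapshot(s1.schema, (f1,)))

hits, opened = scan(table["current"], "total", 40, 100)
print(ok, stale, hits, opened)
```

In this sketch the renamed column still resolves to the same data because lookups go through the ID, and the range query opens only the file whose recorded bounds overlap the predicate.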

  3.
    Article
    DuckDB · 11w

    Big Data on the Cheapest MacBook

    DuckDB's team benchmarked the entry-level MacBook Neo (Apple A18 Pro, 8 GB RAM, 512 GB SSD, $700) against AWS cloud instances using ClickBench and TPC-DS workloads. In cold runs, the MacBook outperformed cloud instances due to its local NVMe SSD vs. network-attached storage. In hot runs, the large c8g.metal-48xl cloud instance dominated, but the MacBook held its own against a mid-sized c6a.4xlarge. TPC-DS at SF100 completed in 15.5 minutes; at SF300, DuckDB spilled up to 80 GB to disk and finished all queries in 79 minutes. The verdict: the MacBook Neo is not ideal for daily heavy data workloads due to slower disk I/O and limited RAM, but it handles occasional local analytics well, especially when used primarily as a cloud client.