Hardwood 1.0.0.Beta2, a Java-based Apache Parquet parser, has been released with several notable additions. It adds support for Parquet's VARIANT logical type (semi-structured, JSON-like data) via a new PqVariant API, including transparent reassembly of shredded variants. A new interactive TUI command (`hardwood dive`) lets developers explore a Parquet file's structure, row groups, column chunks, indexes, and data. The reader API has been unified around a builder pattern that replaces overloaded methods. On the performance side, a reworked page fetching and decoding pipeline with per-column parallelism cuts a 9.6 GB NYC taxi benchmark from ~2.7s to 2.2s and a nested-file benchmark from ~1.4s to 0.7s. S3 reads are more efficient thanks to request coalescing and local off-heap caching. The release also adds support for the INTERVAL, MAP/LIST, and INT96 types. A 1.0 GA release is expected in May.

6m read time. From morling.dev