Vortex is an open-source, high-performance columnar file format designed for data processing on object storage. It claims 100x faster random access, 10-20x faster scans, and 5x faster writes compared to Apache Parquet while maintaining similar compression ratios. The format features an extensible architecture with pluggable encodings, zero-copy Apache Arrow compatibility, and integrations with DataFusion, DuckDB, Spark, and Pandas. Now a Linux Foundation AI & Data incubation project, Vortex has stabilized its file format as of version 0.36.0 and is available in Rust, Python, and Java.