Vortex is an open-source, high-performance columnar file format designed for data processing on object storage. It claims 100x faster random access, 10-20x faster scans, and 5x faster writes compared to Apache Parquet while maintaining similar compression ratios. The format features an extensible architecture with pluggable encodings, zero-copy Apache Arrow compatibility, and integrations with DataFusion, DuckDB, Spark, and Pandas. Now a Linux Foundation AI & Data incubation project, Vortex has stabilized its file format as of version 0.36.0 and is available in Rust, Python, and Java.

From github.com
