This post discusses the birth of Parquet, a columnar storage format for Hadoop, and its integration into the open-source data stack. It covers the main tools used in Twitter's data platform, the benefits of organizing data in a columnar layout, and the importance of project governance and open-source foundations.
8 min read, from sympathetic.ink