This post discusses the birth of Parquet, a columnar storage format for Hadoop, and its integration into the open source data stack. It covers the main tools used in the data platform at Twitter, the benefits of organizing data in a columnar layout, and the importance of project governance and open source foundations.

8m read time From sympathetic.ink
Table of contents
Chapter I: The birth of Parquet
Lessons learned
