I started my data engineering journey back in 2019, and the architecture where we first land data in the data lake and then transform it into the data warehouse seemed like the obvious approach…

Data Engineer Things

The post traces the evolution of data architecture from traditional data warehouses, through the introduction of data lakes, to the emergence of the Lakehouse paradigm. It highlights the limitations of data warehouses and data lakes, such as difficulty handling unstructured data and data staleness. The Lakehouse architecture aims to combine the best features of both by keeping data on low-cost storage while adding management features such as ACID transactions and query optimization. The post also mentions table formats such as Delta Lake, Apache Hudi, and Apache Iceberg, which facilitate efficient data management in Lakehouse architectures.

The Data Lake, Warehouse and Lakehouse

I spent 5 hours understanding more about the Delta Lake table format

I spent 8 hours relearning the Delta Lake table format

I spent 5 hours exploring the story behind Apache Hudi.

I spent 4 hours learning Apache Iceberg. Here’s what I found.

I spent 7 hours diving deep into Apache Iceberg