This post provides a reference implementation of the Write-Audit-Publish (WAP) pattern on a data lake using Apache Iceberg and Project Nessie, running entirely in Python with no JVM required. It explains what WAP is, walks through the architecture and workflow of the implementation, and covers the advantages that tables, interoperability, and branching bring to it. The post also discusses the option of moving to a full Lakehouse architecture and concludes by highlighting the simplicity and flexibility of the Python-based developer experience.

8 min read time. From towardsdatascience.com
Table of contents
Write-Audit-Publish for Data Lakes in Pure Python (no JVM)
Introduction
What on earth is WAP?
WAP on a data lake in Python
Architecture and workflow
Visualize
From the lake to the Lakehouse
Conclusions
