In my previous post, I provided a step-by-step guide on how to set up a system for your data engineering projects using technologies such as Spark and Apache Iceberg. This time, I’d like to take it a…

Data Engineer Things

Explore how to build an efficient data pipeline without Spark, using MinIO, Apache Iceberg, Project Nessie, Polars, StarRocks, Mage, and Docker. The pipeline follows the medallion architecture (Bronze, Silver, and Gold layers) and applies the Write-Audit-Publish (WAP) pattern to protect data quality and integrity. The post walks through setting up each component, running data transformations and quality checks, and using Project Nessie branches to manage data versions. It also covers sending alert notifications to Slack and configuring a catalog so the data can be queried from StarRocks.
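The Write-Audit-Publish flow can be illustrated without any running services. The sketch below is a minimal, hypothetical model, not the post's implementation: `BranchCatalog` is an in-memory stand-in for a Nessie-style catalog (it is not the Nessie or PyIceberg API), and the audit rules (a non-null `id` and a non-negative `amount`) are assumptions chosen for illustration. New rows are written to an isolated branch, audited there, and only merged into `main` if every check passes.

```python
from dataclasses import dataclass, field


@dataclass
class BranchCatalog:
    """Hypothetical in-memory stand-in for a Nessie-style catalog.

    Each branch maps a table name to a list of rows (dicts).
    """
    branches: dict = field(default_factory=lambda: {"main": {}})

    def create_branch(self, name, from_branch="main"):
        # Copy each table's rows so writes on the new branch never touch the source.
        self.branches[name] = {t: list(rows) for t, rows in self.branches[from_branch].items()}

    def write(self, branch, table, rows):
        self.branches[branch].setdefault(table, []).extend(rows)

    def merge(self, source, target="main"):
        # Publish: fast-forward the target to the source branch's state
        # (a simplification of a real catalog merge).
        self.branches[target] = {t: list(rows) for t, rows in self.branches[source].items()}

    def drop_branch(self, name):
        del self.branches[name]


def audit(rows):
    """Assumed quality checks: every row needs an id and a non-negative amount."""
    errors = []
    for i, row in enumerate(rows):
        if row.get("id") is None:
            errors.append(f"row {i}: missing id")
        amount = row.get("amount")
        if amount is None or amount < 0:
            errors.append(f"row {i}: invalid amount {amount!r}")
    return errors


def write_audit_publish(catalog, table, new_rows, branch="etl_staging"):
    # Write: stage the new rows on an isolated branch.
    catalog.create_branch(branch)
    catalog.write(branch, table, new_rows)
    # Audit: validate the staged table before anyone downstream can see it.
    errors = audit(catalog.branches[branch][table])
    if errors:
        catalog.drop_branch(branch)  # discard the staged data; main is untouched
        return False, errors
    # Publish: expose the audited data on main, then clean up the staging branch.
    catalog.merge(branch)
    catalog.drop_branch(branch)
    return True, []


if __name__ == "__main__":
    catalog = BranchCatalog()
    print(write_audit_publish(catalog, "orders", [{"id": 1, "amount": 10.0}]))
    print(write_audit_publish(catalog, "orders", [{"id": None, "amount": -5}]))
    print(catalog.branches["main"]["orders"])
```

The first call publishes one row; the second fails the audit, so `main` still holds only the first row. In a real deployment, branch creation and merging would be done through the catalog rather than by copying Python lists.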

Data Pipeline Development with MinIO, Iceberg, Nessie, Polars, StarRocks, Mage, and Docker

Putting It All Together: Implementing the Data Pipeline