Best of Data Engineering — June 2025

1
Article
Data Engineer Things·47w
Building a Real-Time Flight Data Pipeline with Kafka, Spark, and Airflow
A comprehensive guide to building a real-time flight data pipeline using Kafka for streaming, Spark for processing, and Airflow for orchestration. The pipeline fetches live flight data from a custom API and streams it through Kafka to MongoDB for storage; Airflow then schedules daily ETL jobs that extract landed-flight information into PostgreSQL and generate CSV reports. The project includes Docker containerization and complete code examples, and it demonstrates end-to-end data engineering practices from real-time ingestion to batch processing and reporting.
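To make the daily batch step concrete, here is a minimal sketch of filtering landed flights for one day and rendering them as a CSV report. It is not the article's code: the in-memory FLIGHTS list stands in for stored documents, and the field names (flight_id, status, arrived_at) and both helper functions are hypothetical.

```python
import csv
import io
from datetime import date, datetime

# Hypothetical flight documents, standing in for what the streaming side stores.
FLIGHTS = [
    {"flight_id": "AB123", "status": "landed", "arrived_at": "2025-06-01T10:15:00"},
    {"flight_id": "CD456", "status": "en-route", "arrived_at": None},
    {"flight_id": "EF789", "status": "landed", "arrived_at": "2025-06-02T08:00:00"},
]


def extract_landed(flights, day):
    """Return the flights that landed on the given calendar day."""
    rows = []
    for flight in flights:
        if flight["status"] != "landed" or flight["arrived_at"] is None:
            continue
        if datetime.fromisoformat(flight["arrived_at"]).date() == day:
            rows.append(
                {"flight_id": flight["flight_id"], "arrived_at": flight["arrived_at"]}
            )
    return rows


def to_csv(rows):
    """Render the extracted rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["flight_id", "arrived_at"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


report = to_csv(extract_landed(FLIGHTS, date(2025, 6, 1)))
print(report)
```

In a scheduled job, the day argument would typically come from the run's logical date, so that re-running a past day reproduces the same report.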
2
Article
dltHub·49w
Building Engine-Agnostic Data Stacks
Modern data teams often use multiple engines like Spark, DuckDB, and Snowflake, but struggle with data portability and code reusability across platforms. Apache Iceberg solves the storage problem by enabling safe data sharing between engines through ACID transactions and multi-engine coordination. Tools like Ibis complement this by providing engine-agnostic analytical code that runs on any supported backend without modification. Together, these technologies create truly portable data stacks where both data and business logic are decoupled from specific compute engines, reducing vendor lock-in and integration overhead.
3
Article
Programming Digest·47w
Which Data Architecture Should I Choose for My Workplace? — A Data Engineer’s Approach
A comprehensive guide comparing four major data architecture approaches: Data Warehouse, Data Lake, Data Lakehouse, and Data Mesh. The article explains when to use each approach, outlines its advantages and challenges, and provides platform recommendations. It focuses on the Medallion Architecture, with its Bronze, Silver, and Gold layers, for modern data warehouse design, and emphasizes requirement analysis and architectural selection based on data types, analytical needs, and organizational structure.
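As a rough illustration of the Bronze, Silver, and Gold layering, the sketch below moves a handful of records through the three stages in plain Python. The order records, field names, and the to_silver and to_gold helpers are all hypothetical; a real implementation would use tables in a warehouse or lakehouse rather than lists.

```python
from collections import defaultdict

# Bronze: raw records kept as ingested (all strings, duplicates and bad rows included).
bronze = [
    {"order_id": "1", "region": "eu", "amount": "10.50"},
    {"order_id": "1", "region": "eu", "amount": "10.50"},  # duplicate delivery
    {"order_id": "2", "region": "US", "amount": "4.00"},
    {"order_id": "3", "region": "us", "amount": "n/a"},    # unparseable amount
]


def to_silver(rows):
    """Silver: drop unparseable rows, deduplicate by order_id, normalize types."""
    seen = set()
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        if row["order_id"] in seen:
            continue  # keep only the first copy of each order
        seen.add(row["order_id"])
        cleaned.append(
            {
                "order_id": int(row["order_id"]),
                "region": row["region"].upper(),
                "amount": amount,
            }
        )
    return cleaned


def to_gold(rows):
    """Gold: aggregate cleaned rows into a reporting-ready total per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Keeping Bronze untouched means the Silver and Gold layers can be rebuilt from raw data whenever the cleaning rules change.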
4
Article
Data Engineer Things·49w
Stream Kafka Topic to the Iceberg Tables with Zero-ETL
AutoMQ introduces Table Topic, an open-source feature that automatically converts Kafka topic messages to Iceberg tables without requiring separate ETL pipelines. The solution addresses the complexity of managing Kafka-to-lakehouse data flows by handling schema management, partitioning, and upsert operations automatically. This represents an evolution from Kafka's original shared-nothing architecture to a shared-data approach, where data is accessible through both Kafka APIs and as Iceberg tables for analytics workloads.
