Best of Data Engineering — June 2024
- 1
- 2
Medium·2y
The Secret to Success in Large-Scale Data Engineering Projects
This post explores how Databricks Asset Bundles (DABs) can be used for workflow implementation and automation in Databricks. It highlights the advantages of DABs in managing large-scale data projects and provides a step-by-step guide for deploying DABs.
- 3
Substack·2y
How to choose between batch, micro-batch, and streaming when building a data pipeline
This post weighs the tradeoffs between batch, micro-batch, and streaming when building a data pipeline. It covers the downsides of streaming pipelines and the advantages of batch pipelines, and suggests technologies for batch processing.
- 4
Netflix TechBlog·2y
A Recap of the Data Engineering Open Forum at Netflix
The first Data Engineering Open Forum at Netflix gathered data engineers to discuss modern developments, challenges, and future prospects in the field. Highlights included talks on machine learning-powered auto remediation for Netflix's big data platform, employing generative AI for enterprise data modeling, managing real-time data delivery, building data platforms post-GDPR, unbundling data warehouses, evolving data quality strategies at Airbnb, and enhancing data productivity with SQLMesh.
- 5
Towards Data Science·2y
Data Engineering, Redefined
The post argues for a redefinition of data engineering, separating it from the implementation of business logic, which should remain the domain of application developers. It highlights how current practices create brittle and uncoordinated data pipelines and proposes focusing data engineering on the movement, manipulation, and management of data in a technical sense. It calls for a clearer separation between business logic and data manipulation to improve software quality and maintainability.
- 6
Towards Data Science·2y
Back to Basics: Databases, SQL, and Other Data-Processing Must-Reads
Relational databases and SQL queries remain vital to the daily workflows of data professionals, despite the buzz around LLMs. This post highlights essential reads on maintaining and growing skills in data and ML tasks, emphasizing the interconnectedness of foundational data operations and advanced AI tasks. Featured topics include simplifying Python code for data engineering, learning SQL for data analytics, using pivot tables in SQL, managing Excel charts with VBA, and turning relational databases into graph databases.