Best of Data Engineering: June 2024

  1. Article
    Medium · 2y

    Roadmap to Learn Data Engineering: How I Would Start Again

    A roadmap for learning data engineering, covering Python, SQL, command line, data warehouse, data modeling, data storage, data processing, data transformation, data orchestration, advanced topics, and staying updated.

  2. Article
    Medium · 2y

    The Secret to Success in Large-Scale Data Engineering Projects

    This post explores how Databricks Asset Bundles (DABs) can be used for workflow implementation and automation in Databricks. It highlights the advantages of DABs in managing large-scale data projects and provides a step-by-step guide for deploying DABs.

  3. Article
    Substack · 2y

    How to choose between batch, micro-batch, and streaming when building a data pipeline

    This post discusses the tradeoffs between batch, micro-batch, and streaming when building a data pipeline. It explores the downsides of streaming pipelines, the advantages of using a batch pipeline, and suggests technologies for batch processing.

  4. Article
    Netflix TechBlog · 2y

    A Recap of the Data Engineering Open Forum at Netflix

    The first Data Engineering Open Forum at Netflix gathered data engineers to discuss modern developments, challenges, and future prospects in the field. Highlights included talks on machine-learning-powered auto-remediation for Netflix's big data platform, employing generative AI for enterprise data modeling, managing real-time data delivery, building data platforms post-GDPR, unbundling data warehouses, evolving data quality strategies at Airbnb, and enhancing data productivity with SQLMesh.

  5. Article
    Towards Data Science · 2y

    Data Engineering, Redefined

    The post argues for redefining data engineering so that it is separate from the implementation of business logic, which should remain the domain of application developers. It describes how current practices create brittle, uncoordinated data pipelines and proposes focusing data engineering on the movement, manipulation, and management of data in a technical sense. It calls for a clearer separation between business logic and data manipulation to improve software quality and maintainability.

  6. Article
    Towards Data Science · 2y

    Back to Basics: Databases, SQL, and Other Data-Processing Must-Reads

    Relational databases and SQL queries remain vital to the daily workflows of data professionals, despite the buzz around LLMs. This post highlights essential reads on maintaining and growing skills in data and ML tasks, emphasizing how foundational data operations and advanced AI tasks are interconnected. Featured topics include simplifying Python code for data engineering, learning SQL for data analytics, using pivot tables in SQL, managing Excel charts with VBA, and turning relational databases into graph databases.