Best of Data Engineering, May 2024

  1.
    Article
    Community Picks · 2y

    The State of Data Engineering 2024

    The 2024 State of Data Engineering report discusses the influence of GenAI on software infrastructure, the expansion of product offerings due to the economic downturn, and the impact of open table formats and their catalogs in the data lake industry. It also highlights the importance of data version control and observability in AI/ML systems.

  2.
    Article
    KDnuggets · 2y

    10 GitHub Repositories to Master Data Engineering

    Learn data engineering through free courses, tutorials, books, tools, guides, roadmaps, practice exercises, projects, and other resources.

  3.
    Article
    Python in Plain English · 2y

    Creating an ETL Data Pipeline Using Bash with Apache Airflow

    Learn how to create an ETL data pipeline using bash with Apache Airflow. Extract data from various file formats, transform it, and load it into a new file. Includes steps for starting Apache Airflow, downloading the dataset, creating a DAG, and executing the pipeline.

  4.
    Article
    Medium · 2y

    All you need to know about the Google File System

    The Google File System (GFS) is a distributed large-scale file system designed by Google. It is built to handle component failures, store large files, and optimize performance for file appending. The system features a single master, multiple chunkservers, and multiple clients. GFS uses chunk replication, lease management, and checksumming to ensure high availability and data integrity.

  5.
    Article
    Substack · 2y

    How to learn data engineering more effectively

    Learn the principles of learning for a successful career in data engineering, find communities of other learners, and discover techniques to learn better.

  6.
    Article
    Substack · 2y

    Why it's hard for data engineers to get promoted after senior engineer

    Covers the barriers data engineers face in getting promoted beyond senior engineer: less visibility than other data roles, being viewed as 'less technical' at some companies, smaller teams than in software engineering, and an individual contributor track that tops out at data architect.
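
The checksumming and chunk replication mentioned in the Google File System summary (item 4) can be illustrated with a small sketch. It is not taken from the linked article. The 64 KB block size follows the published GFS design, but CRC32 is only a stand-in checksum, and the function names (`block_checksums`, `find_corrupt_blocks`, `read_verified`) are hypothetical, chosen for illustration.

```python
import zlib
from itertools import zip_longest

BLOCK_SIZE = 64 * 1024  # GFS checksums each chunk in 64 KB blocks


def block_checksums(chunk: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return one 32-bit checksum per fixed-size block of the chunk.

    CRC32 is an assumption made for this sketch, not the checksum GFS uses.
    """
    return [
        zlib.crc32(chunk[offset:offset + block_size])
        for offset in range(0, len(chunk), block_size)
    ]


def find_corrupt_blocks(
    chunk: bytes, expected: list[int], block_size: int = BLOCK_SIZE
) -> list[int]:
    """Return indices of blocks whose checksum differs from the stored one.

    A missing or extra block also counts as a mismatch.
    """
    actual = block_checksums(chunk, block_size)
    return [
        index
        for index, (got, want) in enumerate(zip_longest(actual, expected))
        if got != want
    ]


def read_verified(replicas: list[bytes], expected: list[int]) -> bytes:
    """Return the first replica that passes verification.

    This mimics, very loosely, a reader falling back to another replica
    when one copy of a chunk is found to be corrupt.
    """
    for replica in replicas:
        if not find_corrupt_blocks(replica, expected):
            return replica
    raise IOError("no replica passed checksum verification")


if __name__ == "__main__":
    data = bytes(range(256)) * 1024          # 256 KB, so four 64 KB blocks
    stored = block_checksums(data)

    damaged = bytearray(data)
    damaged[70_000] ^= 0xFF                  # flip bits inside the second block

    print(find_corrupt_blocks(bytes(damaged), stored))
    print(read_verified([bytes(damaged), data], stored) == data)
```

Checksumming per block rather than per chunk means a read only needs to verify the blocks it touches, and a single flipped byte is attributed to one 64 KB block instead of invalidating the whole chunk.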