Learn how to create an ETL data pipeline using bash with Apache Airflow. Extract data from various file formats, transform it, and load it into a new file. Includes steps for starting Apache Airflow, downloading the dataset, creating a DAG, and executing the pipeline.
8 min read · From python.plainenglish.io
Table of contents
Creating an ETL Data Pipeline Using Bash with Apache Airflow
- Project Overview
- Objectives
- Step 1: Starting Apache Airflow
- Step 2: Download the Dataset
- Step 3: Creating a DAG
  - Task 1.0 — Import libraries
  - Task 1.1 — Define the DAG arguments
  - Task 1.2 — Define/create the DAG
  - Task 1.3 — Create a shell script to unzip the downloaded data
  - Task 1.4 — Update the shell script to add a command to extract data from the CSV file
  - Task 1.5 — Update the shell script to add a command to extract data from the TSV file
  - Task 1.6 — Update the shell script to add a command to extract data from the fixed-width file
  - Task 1.7 — Update the shell script to add a command to consolidate the data
  - Task 1.8 — Update the shell script to add a command to transform and load the data
  - Task 1.9 — Create a task extract_transform_load in ETL_toll_data.py to call the shell script
  - Task 1.10 — Submit the DAG
  - Task 1.11 — Pause/unpause the DAG
  - Task 1.12 — Monitor the DAG
- Resources
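The shell-script tasks above (1.3 through 1.8) come down to a few standard Unix text-processing commands: `cut` to extract delimited or fixed-width fields, `tr` to translate delimiters and case, and `paste` to consolidate the extracts. The sketch below shows that general shape only; the file names, field positions, and sample rows are illustrative assumptions, not the course's actual toll dataset.

```shell
# Tiny sample CSV standing in for the unzipped download (Task 1.3).
printf '1,2021-01-01,car,100\n2,2021-01-02,truck,200\n' > vehicle-data.csv

# Extract the first four comma-separated fields (cf. Task 1.4).
cut -d',' -f1-4 vehicle-data.csv > csv_data.csv

# Sample TSV; extract selected tab-separated fields and re-delimit
# with commas (cf. Task 1.5).
printf '1\tPC\t55\n2\tHV\t60\n' > tollplaza-data.tsv
cut -f2-3 tollplaza-data.tsv | tr '\t' ',' > tsv_data.csv

# Sample fixed-width file; extract a field by character position
# (cf. Task 1.6).
printf 'ABC Visa\nXYZ Cash\n' > payment-data.txt
cut -c5-8 payment-data.txt > fixed_width_data.csv

# Consolidate the three extracts column-wise (cf. Task 1.7).
paste -d',' csv_data.csv tsv_data.csv fixed_width_data.csv > extracted_data.csv

# Transform: uppercase the text fields as a stand-in for the
# transform-and-load step (cf. Task 1.8).
tr '[:lower:]' '[:upper:]' < extracted_data.csv > transformed_data.csv
cat transformed_data.csv
```

In the article's pipeline these commands live in one script that an Airflow BashOperator task invokes, so the whole extract-transform-load sequence runs as a single DAG task.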