We may earn an affiliate commission when you visit our partners.

Apache Airflow: The Hands-On Guide

Marc Lamberti

Apache Airflow is a platform created by the community to programmatically author, schedule and monitor workflows.

It is scalable, dynamic, extensible, and modular.

Without any doubt, mastering Airflow is becoming a must-have and an attractive skill for anyone working with data.

What you will learn in the course:

  • The fundamentals of Airflow are explained, such as what Airflow is and how the scheduler and the web server work.

  • The Forex Data Pipeline project introduces many Airflow operators and shows how to work with Slack, Spark, Hadoop, and more.

  • Mastering your DAGs is a top priority: you will work with timezones, unit test your DAGs, structure your DAG folder, and much more.

  • Scaling Airflow through different executors such as the Local Executor, the Celery Executor, and the Kubernetes Executor will be explained in detail. You will discover how to specialize your workers, add new workers, and what happens when a node crashes.

  • A local three-node Kubernetes cluster will be set up with Rancher, Airflow, and the Kubernetes Executor to run your data pipelines.

  • Advanced concepts will be shown through practical examples, such as templating your DAGs, making one DAG dependent on another, understanding SubDAGs and deadlocks, and more.

  • You will set up a Kubernetes cluster in the cloud with AWS EKS and Rancher to use Airflow and the Kubernetes Executor.

  • Monitoring Airflow is extremely important. That's why you will learn how to do it with Elasticsearch and Grafana.

  • Security will also be addressed to make your Airflow instance compliant with your company's requirements: specifying roles and permissions for your users with RBAC, restricting access to the Airflow UI with password authentication, encrypting data, and more.

In addition:

  • Many practical exercises are given throughout the course so that you have opportunities to apply what you learn.

  • Best practices are stated when needed to give you the best ways of using Airflow.

  • Quizzes are available to assess your comprehension at the end of each section.

  • Answering your questions fast is my top priority, and I will do my best for you.

I put a lot of effort into giving you the best content, and I hope you will enjoy it as much as I enjoyed making it.

At the end of the course, you will be more confident than ever in using Airflow.

I wish you great success.

Marc Lamberti

What's inside

Learning objectives

  • Coding production-grade data pipelines by mastering Airflow through hands-on examples
  • How to follow best practices with Apache Airflow
  • How to scale Airflow with the Local, Celery and Kubernetes executors
  • How to set up monitoring with Elasticsearch and Grafana
  • How to secure Airflow with authentication, crypto and the RBAC UI
  • Core and advanced concepts with their pros and limitations
  • Mastering DAGs with timezones, unit testing, backfill and catchup
  • Organising the DAG folder and keeping things clean
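
To make the backfill and catchup objective more concrete, here is a minimal sketch in plain Python (it does not import Airflow) of how a scheduler could decide which DAG runs are due. It assumes the classic interval semantics of Airflow 1.x and 2.x, where the run covering an interval is triggered only once that interval has ended. The function name `due_runs` and the dates are illustrative and are not taken from the course material.

```python
from datetime import datetime, timedelta


def due_runs(start_date, schedule_interval, now, catchup=True):
    """Return the execution dates of the DAG runs that are due at `now`.

    Assumption: classic Airflow semantics, where the run stamped with
    execution date `t` covers [t, t + schedule_interval) and is triggered
    only after that interval has ended.
    """
    runs = []
    execution_date = start_date
    while execution_date + schedule_interval <= now:
        runs.append(execution_date)
        execution_date += schedule_interval
    if not catchup:
        # Without catchup, only the most recent completed interval is run.
        return runs[-1:]
    return runs


start = datetime(2019, 1, 1)
daily = timedelta(days=1)
now = datetime(2019, 1, 4, 12, 0)

print(due_runs(start, daily, now))                 # runs for Jan 1, Jan 2, Jan 3
print(due_runs(start, daily, now, catchup=False))  # only the run for Jan 3
```

Note that no run is stamped with January 4 yet: under the stated assumption, that interval is still open at noon on January 4.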

Syllabus

Introduction
Important Prerequisites
The Roadmap
Who I am?
Development Environment
The basics of Apache Airflow
Why Airflow?
What is Airflow?
How Airflow works?
The little secret of the webserver and the scheduler
[Practice] Installing Airflow
[Practice] Quick tour of Airflow UI
[Practice] Quick tour of Airflow CLI
Quick side note
The Forex Data Pipeline
Docker reminder
Docker performances
Project: The Forex Data Pipeline
A bit more about the architecture
What is a DAG?
[Practice] Define your DAG
What is an Operator?
[Practice] Check if the API is available - HttpSensor
[Practice] Check if the currency file is available - FileSensor
[Practice] Download the forex rates from the API - PythonOperator
[Practice] Save the forex rates into HDFS - BashOperator
[Practice] Create the Hive table forex_rates - HiveOperator
[Practice] Process the forex rates with Spark - SparkSubmitOperator
[Practice] Send email notifications - EmailOperator
[Practice] Send Slack notifications - SlackWebhookOperator
[Practice] Add dependencies between tasks
[Practice] The Forex Data Pipeline in action!
Congratulations!
Mastering your DAGs
Start_date and schedule_interval parameters demystified
[Practice] Manipulating the start_date with schedule_interval
Backfill and Catchup
[Practice] Catching up non triggered DAGRuns
Dealing with timezones in Airflow
[Practice] Making your DAGs timezone aware
How to make your tasks dependent
[Practice] Creating task dependencies between DagRuns
How to structure your DAG folder
[Practice] Organizing your DAGs folder
[Practice] How the Web Server works
How to deal with failures in your DAGs
[Practice] Retry and Alerting
How to test your DAGs
[Practice] Unit testing your DAGs
Improving your DAGs with advanced concepts
Minimising Repetitive Patterns With SubDAGs
[Practice] Grouping your tasks with SubDAGs and Deadlocks
Making different paths in your DAGs with Branching
[Practice] Make Your First Conditional Task Using Branching
Trigger rules for your tasks
[Practice] Changing how your tasks are triggered
Avoid hard coding values with Variables, Macros and Templates
[Practice] Templating your tasks
How to share data between your tasks with XCOMs
[Practice] Sharing (big?) data with XCOMs
TriggerDagRunOperator or when your DAG controls another DAG
[Practice] Trigger a DAG from another DAG
Dependencies between your DAGs with the ExternalTaskSensor
[Practice] Make your DAGs dependent with the ExternalTaskSensor
Distributing Apache Airflow
Sequential Executor with SQLite
Local Executor with PostgreSQL
[Practice] Executing tasks in parallel with the Local Executor
[Practice] Ad Hoc Queries with the metadata database
Scale out Apache Airflow with Celery Executors and Redis
[Practice] Set up the Airflow cluster with Celery Executors and Docker
[Practice] Distributing your tasks with the Celery Executor
[Practice] Adding new worker nodes with the Celery Executor
[Practice] Sending tasks to a specific worker with Queues
[Practice] Pools and priority_weights: Limiting parallelism - prioritizing tasks
Kubernetes Reminder
Scaling Airflow with Kubernetes Executors
[Practice] Set up a 3 nodes Kubernetes Cluster with Vagrant and Rancher
[Practice] Installing Airflow with Rancher and the Kubernetes Executor
[Practice] Running your DAGs with the Kubernetes Executor
Deploying Airflow on AWS EKS with Kubernetes Executors and Rancher
Quick overview of AWS EKS
[Practice] Set up an EC2 instance for Rancher
[Practice] Create an IAM User with permissions
[Practice] Create an ECR repository
[Practice] Create an EKS cluster with Rancher
How to access your applications from the outside
[Practice] Deploy Nginx Ingress with Catalogs (Helm)
[Practice] Deploy and run Airflow with the Kubernetes Executor on EKS
[Practice] Cleaning your AWS services
Monitoring Apache Airflow
How the logging system works in Airflow
[Practice] Setting up custom logging
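
The Forex Data Pipeline lectures in the syllabus above describe a chain of tasks, and a DAG is what lets a scheduler run them in a valid order. The following sketch, in plain Python without Airflow, shows one way to derive that order with Kahn's algorithm. The task names paraphrase the lecture titles, and the strictly linear dependency chain is an assumption: the course may wire the tasks differently.

```python
from collections import deque

# Task names paraphrase the Forex Data Pipeline lectures in the syllabus.
# Assumption: each task depends only on the one before it (a linear chain).
upstream = {
    "check_api_available": [],
    "check_currency_file_available": ["check_api_available"],
    "download_forex_rates": ["check_currency_file_available"],
    "save_rates_to_hdfs": ["download_forex_rates"],
    "create_hive_table": ["save_rates_to_hdfs"],
    "process_rates_with_spark": ["create_hive_table"],
    "send_email_notification": ["process_rates_with_spark"],
    "send_slack_notification": ["send_email_notification"],
}


def execution_order(upstream):
    """Order tasks so each runs after its upstream tasks (Kahn's algorithm)."""
    remaining = {task: len(deps) for task, deps in upstream.items()}
    downstream = {task: [] for task in upstream}
    for task, deps in upstream.items():
        for dep in deps:
            downstream[dep].append(task)

    # Tasks with no unmet upstream dependency are ready to run.
    ready = deque(task for task, count in remaining.items() if count == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in downstream[task]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)

    if len(order) != len(upstream):
        raise ValueError("cycle detected: the graph is not a DAG")
    return order


for position, task in enumerate(execution_order(upstream), start=1):
    print(position, task)
```

The cycle check is the "acyclic" part of a directed acyclic graph: if two tasks wait on each other, no valid order exists and the function raises instead of returning a partial result.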

Good to know

Know what's good, what to watch for, and possible dealbreakers
Teaches Apache Airflow, which is an in-demand skill for data professionals
Covers core and advanced concepts of Airflow, providing a comprehensive foundation
Provides practical examples and hands-on exercises to reinforce learning
Explores best practices for using Airflow, ensuring efficient and effective data pipelines
Offers a detailed guide to setting up and running Airflow on Kubernetes, enabling scalability and flexibility
Provides instruction on security measures for Airflow, ensuring data privacy and compliance

Reviews summary

Well-structured and clear Airflow course

Learners say this course is one of the best Airflow courses on Udemy and it is very well structured with clear and precise explanations. They say they learned a lot from the course.
The course is very well structured.
"This is definitely the best Airflow course on Udemy. It is well structured, very clear and precise explanation."
The course has clear and precise explanations.
"It is well structured, very clear and precise explanation."

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Apache Airflow: The Hands-On Guide with these activities:
Explore Airflow Best Practices
Become familiar with Airflow best practices to improve your workflow efficiency.
  • Review official Airflow documentation on best practices
  • Explore community forums and articles to gather additional insights
Run a data pipeline with the Local Executor
Run a data pipeline with the 'Local Executor' to familiarize yourself with its functionality and execution.
  • Install the Airflow CLI if not already installed
  • Run 'airflow initdb' to initialize the Airflow database
  • Start the Airflow webserver and scheduler with 'airflow webserver' and 'airflow scheduler'
  • Trigger one of the bundled example DAGs, for example with 'airflow trigger_dag example_bash_operator' (Airflow 1.x syntax, matching 'airflow initdb' above)
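
The command names in these steps depend on the Airflow major version: `airflow initdb` belongs to the 1.x CLI, while 2.x groups commands under `airflow db` and `airflow dags`. The sketch below collects the two variants side by side. The helper name `quickstart_commands` is hypothetical, `example_bash_operator` is assumed to be available (it ships with the bundled examples when `load_examples` is enabled), and newer 2.x releases may prefer a different database command, so check `airflow db --help` on your installation. It does not configure the Local Executor itself.

```python
def quickstart_commands(major_version, dag_id="example_bash_operator"):
    """Return the CLI calls for a local quickstart as argv lists.

    Assumption: for this workflow only the command names differ between
    Airflow 1.10 and 2.x; `dag_id` defaults to a bundled example DAG.
    """
    if major_version == 1:
        init_db = ["airflow", "initdb"]
        unpause = ["airflow", "unpause", dag_id]
        trigger = ["airflow", "trigger_dag", dag_id]
    elif major_version == 2:
        init_db = ["airflow", "db", "init"]
        unpause = ["airflow", "dags", "unpause", dag_id]
        trigger = ["airflow", "dags", "trigger", dag_id]
    else:
        raise ValueError(f"unsupported Airflow major version: {major_version}")

    return [
        init_db,
        ["airflow", "webserver"],  # long-running: start it in its own terminal
        ["airflow", "scheduler"],  # long-running: start it in its own terminal
        unpause,
        trigger,
    ]


for argv in quickstart_commands(1):
    print(" ".join(argv))
```

Returning argv lists rather than shell strings keeps the sketch usable with `subprocess.run` without shell quoting concerns, should you want to script the steps.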
Collaborative Airflow Pipeline Design
Engage with peers to share knowledge, troubleshoot, and refine your Airflow pipeline designs.
  • Join an online community or forum dedicated to Airflow
  • Participate in discussions and ask questions to gain insights
  • Collaborate on pipeline designs and provide feedback to others
Hands-on Airflow Dataset Transformation
Enhance your practical skills by working on a hands-on project involving data transformation using Airflow.
  • Identify a suitable dataset for transformation
  • Design and implement an Airflow workflow for the transformation
  • Test and refine your workflow to ensure accuracy and efficiency
Configure Airflow with the Kubernetes Executor
Configure Airflow with the 'Kubernetes Executor' to learn about distributed task execution and container orchestration.
  • Install Kubernetes on a server or cluster
  • Deploy Airflow with the Kubernetes Executor, for example using Helm
  • Create and run a DAG that uses the Kubernetes Executor
  • Monitor your DAGs and Kubernetes cluster
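
When Airflow runs in containers, its configuration is commonly supplied through environment variables named `AIRFLOW__{SECTION}__{KEY}`, which override the matching option in `airflow.cfg`. The sketch below builds such names for selecting the Kubernetes Executor. The helper `airflow_env_var` is hypothetical, and the choice of options is only an illustration, not the configuration used in the course.

```python
def airflow_env_var(section, key):
    """Build the environment variable name that overrides an airflow.cfg option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Assumption: both options live in the [core] section of airflow.cfg.
overrides = {
    airflow_env_var("core", "executor"): "KubernetesExecutor",
    airflow_env_var("core", "load_examples"): "False",
}

for name, value in overrides.items():
    print(f"{name}={value}")
```

The printed `NAME=value` pairs can be pasted into a container environment definition, such as the `env` section of a Kubernetes manifest or a Helm values file.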
Personal Data Pipeline Project
Solidify your understanding by building your own data pipeline using Airflow.
  • Choose a project idea that aligns with your learning objectives
  • Design and implement the pipeline using Airflow operators and sensors
  • Monitor and maintain your pipeline to ensure its reliability and efficiency

Career center

Learners who complete Apache Airflow: The Hands-On Guide will develop knowledge and skills that may be useful to these careers:
Data Engineer
Data Engineers are responsible for building and maintaining large-scale data pipelines and data infrastructure. The Apache Airflow: The Hands-On Guide course provides you with the skills and knowledge needed to develop and deploy production-grade data pipelines using Apache Airflow. This course covers everything from the basics of Airflow to advanced topics such as scaling, monitoring, and security. With this training, you'll be well-equipped to take on this role.
Data Analyst
Data Analysts use data to solve business problems and make better decisions. The Apache Airflow: The Hands-On Guide course can help you develop the skills and knowledge needed to work with data at scale. This course covers topics such as data engineering, data warehousing, and data visualization. With this training, you'll be well-positioned to contribute to data-driven decision making in your organization.
Machine Learning Engineer
Machine Learning Engineers build and deploy machine learning models. The Apache Airflow: The Hands-On Guide course can help data professionals to get started with deploying machine learning models into production. This course covers topics such as data engineering, model deployment, and monitoring. With this training, you'll be well-prepared to use Airflow to automate the machine learning lifecycle.
Cloud Architect
Cloud Architects design and manage cloud computing solutions. The Apache Airflow: The Hands-On Guide course can provide you with the skills and knowledge needed to build and manage data pipelines in the cloud. This course covers topics such as cloud computing, data engineering, and data warehousing. With this training, you'll be well-equipped to take on this role.
Software Engineer
Software Engineers design, develop, and maintain software systems. The Apache Airflow: The Hands-On Guide course can help you develop the skills and knowledge needed to build and maintain data pipelines. This course covers topics such as software development, data engineering, and data warehousing. With this training, you'll be well-prepared to contribute to the development of data-driven applications.
Database Administrator
Database Administrators manage and maintain databases. The Apache Airflow: The Hands-On Guide course can help you develop the skills and knowledge needed to work with data at scale. This course covers topics such as data engineering, data warehousing, and data visualization. With this training, you'll be well-positioned to manage and maintain data pipelines.
Data Scientist
Data Scientists use data to solve business problems and make better decisions. The Apache Airflow: The Hands-On Guide course can help you develop the skills and knowledge needed to work with data at scale. This course covers topics such as data engineering, data warehousing, and data visualization. With this training, you'll be well-prepared to contribute to data-driven decision making in your organization.
Business Analyst
Business Analysts use data to solve business problems and make better decisions. The Apache Airflow: The Hands-On Guide course can help you develop the skills and knowledge needed to work with data at scale. This course covers topics such as data engineering, data warehousing, and data visualization. With this training, you'll be well-positioned to contribute to data-driven decision making in your organization.
Project Manager
Project Managers plan and execute projects. The Apache Airflow: The Hands-On Guide course can help you develop the skills and knowledge needed to manage data pipeline projects. This course covers topics such as project management, data engineering, and data warehousing. With this training, you'll be well-prepared to lead data pipeline projects to success.
Data Architect
Data Architects design and manage data systems. The Apache Airflow: The Hands-On Guide course can help you develop the skills and knowledge needed to design and manage data pipelines. This course covers topics such as data architecture, data engineering, and data warehousing. With this training, you'll be well-prepared to take on this role.
Data Warehouse Engineer
Data Warehouse Engineers build and maintain data warehouses. The Apache Airflow: The Hands-On Guide course can help you develop the skills and knowledge needed to build and maintain data pipelines for data warehouses. This course covers topics such as data engineering, data warehousing, and data visualization. With this training, you'll be well-prepared to take on this role.
Data Visualization Engineer
Data Visualization Engineers create visual representations of data. The Apache Airflow: The Hands-On Guide course can help you develop the skills and knowledge needed to build and maintain data pipelines for data visualization. This course covers topics such as data engineering, data warehousing, and data visualization. With this training, you'll be well-prepared to take on this role.
DevOps Engineer
DevOps Engineers automate and manage the software development process. The Apache Airflow: The Hands-On Guide course can help you develop the skills and knowledge needed to automate and manage data pipelines. This course covers topics such as DevOps, data engineering, and data warehousing. With this training, you'll be well-prepared to take on this role.
Systems Analyst
Systems Analysts design and implement computer systems. The Apache Airflow: The Hands-On Guide course can help you develop the skills and knowledge needed to design and implement data pipelines. This course covers topics such as systems analysis, data engineering, and data warehousing. With this training, you'll be well-prepared to take on this role.
Network Administrator
Network Administrators manage and maintain computer networks. The Apache Airflow: The Hands-On Guide course may help you develop the skills and knowledge needed to manage and maintain data pipelines. This course covers topics such as networking, data engineering, and data warehousing. With this training, you'll be well-prepared to take on this role.

Reading list

We've selected seven books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Apache Airflow: The Hands-On Guide.
Foundational resource for building large-scale, reliable, maintainable, and scalable data systems.
Provides a deep dive into Kubernetes and its features, and includes practical examples and exercises to help you get started with managing containerized applications.
Comprehensive guide to Elasticsearch, and includes chapters on installation, configuration, indexing, searching, and analysis.
Comprehensive guide to site reliability engineering (SRE), and includes chapters on SRE principles, practices, and tools.
Comprehensive guide to Spark, and includes chapters on installation, configuration, programming, and administration.

Similar courses

Here are nine courses similar to Apache Airflow: The Hands-On Guide.
Apache Airflow on AWS EKS: The Hands-On Guide
Most relevant
Introduction to Airflow
Most relevant
Automate Data Pipelines
Most relevant
Workflow Orchestration with Google Cloud Composer
Most relevant
Productionalizing Data Pipelines with Apache Airflow 1
Most relevant
The Complete Hands-On Introduction to Apache Airflow
Kubernetes for the Absolute Beginners - Hands-on
Building ETL and Data Pipelines with Bash, Airflow and...
Causal Diagrams: Draw Your Assumptions Before Your...
© 2016 - 2024 OpenCourser