Ramesh Sannareddy, Yan Luo, Jeff Grossman, and Sabrina Spillner

Delve into the two different approaches to converting raw data into analytics-ready data. One approach is the Extract, Transform, Load (ETL) process. The other contrasting approach is the Extract, Load, and Transform (ELT) process. ETL processes apply to data warehouses and data marts. ELT processes apply to data lakes, where the data is transformed on demand by the requesting/calling application.

In this course, you will learn about the different tools and techniques that are used with ETL and data pipelines. Both ETL and ELT extract data from source systems, move the data through the data pipeline, and store the data in destination systems. During this course, you will experience how ELT and ETL processing differ and identify use cases for both. You will identify methods and tools used for extracting data, merging extracted data either logically or physically, and loading data into data repositories.

You will also define transformations to apply to source data to make the data credible, contextual, and accessible to data users. You will be able to outline several methods for loading data into the destination system, verifying data quality, monitoring load failures, and using recovery mechanisms in case of failure.

By the end of this course, you will know how to use Apache Airflow to build data pipelines and understand the advantages of this approach. You will also learn how to use Apache Kafka to build streaming pipelines, along with the core components of Kafka, which include brokers, topics, partitions, replications, producers, and consumers.

Finally, you will complete a shareable final project that enables you to demonstrate the skills you acquired in each module.

What's inside

Syllabus

Data Processing Techniques
ETL, or Extract, Transform, and Load, processes are used for cases where flexibility, speed, and scalability of data are important. You will explore some key differences between the similar processes ETL and ELT, which include the place of transformation, flexibility, Big Data support, and time-to-insight. You will learn that an increasing demand for access to raw data drives the evolution from ETL to ELT. Data extraction involves advanced technologies including database querying, web scraping, and APIs. You will also learn that data transformation is about formatting data to suit the application, and that data is either loaded in batches or streamed continuously.
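To make the ETL ordering concrete, here is a minimal, hedged sketch in Python (the CSV file, column names, and SQLite destination are illustrative assumptions, not course materials): the data is transformed before it is loaded, whereas an ELT pipeline would load the raw rows first and leave transformation to the consuming application.

```python
# Minimal ETL sketch: transform happens BEFORE load (contrast with ELT,
# where raw data is loaded as-is and transformed on demand).
# Assumes a local "sales.csv" with hypothetical columns: date, amount.
import csv
import sqlite3

def extract(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Format the data to suit the target application: cast types, drop bad rows.
    cleaned = []
    for row in rows:
        try:
            cleaned.append((row["date"], float(row["amount"])))
        except (KeyError, ValueError):
            continue  # skip rows that fail validation
    return cleaned

def load(rows, db_path="warehouse.db"):
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS sales (date TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    conn.close()

if __name__ == "__main__":
    load(transform(extract("sales.csv")))
```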
ETL & Data Pipelines: Tools and Techniques
Extract, transform, and load (ETL) pipelines can be created with Bash scripts that run on a schedule using cron. Data pipelines move data from one place, or form, to another. Data pipeline processes include scheduling or triggering, monitoring, maintenance, and optimization. Batch pipelines extract and operate on batches of data, whereas streaming data pipelines ingest data packets one by one in rapid succession. In this module, you will learn that streaming pipelines apply when the most current data is needed. You will explore how parallelization and I/O buffers help mitigate bottlenecks. You will also learn how to describe data pipeline performance in terms of latency and throughput.
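As a rough, hedged illustration of the scheduling and performance ideas in this module (the script path, the crontab line, and the toy workload are assumptions, not the course's lab code), a batch job can be timed to report its own latency and throughput and run on a schedule with cron:

```python
#!/usr/bin/env python3
# Batch pipeline sketch. A crontab entry such as the following (illustrative,
# not from the course) would run it at the top of every hour:
#   0 * * * * /usr/bin/python3 /path/to/batch_job.py >> /tmp/batch_job.log 2>&1
import time

def run_batch(records):
    """Process one batch and report simple performance numbers."""
    start = time.monotonic()
    processed = [r.strip().upper() for r in records]  # stand-in transformation
    elapsed = time.monotonic() - start
    # Latency here is the time to process the batch; throughput is records/second.
    rate = len(processed) / max(elapsed, 1e-9)
    print(f"latency={elapsed:.4f}s throughput={rate:.0f} records/s")
    return processed

if __name__ == "__main__":
    run_batch([f"record {i}\n" for i in range(100_000)])
```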
Building Data Pipelines using Airflow
The key advantage of Apache Airflow's approach of representing data pipelines as DAGs is that they are expressed as code, which makes your data pipelines more maintainable, testable, and collaborative. Tasks, the nodes in a DAG, are created by implementing Airflow's built-in operators. In this module, you will learn that Apache Airflow has a rich UI that simplifies working with data pipelines. You will explore how to visualize your DAG in graph or tree mode. You will also learn about the key components of a DAG definition file, and you will learn that Airflow logs are saved to local file systems and then sent to cloud storage, search engines, and log analyzers.
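For orientation, here is a minimal DAG definition sketch (assuming Apache Airflow 2.x; the DAG id, schedule, and bash commands are illustrative, not the course's lab DAG). The operators create the tasks, and the >> operator declares the edges between them:

```python
# Minimal Airflow DAG sketch: three tasks wired as extract -> transform -> load.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "example",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="example_etl_dag",          # illustrative name
    default_args=default_args,
    description="A minimal ETL DAG sketch",
    schedule_interval=timedelta(days=1),
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    transform = BashOperator(task_id="transform", bash_command="echo transforming")
    load = BashOperator(task_id="load", bash_command="echo loading")

    # Task dependencies define the DAG's edges.
    extract >> transform >> load
```

Because the pipeline is plain Python, it can be version controlled, reviewed, and unit tested like any other code, which is the maintainability advantage this module highlights.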
Building Streaming Pipelines using Kafka
Apache Kafka is a very popular open source event streaming platform. An event is a type of data that describes an entity's observable state updates over time. Popular Kafka service providers include Confluent Cloud, IBM Event Streams, and Amazon MSK. Additionally, the Kafka Streams API is a client library that supports data processing in event streaming pipelines. In this module, you will learn that the core components of Kafka are brokers, topics, partitions, replications, producers, and consumers. You will explore two special types of processors in the Kafka Streams API stream-processing topology: the source processor and the sink processor. You will also learn about building event streaming pipelines using Kafka.
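As a hedged sketch of the producer and consumer roles (assuming the kafka-python client and a broker at localhost:9092; the topic name is made up and this is not the course's lab code):

```python
# Event-streaming sketch with kafka-python: one producer, one consumer.
from kafka import KafkaProducer, KafkaConsumer

def produce(messages, topic="events"):
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    for msg in messages:
        producer.send(topic, value=msg.encode("utf-8"))  # events are appended to the topic
    producer.flush()

def consume(topic="events"):
    consumer = KafkaConsumer(
        topic,
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",   # read the topic from the beginning
        consumer_timeout_ms=5000,       # stop iterating after 5s of inactivity
    )
    for record in consumer:
        print(record.topic, record.partition, record.offset, record.value)

if __name__ == "__main__":
    produce(["sensor reading 1", "sensor reading 2"])
    consume()
```

Brokers host the topics, topics are split into partitions and replicated across brokers, producers append events, and consumers read them, which is the component breakdown this module covers.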
Final Assignment
In this final assignment module, you will apply your newly gained knowledge in two hands-on labs: “Creating ETL Data Pipelines using Apache Airflow” and “Creating Streaming Data Pipelines using Kafka”. You will build these ETL pipelines using real-world scenarios, extracting, transforming, and loading data into a CSV file. You will also create a topic named “toll” in Apache Kafka, download and customize a streaming data consumer, and verify that the streaming data has been collected in the database table.
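The lab instructions themselves are not reproduced here, but as a hedged sketch of the kind of steps the assignment describes (assuming the kafka-python client and a broker at localhost:9092; the SQLite file and table schema are illustrative guesses, not the lab's actual setup):

```python
# Sketch only: create the "toll" topic, then store consumed messages in a
# local SQLite table so the collected streaming data can be verified.
import sqlite3

from kafka import KafkaConsumer
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
admin.create_topics([NewTopic(name="toll", num_partitions=1, replication_factor=1)])

conn = sqlite3.connect("toll_data.db")  # hypothetical database file
conn.execute("CREATE TABLE IF NOT EXISTS toll_data (message TEXT)")

consumer = KafkaConsumer("toll", bootstrap_servers="localhost:9092",
                         auto_offset_reset="earliest", consumer_timeout_ms=10000)
for record in consumer:
    conn.execute("INSERT INTO toll_data (message) VALUES (?)",
                 (record.value.decode("utf-8"),))
conn.commit()
conn.close()
```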

Good to know

Know what's good, what to watch for, and possible dealbreakers
Examines ETL and ELT processes, which are standard in data analytics
Teaches both ETL and ELT, which helps learners gain skills in both approaches
Develops foundational theory and practical skills in ETL and Data Pipelines
Taught by instructors recognized for their work in data management
Uses real-world scenarios for hands-on labs in ETL and streaming pipelines

Reviews summary

ETL and Data Pipelines with Shell, Airflow, and Kafka

Learners say this hands-on course teaches Apache Airflow, Apache Kafka, and ETL pipelining for data engineers and developers. There are positive comments about the labs and assignments, which learners say are practical and helpful. However, learners mention that some of the lectures seem too basic or even boring. Some reviews also mention occasional technical issues with the course's labs.
This course emphasizes hands-on experience, providing learners with practical labs and assignments.
"Labs in this course are very helpful and to the point."
"Amazing for beginners to this subject! The labs are super useful and everything is explained in a really nice way."
"The final project to connect Airflow as a pipeline management tool to Kafka server is a very useful hands-on project."
Some learners found the lectures to be too basic or boring, suggesting that the course may not be suitable for experienced learners.
"As with all these IBM courses this one is super boring. Robot voice talking over powerpoints, as usual."
"The course material was basic so make sure do to a lot of your own additional learning outside of the coureswork."
"Week 1 feels useless because the main idea is to learn about Airflow and Kafka, and all this information about ETL it is not relevant if the course is positioned as an advanced one."
Some learners encountered technical issues with the course's labs, which may have hindered their experience.
"Buggy practice. Not possible to complete without fixing airflow start script yourself."
"The lab exercises were not loaded, so I had to move to the next section and it was not understandable, there is a technical issue!"
"I cannot proceed with the "SUBMIT a DAG" lab as I am constantly being shown the error - "cp: cannot create regular file '/home/project/airflow/dags/my_first_dag.py': Permission denied" when I run the command - "cp my_first_dag.py $AIRFLOW_HOME/dags"."

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in ETL and Data Pipelines with Shell, Airflow and Kafka with these activities:
Use our data modeling tutorial
The course covers database querying and data modeling, so this refresher will prepare you in advance.
Browse courses on Data Modeling
Show steps
  • Visit the tutorial and read the introductory sections
  • Work your way through the exercises and sample problems
  • Review the sample solutions
Review concepts of data extraction, transformation, and loading
Reviewing these concepts will provide a strong foundation for understanding the course material.
Browse courses on Data Extraction
Show steps
  • Read course syllabus and skim assigned textbooks
  • Review notes or materials from previous courses on data management
Run SQL queries against different databases
SQL is covered heavily in this course, so these practice drills will improve your results (a small practice sketch follows these steps).
Browse courses on Data Extraction
Show steps
  • Load data into a local database
  • Write SQL queries to retrieve specific sets of data
  • Practice aggregating, sorting, and filtering data
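A minimal practice sketch (table and column names are made up; SQLite is used only because it ships with Python):

```python
# Load a few rows into an in-memory SQLite database and practice
# aggregating, sorting, and filtering with SQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("alice", 30.0), ("bob", 12.5), ("alice", 7.5)])

# Aggregate and sort: total spend per customer, largest first.
for row in conn.execute("SELECT customer, SUM(amount) AS total FROM orders "
                        "GROUP BY customer ORDER BY total DESC"):
    print(row)

# Filter: count orders above a threshold.
print(conn.execute("SELECT COUNT(*) FROM orders WHERE amount > 10").fetchone())
conn.close()
```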
Eight other activities
Create a study guide or summary of key concepts
Creating a study guide will help consolidate learning and improve retention.
Browse courses on ETL
Show steps
  • Review course notes, readings, and assignments
  • Identify and summarize important concepts
Follow online tutorials or workshops to learn specific data analysis techniques
Following tutorials and workshops will provide additional guidance and support in learning data analysis techniques.
Browse courses on Data Analysis Techniques
Show steps
  • Identify tutorials or workshops that focus on specific techniques you want to learn
  • Complete the tutorials or participate in the workshops
Practice data transformation techniques
Practicing data transformation techniques will enhance understanding and proficiency.
Browse courses on Data Manipulation
Show steps
  • Complete practice exercises provided in the course modules
  • Find additional practice problems online or in textbooks
Attend industry meetups or conferences related to data engineering
Attending industry events will provide opportunities to connect with professionals and learn about current trends.
Show steps
  • Research and identify relevant events
  • Register and attend the events
Create a data transformation pipeline using Python or another programming language
Creating a data transformation pipeline will provide hands-on experience and reinforce learning; a brief sketch follows these steps.
Browse courses on Python Programming
Show steps
  • Choose a dataset and define the transformation rules
  • Write code to implement the transformations
  • Test and evaluate the pipeline
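As one hedged way to structure such a project (the rules, field names, and sample records below are invented for illustration), transformation rules can be written as small functions applied in order to each record:

```python
# Sketch of a tiny transformation pipeline: each rule is a function that
# returns the modified record, or None to filter the record out.
def strip_whitespace(record):
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}

def normalize_price(record):
    record["price"] = round(float(record["price"]), 2)
    return record

def drop_missing_name(record):
    return record if record.get("name") else None

RULES = [strip_whitespace, normalize_price, drop_missing_name]

def run_pipeline(records, rules=RULES):
    for record in records:
        for rule in rules:
            record = rule(record)
            if record is None:
                break               # a rule filtered this record out
        else:
            yield record

if __name__ == "__main__":
    data = [{"name": " widget ", "price": "3.499"}, {"name": "", "price": "1"}]
    print(list(run_pipeline(data)))  # the record with no name is dropped
```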
Volunteer at a non-profit organization that utilizes data analytics
Volunteering will provide practical experience and exposure to real-world data analysis applications.
Browse courses on Data Analytics
Show steps
  • Identify organizations that align with your interests
  • Contact the organizations and inquire about volunteer opportunities
Develop a personal data analytics project
Working on a personal project will allow you to apply your skills and explore your interests in data analysis.
Show steps
  • Identify a problem or opportunity that you want to address
  • Gather and analyze data
  • Develop and implement a solution
Create a presentation or report that showcases your data analysis findings
Creating a deliverable will provide a structured way to communicate your analysis and insights.
Browse courses on Presentation
Show steps
  • Organize and analyze your data
  • Develop a clear and concise message

Career center

Learners who complete ETL and Data Pipelines with Shell, Airflow and Kafka will develop knowledge and skills that may be useful to these careers:
Data Engineer
Data Engineers build and maintain the infrastructure that supports data-driven applications and products. They design, build, test, deploy, maintain, and monitor data pipelines and data platforms. In this course, you will learn about data pipelines, data integration, and data cleansing. These concepts will help you to design and implement robust and scalable data engineering systems.
Data Warehouse Engineer
Data Warehouse Engineers design, develop, and maintain data warehouses. They are responsible for ensuring that the data in the data warehouse is accurate, consistent, and accessible to users. This course provides a foundation in ETL and data pipelines, which are essential skills for Data Warehouse Engineers.
Data Integration Architect
Data Integration Architects design and implement data integration solutions. They work with stakeholders to identify and understand their data integration needs. This course provides a foundation in data integration and data cleansing, which are essential skills for Data Integration Architects.
Data Analyst
Data Analysts collect, transform, and analyze data to provide insights that inform decision-making. In this course, you will learn about data extraction, transformation, and loading (ETL), as well as how to build data pipelines. These skills will be essential for collecting and preparing data for analysis.
Software Engineer
Software Engineers design, develop, test, and maintain software systems. In this course, you will learn about shell scripting, Apache Airflow, and Apache Kafka. These technologies are essential for building and managing data pipelines.
Data Migration Specialist
Data Migration Specialists migrate data from one system to another. They work with stakeholders to identify and understand their data migration needs. This course provides a foundation in data integration and data cleansing, which are essential skills for Data Migration Specialists.
Data Architect
A Data Architect takes responsibility for the design of data management solutions that address business requirements. They consider the overall architecture of an organization's data landscape, which may include data lakes, data warehouses, and operational systems. This course provides a basis for the data management lifecycle. Concepts of data integration will also be useful for a Data Architect looking at how to handle data from diverse sources.
Machine Learning Engineer
Machine Learning Engineers build and deploy machine learning models to solve complex problems in a variety of industries. This course provides a foundation in Apache Kafka, which is a popular open source event streaming platform. Event streaming is a key technology for building real-time machine learning applications.
Database Administrator
Database Administrators are responsible for the performance and security of databases. They design, implement, and maintain database systems. This course provides a foundation in data integration and data cleansing, which are essential skills for Database Administrators.
Data Scientist
Data Scientists use scientific methods, processes, algorithms, and systems to extract knowledge and insights from data in various forms, both structured and unstructured. This course provides a foundation in data integration and data cleansing, which are essential skills for Data Scientists.
Data Governance Specialist
Data Governance Specialists develop and implement data governance policies and procedures. They work with stakeholders to identify and understand their data governance needs. This course provides a foundation in data integration and data cleansing, which are essential skills for Data Governance Specialists.
Information Architect
Information Architects design and implement information systems. They work with stakeholders to identify and understand their information needs. This course provides a foundation in data integration and data cleansing, which are essential skills for Information Architects.
Data Quality Analyst
Data Quality Analysts ensure that data is accurate, consistent, and complete. They work with stakeholders to identify and understand their data quality needs. This course provides a foundation in data integration and data cleansing, which are essential skills for Data Quality Analysts.
Data Privacy Analyst
Data Privacy Analysts ensure that data is used in a compliant and ethical manner. They work with stakeholders to identify and understand their data privacy needs. This course provides a foundation in data integration and data cleansing, which are essential skills for Data Privacy Analysts.
Business Analyst
Business Analysts work with stakeholders to identify and understand their business needs. They analyze data to identify opportunities for improvement and develop solutions to business problems. This course provides a foundation in data integration and data cleansing, which are essential skills for Business Analysts.

Reading list

We've selected the following books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in ETL and Data Pipelines with Shell, Airflow and Kafka.
Provides an in-depth understanding of Apache Kafka's architecture, components, and use cases. It can serve as a valuable reference for the course's section on building streaming pipelines with Kafka.
Provides a comprehensive guide to Apache Spark, including topics such as data processing, data transformation, and data visualization.
Covers data engineering concepts and techniques using Python. It can provide additional insights into data extraction, transformation, and loading processes, complementing the course's ETL focus.
Provides a comprehensive guide to data-intensive text processing with MapReduce, including topics such as text mining, natural language processing, and machine learning.
Provides a comprehensive overview of data-intensive applications and their architectural patterns. It can serve as background reading for the course, helping learners understand the broader context of data pipelines.
Provides a comprehensive overview of Hadoop and its ecosystem. It can serve as background reading for the course, helping learners understand the foundations of data processing and storage in large-scale environments.
Provides a solid foundation in data warehousing principles and practices. It can serve as background reading for the course, helping learners understand the context and evolution of data pipelines in data warehousing.

Similar courses

Here are nine courses similar to ETL and Data Pipelines with Shell, Airflow and Kafka.
Building ETL and Data Pipelines with Bash, Airflow and...
Most relevant
Building Batch Data Pipelines on Google Cloud
Most relevant
Building Batch Data Pipelines on Google Cloud
Most relevant
Extract, Transform, and Load Data
Most relevant
Data Analytics and Databases on AWS
Most relevant
The Path to Insights: Data Models and Pipelines
Most relevant
Building Your First ETL Pipeline Using Azure Databricks
Most relevant
Apache Spark for Data Engineering and Machine Learning
Most relevant
Designing SSIS Integration Solutions
Most relevant