Rav Ahuja, Yan Luo, and Jeff Grossman

Well-designed and automated data pipelines and ETL processes are the foundation of a successful Business Intelligence platform. Defining your data workflows, pipelines and processes early in the platform design ensures the right raw data is collected, transformed and loaded into desired storage layers and available for processing and analysis as and when required.

This course is designed to provide you with the critical knowledge and skills that Data Engineers and Data Warehousing specialists need to create and manage ETL, ELT, and data pipeline processes.

Upon completing this course, you’ll have a solid understanding of Extract, Transform, Load (ETL) and Extract, Load, Transform (ELT) processes. You’ll practice extracting data, transforming it, and loading the transformed data into a staging area; create an ETL data pipeline using Bash shell scripting; build a batch ETL workflow using Apache Airflow; and build a streaming data pipeline using Apache Kafka.

You’ll gain hands-on experience with practice labs throughout the course and work on a real-world-inspired project in which you build data pipelines using several technologies. You can add the project to your portfolio to demonstrate your ability to perform as a Data Engineer.

This course requires prior experience working with datasets, SQL, relational databases, and Bash shell scripts.

What's inside

Learning objectives

  • Describe and differentiate between Extract, Transform, Load (ETL) and Extract, Load, Transform (ELT) processes
  • Define data pipeline components, processes, tools, and technologies
  • Create ETL processes using Bash shell scripts
  • Develop batch data pipelines using Apache Airflow
  • Create streaming data pipelines using Apache Kafka
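
To make the first objective concrete, here is a minimal sketch of the ETL/ELT distinction. It is not taken from the course materials: the sample data, the table names (`etl_weather`, `raw_weather`, `elt_weather`), and the function names are all hypothetical, and an in-memory SQLite database stands in for whatever staging area or warehouse a real pipeline would use. In the ETL path the conversion happens in application code before loading; in the ELT path the raw rows are loaded first and the conversion runs as SQL inside the database.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from a file or an API.
RAW_CSV = """city,temp_f
Austin,95
Boston,68
Denver,77
"""


def extract(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def to_celsius(temp_f):
    """Convert a Fahrenheit reading (string or number) to Celsius."""
    return round((float(temp_f) - 32) * 5 / 9, 1)


def run_etl(conn, text):
    """ETL: transform in application code, then load only the cleaned rows."""
    rows = [(r["city"], to_celsius(r["temp_f"])) for r in extract(text)]
    conn.execute("CREATE TABLE etl_weather (city TEXT, temp_c REAL)")
    conn.executemany("INSERT INTO etl_weather VALUES (?, ?)", rows)
    return conn.execute(
        "SELECT city, temp_c FROM etl_weather ORDER BY city"
    ).fetchall()


def run_elt(conn, text):
    """ELT: load the raw rows first, then transform inside the database."""
    rows = [(r["city"], r["temp_f"]) for r in extract(text)]
    conn.execute("CREATE TABLE raw_weather (city TEXT, temp_f TEXT)")
    conn.executemany("INSERT INTO raw_weather VALUES (?, ?)", rows)
    # The transformation is expressed as SQL and runs where the data lives.
    conn.execute(
        "CREATE TABLE elt_weather AS "
        "SELECT city, "
        "ROUND((CAST(temp_f AS REAL) - 32) * 5.0 / 9.0, 1) AS temp_c "
        "FROM raw_weather"
    )
    return conn.execute(
        "SELECT city, temp_c FROM elt_weather ORDER BY city"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_etl(connection, RAW_CSV))
    print(run_elt(connection, RAW_CSV))
```

Both paths end with the same cleaned table. The practical difference is that the ELT path also keeps the untransformed `raw_weather` table, so the transformation can be revised later without extracting the data again.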

Good to know

Know what's good, what to watch for, and possible dealbreakers
Prerequisites necessitate some background knowledge
Builds a strong foundation for intermediate learners
Develops in-demand and highly relevant industry skills
Introduces tools and software that are commonly used in the industry
Provides experience with hands-on practice labs
Offers opportunities to build a portfolio of real-world projects

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Building ETL and Data Pipelines with Bash, Airflow and Kafka with these activities:
Read articles on data pipeline trends and innovations
Stay up-to-date on the latest advancements and trends in the field of data pipelines.
  • Identify industry blogs, news websites, or research papers on data pipeline trends and innovations
  • Read and summarize key articles to gain insights into emerging technologies and best practices
  • Discuss your findings with peers or mentors
Review SQL and relational database concepts
Refresh your understanding of SQL and relational databases, which are essential for data pipeline operations.
  • Review online tutorials or books on SQL and relational database concepts
  • Practice writing SQL queries and designing database schemas
  • Complete practice exercises or quizzes to test your understanding
Gather resources on data pipeline best practices
Enhance your data pipeline knowledge by compiling resources and tools on industry best practices.
  • Search for articles, whitepapers, and online resources on data pipeline best practices
  • Organize and categorize the resources in a central location, such as a document or shared folder
  • Review and analyze the resources to identify common patterns and recommendations
Participate in a study group or online forum
Engage with peers to discuss concepts, share experiences, and reinforce your understanding of data pipelines.
  • Join a study group or online forum dedicated to data pipelines
  • Participate in discussions, ask questions, and share your knowledge
  • Collaborate on projects or exercises with other group members
Follow online tutorials on data pipeline tools
Expand your knowledge of data pipeline tools and technologies by following guided tutorials.
  • Identify online tutorials on Apache Airflow, Apache Kafka, or Bash shell scripting
  • Follow the tutorials step-by-step, practicing the concepts and techniques
  • Experiment with the tools and apply them to small data pipeline projects
Work through coding challenges
Sharpen your data engineering skills by solving coding challenges related to data pipeline creation and management.
  • Identify a coding challenge platform or website
  • Select challenges related to ETL or ELT processes
  • Work through the challenges, debugging and optimizing your code
Build a data pipeline using multiple technologies
Demonstrate your proficiency in data pipeline creation by building a project that incorporates multiple technologies.
  • Define the scope and requirements of your data pipeline project
  • Select and install the necessary technologies, such as Apache Airflow, Apache Kafka, and Bash shell scripting
  • Design and implement the data pipeline, including data extraction, transformation, and loading
  • Test and evaluate the performance of your data pipeline
  • Document your project and share it with others
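
As a starting point for the design step above, the sketch below shows one way to express a small batch pipeline as named tasks with explicit dependencies and run them in dependency order. It is an illustration only and not Apache Airflow's API: the task names, the sample records, and the shared `state` dictionary are hypothetical, and the ordering comes from `graphlib.TopologicalSorter` in the Python standard library (Python 3.9 or later). A workflow scheduler applies the same idea of a dependency graph, with scheduling, retries, and monitoring added on top.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def extract(state):
    # Hypothetical raw records, including stray whitespace and an empty line.
    state["raw"] = ["  alice,30 ", "bob,25", ""]


def transform(state):
    # Drop empty lines, trim whitespace, and convert the age field to int.
    rows = [line.strip().split(",") for line in state["raw"] if line.strip()]
    state["clean"] = [(name, int(age)) for name, age in rows]


def load(state):
    # Stand-in for writing to a real target such as a database table.
    state["loaded"] = list(state["clean"])


def check(state):
    # A basic evaluation step: every cleaned row must have been loaded.
    assert len(state["loaded"]) == len(state["clean"])
    state["checked"] = True


TASKS = {"extract": extract, "transform": transform, "load": load, "check": check}

# Each task maps to the set of tasks that must finish before it can start.
DEPENDENCIES = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "check": {"load"},
}


def run_pipeline():
    """Run every task once, in an order that respects DEPENDENCIES."""
    state = {}
    order = list(TopologicalSorter(DEPENDENCIES).static_order())
    for name in order:
        TASKS[name](state)
    return order, state


if __name__ == "__main__":
    order, state = run_pipeline()
    print(order)
    print(state["loaded"])
```

Keeping the dependency graph separate from the task functions makes it easy to add a step, for example a second source feeding `transform`, without changing the code that runs the pipeline.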

Career center

Learners who complete Building ETL and Data Pipelines with Bash, Airflow and Kafka will develop knowledge and skills that may be useful in these careers:
Data Engineer
As a Data Engineer, you will be responsible for designing, building, and maintaining data pipelines. This course can help you gain the knowledge and skills required to successfully perform these tasks. You will learn about the different components and processes involved in data pipelines, and you will have the opportunity to practice creating your own pipelines using Bash shell scripts, Apache Airflow, and Apache Kafka.
Data Analyst
Data Analysts use data to make informed decisions. This course can help you develop the skills you need to extract, transform, and analyze data. You will also have the opportunity to gain hands-on experience with Apache Airflow and Apache Kafka, which are widely used in the data analytics industry.
Data Scientist
Data Scientists use data to build predictive models and solve business problems. This course can help you develop the skills you need to extract, transform, and analyze data in order to build machine learning models. You will also have the opportunity to gain hands-on experience with Apache Airflow and Apache Kafka, which are becoming increasingly important in the data science field.
Business Analyst
Business Analysts use data to understand business needs and make recommendations for improvement. This course can help you develop the skills you need to extract, transform, and analyze data in order to make informed business decisions.
Database Administrator
Database Administrators ensure that databases are running smoothly and efficiently. This course can help you develop the skills you need to extract, transform, and load data into databases.
Software Engineer
Software Engineers design, develop, and maintain software applications. This course can help you develop the skills you need to extract, transform, and load data into software applications.
Data Architect
Data Architects design and manage data systems. This course can help you develop the skills you need to create and manage data pipelines that meet the needs of your organization.
Data Governance Specialist
Data Governance Specialists ensure that data is used in a compliant and ethical manner. This course can help you develop the skills you need to extract, transform, and load data in a way that meets your organization's data governance policies.
IT Manager
IT Managers oversee the IT systems and infrastructure of an organization. This course can help you develop the skills you need to manage data pipelines and ensure that they are running smoothly and efficiently.
Project Manager
Project Managers plan and execute projects. This course can help you develop the skills you need to manage data pipeline projects and ensure that they are delivered on time and within budget.
Quality Assurance Analyst
Quality Assurance Analysts test and verify that software applications are working correctly. This course can help you develop the skills you need to test and verify data pipelines and ensure that they are producing accurate and reliable data.
Business Intelligence Analyst
Business Intelligence Analysts use data to make informed decisions. This course can help you develop the skills you need to extract, transform, and analyze data in order to create business intelligence reports and dashboards.
Data Warehouse Engineer
Data Warehouse Engineers design and build data warehouses. This course can help you develop the skills you need to extract, transform, and load data into data warehouses.
Software Developer
Software Developers design, develop, and maintain software applications. This course can help you develop the skills you need to extract, transform, and load data into software applications.

Reading list

We've selected six books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Building ETL and Data Pipelines with Bash, Airflow and Kafka.
While not directly about ETL, this book is an excellent resource for data engineering as a whole. A must-have for any data engineer's bookshelf.
Provides a comprehensive overview of the design and implementation of data-intensive applications. It covers topics such as data modeling, data storage, and data processing.
For those looking to build a data warehouse, this book is a classic. A highly recommended read.
Provides a comprehensive overview of data-intensive text processing using MapReduce, a programming model for processing large datasets. It covers topics such as text mining, natural language processing, and machine learning.
Provides a comprehensive overview of deep learning, a subfield of machine learning that is used to train models for tasks such as image recognition and natural language processing. It covers the basics of deep learning, as well as advanced topics such as convolutional neural networks and recurrent neural networks.

Similar courses

Here are nine courses similar to Building ETL and Data Pipelines with Bash, Airflow and Kafka.
Building Batch Data Pipelines on Google Cloud
ETL and Data Pipelines with Shell, Airflow and Kafka
Building Batch Data Pipelines on Google Cloud
Building Your First ETL Pipeline Using Azure Databricks
The Path to Insights: Data Models and Pipelines
Designing SSIS Integration Solutions
Extracting and Transforming Data in SSIS
Extract, Transform & Load using Python
Apache Spark for Data Engineering and Machine Learning