We may earn an affiliate commission when you visit our partners.
Data Soup

Learn the latest in AWS Glue - And learn to use it with other AWS resources.

As data volumes and cloud adoption keep growing, core competency in a cloud ETL tool has become necessary. AWS Glue comes with built-in Spark support, Glue Data Quality, and data curation through Glue Data Brew. Top technology, finance, and insurance companies such as JPMC, Vanguard, BCBS, Amazon, Capital One, Capgemini, and FINRA use AWS Glue to run ETL on petabyte-scale data every day.

AWS Glue provides a serverless, scalable ETL solution in which scripts can be written in Python, Spark, and, more recently, Ray. It also provides visual drag-and-drop options for creating ETL pipelines. As more and more companies migrate to the cloud, demand for this skill has grown rapidly. Mastering AWS Glue can quickly make you one of the most knowledgeable candidates in the job market.

This course teaches the basics of the AWS Glue Data Catalog, AWS Glue Studio, and associated AWS resources such as KMS, IAM roles, SNS, and S3. Once those are in place, we go through building ETL pipelines with Glue Data Quality, Glue Streaming, and Glue Data Brew. Along the way, multiple labs have you create all the resources and ETL pipelines using the AWS console and CloudFormation templates, putting you in real-world situations where you use your new skills to solve real problems.
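As a rough illustration of what a CloudFormation lab might declare, the sketch below builds a minimal template for a Glue Data Catalog database and a crawler as a Python dictionary and serialises it to JSON. `AWS::Glue::Database` and `AWS::Glue::Crawler` are real CloudFormation resource types; the logical IDs (`CourseDatabase`, `RawDataCrawler`), the `raw/` prefix, the function name, and the example bucket and role ARN are hypothetical placeholders, not taken from the course.

```python
import json


def build_catalog_template(database_name: str, bucket: str, role_arn: str) -> dict:
    """Return a CloudFormation template (as a dict) with a Glue database and a crawler.

    All logical IDs and the 'raw/' prefix are hypothetical examples.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            # Data Catalog database that the crawler writes table definitions into.
            "CourseDatabase": {
                "Type": "AWS::Glue::Database",
                "Properties": {
                    "CatalogId": {"Ref": "AWS::AccountId"},
                    "DatabaseInput": {"Name": database_name},
                },
            },
            # Crawler that scans an S3 prefix and infers table schemas.
            "RawDataCrawler": {
                "Type": "AWS::Glue::Crawler",
                "Properties": {
                    "Role": role_arn,
                    # Ref on a Glue database resolves to the database name.
                    "DatabaseName": {"Ref": "CourseDatabase"},
                    "Targets": {"S3Targets": [{"Path": f"s3://{bucket}/raw/"}]},
                },
            },
        },
    }


if __name__ == "__main__":
    template = build_catalog_template(
        "course_db",
        "my-glue-course-bucket",                          # hypothetical bucket
        "arn:aws:iam::123456789012:role/GlueCourseRole",  # hypothetical role
    )
    print(json.dumps(template, indent=2))
```

Writing the JSON to a file would let you deploy it with `aws cloudformation deploy --template-file <file> --stack-name <name>`; the role passed in must already exist and be assumable by Glue.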

If you're ready to jump into the data engineering world of AWS Glue, this is the course for you.

What's inside

Learning objectives

  • Understanding the AWS Glue Data Catalog and creating AWS Glue databases, tables, and crawlers
  • Using AWS Glue Studio to create ETL pipelines with scheduled triggers, conditional triggers, and Glue workflows
  • KMS, IAM roles, SNS, S3, and other AWS resources associated with Glue: understanding and creating all of them
  • Understanding AWS Glue Data Quality and creating the associated Glue ETL pipeline
  • Understanding AWS Glue Data Brew and creating the recipe, project, and job to curate a dataset
  • Understanding AWS Glue streaming: creating the stream with a Python shell job and loading it with Spark streaming
  • Different ways an AWS Glue job can fail, and how to debug and fix the failures
  • Creating the AWS resources for an AWS Glue pipeline using the AWS console and CloudFormation
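The scheduled and conditional triggers listed above can also be declared in CloudFormation as `AWS::Glue::Trigger` resources. This sketch, assuming two already-defined jobs whose (hypothetical) names are passed in, returns a `Resources` fragment with a cron-based trigger for the first job and a conditional trigger that starts the second job only when the first reaches the `SUCCEEDED` state. Glue schedules use a six-field `cron(...)` expression evaluated in UTC; the logical IDs are placeholders.

```python
def build_trigger_resources(first_job: str, second_job: str, cron: str) -> dict:
    """Return CloudFormation resources for a scheduled and a conditional Glue trigger.

    Job names and logical IDs are hypothetical; `cron` is the six-field body
    of a Glue cron expression, e.g. "0 2 * * ? *" for 02:00 UTC daily.
    """
    return {
        # Scheduled trigger: starts the first job on a cron schedule.
        "NightlyTrigger": {
            "Type": "AWS::Glue::Trigger",
            "Properties": {
                "Type": "SCHEDULED",
                "Schedule": f"cron({cron})",
                "StartOnCreation": True,
                "Actions": [{"JobName": first_job}],
            },
        },
        # Conditional trigger: starts the second job only after the first succeeds.
        "AfterFirstJobTrigger": {
            "Type": "AWS::Glue::Trigger",
            "Properties": {
                "Type": "CONDITIONAL",
                "StartOnCreation": True,
                "Predicate": {
                    "Conditions": [
                        {
                            "LogicalOperator": "EQUALS",
                            "JobName": first_job,
                            "State": "SUCCEEDED",
                        }
                    ]
                },
                "Actions": [{"JobName": second_job}],
            },
        },
    }
```

Both triggers could be grouped into a Glue workflow by adding a `WorkflowName` property to each; that part is left out to keep the sketch small.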

Syllabus

Introduction
Course Overview
Glue Pipeline Resources (Sections 2, 3, and 5) Overview
Glue Resources Setup Part 1 - IAM, KMS, SNS

We will make sure the required buckets, files, CloudFormation templates, and IAM roles are in place before we create our first Glue job.
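A minimal sketch of the kind of IAM role a Glue job needs, expressed as a CloudFormation fragment built in Python. It assumes the job only reads and writes a single S3 bucket; the bucket name, logical ID, and inline policy name are hypothetical. The trust policy lets `glue.amazonaws.com` assume the role, and the AWS managed policy `AWSGlueServiceRole` supplies baseline Glue permissions. The KMS and SNS permissions used elsewhere in the course are not covered here.

```python
GLUE_SERVICE_POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"


def build_glue_role(bucket: str) -> dict:
    """Return a CloudFormation fragment for an IAM role that Glue jobs can assume.

    'GlueJobRole' and 'CourseBucketAccess' are hypothetical names.
    """
    return {
        "GlueJobRole": {
            "Type": "AWS::IAM::Role",
            "Properties": {
                # Trust policy: allows the Glue service to assume this role.
                "AssumeRolePolicyDocument": {
                    "Version": "2012-10-17",
                    "Statement": [
                        {
                            "Effect": "Allow",
                            "Principal": {"Service": "glue.amazonaws.com"},
                            "Action": "sts:AssumeRole",
                        }
                    ],
                },
                # AWS managed policy with baseline Glue permissions.
                "ManagedPolicyArns": [GLUE_SERVICE_POLICY_ARN],
                # Inline policy scoped to the one bucket the labs use.
                "Policies": [
                    {
                        "PolicyName": "CourseBucketAccess",
                        "PolicyDocument": {
                            "Version": "2012-10-17",
                            "Statement": [
                                {
                                    "Effect": "Allow",
                                    "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
                                    # Bucket ARN for ListBucket, object ARN for Get/Put.
                                    "Resource": [
                                        f"arn:aws:s3:::{bucket}",
                                        f"arn:aws:s3:::{bucket}/*",
                                    ],
                                }
                            ],
                        },
                    }
                ],
            },
        }
    }
```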

Traffic lights

Read about what's good, what should give you pause, and possible dealbreakers.
Provides hands-on experience with AWS Glue Data Catalog, Studio, Data Quality, and Data Brew, which are essential components for modern data engineering workflows
Explores integration with other AWS resources like KMS, IAM Role, SNS, and S3, which is crucial for building secure and scalable ETL pipelines
Covers debugging techniques for common AWS Glue job failures, which is invaluable for maintaining reliable data pipelines in production environments
Uses CloudFormation templates to create AWS resources, which is a best practice for infrastructure as code and ensures consistency across environments
Requires familiarity with AWS services like S3, IAM, and CloudFormation, which may necessitate additional learning for those new to the AWS ecosystem
Focuses on using AWS Glue with Python and Spark, so learners without prior experience in these technologies may face a steeper learning curve
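Debugging a failed job starts with knowing how the run ended. The sketch below classifies a job run from a dictionary shaped like the response of boto3's `glue.get_job_run(JobName=..., RunId=...)`, which has a `JobRun` key holding `JobRunState` and, on failure, usually an `ErrorMessage`. It takes the dictionary as input so it can be exercised without AWS credentials; the grouping of states into "in progress" and "did not succeed" is a judgment call for this sketch, not an official AWS classification.

```python
IN_PROGRESS_STATES = {"STARTING", "RUNNING", "STOPPING", "WAITING"}
UNSUCCESSFUL_STATES = {"FAILED", "TIMEOUT", "ERROR", "STOPPED"}


def triage_job_run(response: dict) -> str:
    """Return a short next-step hint for a get_job_run-style response dict."""
    run = response["JobRun"]
    state = run["JobRunState"]
    if state == "SUCCEEDED":
        return "ok"
    if state in IN_PROGRESS_STATES:
        # Still active: poll again later instead of treating it as a failure.
        return "poll again"
    if state in UNSUCCESSFUL_STATES:
        # ErrorMessage may be absent, e.g. for a run that was stopped manually.
        message = run.get("ErrorMessage", "no error message returned")
        return f"investigate {state}: {message}"
    # Any state not listed above is reported rather than guessed at.
    return f"unrecognised state {state}"
```

Against a live account this might be called as `triage_job_run(boto3.client("glue").get_job_run(JobName="my-job", RunId=run_id))`, where `my-job` and `run_id` are hypothetical.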

Reviews summary

Comprehensive AWS Glue hands-on masterclass

According to learners, this course provides a comprehensive and practical introduction to AWS Glue, covering key services and concepts essential for data engineering on AWS. Students particularly praise the numerous hands-on labs and practical demonstrations, finding them crucial for solidifying understanding and applying concepts effectively. The course is noted for its clear explanations and coverage of related AWS resources like IAM, S3, and CloudFormation, which helps integrate Glue into a broader AWS workflow. While some encountered minor challenges with environment setup, the overall consensus highlights the course's effectiveness in equipping learners with the skills needed to work with AWS Glue.
Instructor makes complex topics understandable.
"The instructor did a great job explaining complex topics in a clear and easy-to-understand manner."
"I found the explanations to be very clear and well-structured."
"Learning was easy because the concepts were broken down effectively."
Learn skills applicable to real-world jobs.
"This course focuses on practical skills that I can use immediately in my job."
"It felt like I was solving real-world data engineering problems."
"I learned how to apply AWS Glue effectively for typical use cases."
Includes Glue and related AWS services.
"It covers not just Glue but also relevant AWS services like S3, IAM, and CloudFormation which is very helpful."
"Appreciated learning how Glue integrates with other parts of AWS."
"The curriculum provides a good overview of the AWS ecosystem around Glue."
Practical exercises are key to learning.
"The course provides great hands-on labs that helped me understand the concepts by doing them."
"I really appreciated the labs, they made everything click and feel much more practical."
"Doing the exercises solidified my learning more than just watching lectures."
Initial setup can be challenging for some.
"I had some trouble setting up the environment correctly to run the labs."
"Getting the initial setup right required a bit of troubleshooting."
"Some steps in the environment setup could be more detailed or simplified."

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in AWS Glue - The Complete Masterclass with these activities:
Review AWS Fundamentals
Reviewing AWS fundamentals will provide a solid foundation for understanding AWS Glue and its integration with other AWS services.
  • Review the core AWS services like S3, IAM, and CloudWatch.
  • Understand basic cloud computing concepts.
  • Familiarize yourself with the AWS Management Console.
Brush up on Python and Spark
Practicing Python and Spark will enable you to write and debug AWS Glue ETL scripts more effectively.
  • Practice writing basic Python scripts.
  • Learn the fundamentals of Apache Spark.
  • Explore PySpark for data manipulation.
Build a Simple ETL Pipeline with AWS Glue
Building a simple ETL pipeline will allow you to apply the concepts learned in the course and gain hands-on experience with AWS Glue.
  • Set up an AWS account and configure the AWS CLI.
  • Create an S3 bucket to store your data.
  • Design and implement a basic ETL pipeline using AWS Glue Studio.
  • Monitor and debug your pipeline using Amazon CloudWatch.
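Before building the pipeline in Glue Studio, it can help to rehearse the extract, transform, and load steps on a tiny sample. The sketch below uses only the Python standard library, not any Glue API: it parses an inline CSV sample (hypothetical data), drops incomplete and duplicate rows, applies two checks loosely modelled on Glue Data Quality's `IsComplete` and `IsUnique` rule types, and writes the result as JSON Lines.

```python
import csv
import io
import json

# Hypothetical sample: one row is missing an amount, one is a duplicate.
RAW_CSV = """order_id,customer,amount
1,alice,20.50
2,bob,
3,carol,15.00
3,carol,15.00
"""


def extract(text):
    """Extract: parse CSV text into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount, cast types, dedupe on order_id."""
    seen = set()
    out = []
    for row in rows:
        if not row["amount"]:
            continue
        order_id = int(row["order_id"])
        if order_id in seen:
            continue
        seen.add(order_id)
        out.append(
            {"order_id": order_id, "customer": row["customer"], "amount": float(row["amount"])}
        )
    return out


def check_quality(rows):
    """Quality gate: completeness of 'amount' and uniqueness of 'order_id'."""
    ids = [r["order_id"] for r in rows]
    return {
        "amount_complete": all(r["amount"] is not None for r in rows),
        "order_id_unique": len(ids) == len(set(ids)),
    }


def load(rows):
    """Load: serialise records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(r) for r in rows)


if __name__ == "__main__":
    cleaned = transform(extract(RAW_CSV))
    results = check_quality(cleaned)
    if all(results.values()):
        print(load(cleaned))
    else:
        print("quality gate failed:", results)
```

Running it prints two records, for orders 1 and 3; the same split of responsibilities maps naturally onto source, transform, and target nodes in a Glue Studio job.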
Follow AWS Glue Tutorials
Following guided tutorials will provide step-by-step instructions and practical examples for using AWS Glue features.
  • Find official AWS Glue tutorials on the AWS website.
  • Work through tutorials on creating crawlers and ETL jobs.
  • Experiment with different data sources and transformations.
Document Your AWS Glue Projects
Documenting your projects will reinforce your understanding of AWS Glue and help you share your knowledge with others.
  • Create a README file for each AWS Glue project.
  • Describe the purpose, architecture, and implementation details of your project.
  • Include instructions on how to set up and run your project.
Contribute to AWS Glue Open Source Projects
Contributing to open source projects will allow you to collaborate with other developers and improve your AWS Glue skills.
  • Find AWS Glue-related open source projects on GitHub.
  • Identify issues or features that you can contribute to.
  • Submit pull requests with your code changes.
Optimize an AWS Glue ETL Job for Performance
Optimizing an ETL job will provide practical experience in improving the efficiency and scalability of AWS Glue pipelines.
  • Identify performance bottlenecks in an existing AWS Glue ETL job.
  • Experiment with different optimization techniques, such as partitioning and data filtering.
  • Measure the performance improvements and document your findings.
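Partitioning pays off because a job can skip files whose partition values cannot match a filter. This standard-library sketch assumes Hive-style `key=value` path segments (the bucket and object paths are hypothetical) and keeps only the objects whose partition values satisfy a predicate. In a Glue job, the `push_down_predicate` argument of `create_dynamic_frame.from_catalog` asks Glue to perform comparable pruning against catalog partitions.

```python
from typing import Callable, Dict, List


def partition_values(path: str) -> Dict[str, str]:
    """Extract Hive-style key=value segments from an object path."""
    values = {}
    for segment in path.split("/"):
        if "=" in segment:
            key, _, value = segment.partition("=")
            values[key] = value
    return values


def prune(paths: List[str], predicate: Callable[[Dict[str, str]], bool]) -> List[str]:
    """Keep only the paths whose partition values satisfy the predicate."""
    return [p for p in paths if predicate(partition_values(p))]


# Hypothetical objects laid out by year and month.
paths = [
    "s3://example-bucket/sales/year=2023/month=12/part-0000.parquet",
    "s3://example-bucket/sales/year=2024/month=01/part-0000.parquet",
    "s3://example-bucket/sales/year=2024/month=02/part-0000.parquet",
]

# Only January 2024 should be read; the other two files are skipped.
selected = prune(paths, lambda v: v.get("year") == "2024" and v.get("month") == "01")
```

Comparing `len(selected)` with `len(paths)` gives a crude first measure of how much data a filter avoids reading, before timing the real job.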

Career center

Learners who complete AWS Glue - The Complete Masterclass will develop knowledge and skills that may be useful to these careers:
ETL Developer
An ETL developer specializes in designing, building, and maintaining ETL pipelines that extract, transform, and load data into data warehouses or data lakes. This course directly aligns with the work of an ETL developer: it provides hands-on exercises with AWS Glue, a serverless ETL service. The course covers creating ETL pipelines with AWS Glue Studio and managing data quality. It also covers use cases for Glue streaming. Anyone seeking to become an ETL developer will find this course useful because of its focus on cloud-based ETL processes and on debugging and resolving failures.
Data Engineer
A data engineer designs, builds, and maintains the infrastructure that allows for the collection, storage, processing, and analysis of large datasets. This course is ideal for an aspiring data engineer. The course provides practical experience with AWS Glue, a serverless ETL service which is commonly used by data engineers to build data pipelines and data lakes. The course covers key aspects of data engineering such as setting up data catalogs and crawlers. In addition, it covers data quality and data curation, which are critical for any successful data engineering project. This course provides hands-on experience using AWS Glue and related resources, which helps build a solid foundation for data engineering.
Data Integration Specialist
A data integration specialist is responsible for moving data between different systems. This course is directly applicable to the role of a data integration specialist. It gives practical experience with AWS Glue, a tool designed for ETL processes. This course also covers the AWS Glue Data Catalog, AWS Glue Studio, and how to create and manage ETL pipelines, which are all essential aspects of data integration. The course further covers how to integrate AWS Glue with other AWS resources. A data integration specialist will find this course useful.
Cloud Data Architect
A cloud data architect designs and oversees the implementation of data management systems in cloud environments. Aspiring cloud data architects may find this course beneficial, as it provides detailed exposure to AWS Glue, a crucial service for building data pipelines on the cloud. The course covers key topics including AWS Glue Data Catalog, AWS Glue Studio, and data quality. It also teaches how to integrate AWS Glue with other AWS resources, which is essential for building complex cloud-based data solutions. Familiarity with data quality, data curation and different ways AWS Glue jobs can fail helps a cloud data architect create robust and reliable data systems.
Analytics Engineer
An analytics engineer focuses on transforming raw data into usable formats for analysis and reporting. This course will be valuable for any analytics engineer. It provides a thorough overview of AWS Glue, a tool used for data transformation and pipeline creation. The course covers important skills such as data cataloging, setting up ETL pipelines within AWS Glue Studio, data quality checks, and data curation. An analytics engineer can use this knowledge to help produce reliable data sources for their organization. The practical skills provided in this course are key for the work of an analytics engineer.
Cloud Solutions Engineer
A cloud solutions engineer designs and implements cloud-based solutions. This course will help those who seek to work as cloud solutions engineers by providing them with a solid grasp of AWS Glue, a core data integration service in AWS. The course will give you knowledge of the Data Catalog, ETL pipeline creation, data quality management, and integration of Glue with other AWS resources. This knowledge allows a cloud solutions engineer to design comprehensive and effective data workflows within AWS and to broaden the data processing aspect of their role.
Cloud Engineer
A cloud engineer is responsible for the design, implementation, and maintenance of cloud computing infrastructure. This course may be useful for those looking to become a cloud engineer. It provides practical experience with AWS Glue, which is often used in cloud environments for data processing. The hands-on labs on creating AWS resources, setting up IAM roles, and working with CloudFormation templates also contribute to essential cloud engineering skills. The course provides a foundation in how data is managed and moved within AWS, which is a frequent concern for cloud engineers.
Solutions Architect
A solutions architect designs and plans complex IT solutions that meet business needs. This course may be useful for a solutions architect looking to enhance their skills. It covers using AWS Glue to build data pipelines, which is often an important part of data-intensive solutions. It also covers using other AWS resources with Glue. The course also provides familiarity with data quality and data curation, which are important aspects of many solutions. A solutions architect needs to be able to recommend solutions that integrate well with data systems, and understanding Glue helps in that goal.
Machine Learning Engineer
A machine learning engineer develops and deploys machine learning models. Aspiring machine learning engineers may find this course useful. It teaches how to use AWS Glue to prepare data for machine learning applications. It gives hands-on experience using AWS Glue Studio and data quality functions. Understanding these concepts is helpful in creating reliable datasets for building and training machine learning models. This course may be useful to a machine learning engineer seeking to expand their data engineering skills.
Data Analyst
A data analyst uses data to identify trends, provide insights, and support decision-making. This course may be helpful to a data analyst looking to learn more about how data is prepared, which happens before analysis. It provides an overview of how data is extracted, transformed and loaded and how it is managed using AWS Glue. The course also teaches AWS Glue Data Brew for data curation, and AWS Glue Data Quality features, both of which help a data analyst work with accurate and reliable data. The course provides direct experience with the processes that a data analyst should be familiar with to be successful.
Business Intelligence Developer
A business intelligence developer designs and creates systems and reports that help businesses make better decisions. This course may interest a business intelligence developer. It provides a foundation in how data is extracted, transformed, and loaded, particularly through AWS Glue, which is critical in building business intelligence solutions. The course also covers data quality and data curation using Data Brew, ensuring the accuracy of data used for reporting. A business intelligence developer can use the data pipelines created by Glue to populate their systems.
Database Administrator
A database administrator manages the performance, integrity, and security of databases. This course may be relevant for database administrators. It covers how AWS Glue manages data catalogs and crawlers to discover and organize data, which is a key function for database administration. The course also covers setting up database connections with the AWS Glue Data Catalog, which is helpful for a database administrator to understand. Though a database administrator may not use Glue directly, knowledge of how Glue functions helps them manage data efficiently.
Software Developer
A software developer designs, develops, and maintains software applications. This course may be helpful to a software developer who wants to expand their data processing skills using AWS Glue. The course covers how to create and manage data pipelines specifically for AWS cloud environments, with hands-on labs that build experience. The course also covers using Python for writing ETL scripts, which helps build a software developer's ability to create data-driven applications. Software developers seeking to incorporate data management into their work may find this course useful.
System Administrator
A system administrator manages and maintains computer systems, including cloud-based systems. This course may be useful for a system administrator who wants to build skill with cloud data services. It covers how AWS Glue is configured and maintained and gives a solid understanding of data pipelines, ETL processes, and the related AWS resources. It also explains how AWS Glue integrates with other AWS services, which is important for good cloud-based system administration. The course helps a system administrator better maintain cloud-based data systems.
Technical Project Manager
A technical project manager oversees the planning, execution, and delivery of technical projects. This course may be useful for technical project managers working on data-intensive projects. It provides an overview of AWS Glue and how it is used to set up data integration and ETL processes. This course introduces AWS Glue components including the Data Catalog, AWS Glue Studio, Data Quality, and Data Brew. This background helps a technical project manager understand the work that data engineers and ETL developers do, which aids in better project planning and management.

Reading list

While this book has a broader scope than AWS Glue, it provides valuable insights into the concepts and best practices of big data integration.
This official guide from Amazon Web Services provides comprehensive documentation on AWS Glue, covering its features, architecture, and usage.
A classic work on dimensional modeling, a key aspect of data warehousing and ETL. It provides a comprehensive guide to designing and implementing dimensional data models. It is a valuable resource for anyone who wants to learn more about the theoretical foundations of ETL.
This book provides a comprehensive overview of data management, including ETL. It covers a wide range of topics, from data governance and data quality to data integration and data warehousing. It is a valuable resource for anyone who wants to learn more about the broader context of ETL.
This book offers a comprehensive overview of the data engineering lifecycle, which includes ETL as a core component. It covers planning, building, and managing data systems, providing a strong foundation for understanding the role of ETL in a modern data stack. It is suitable for those new to data engineering as well as those looking to solidify their understanding of best practices.
This book provides a top-down approach to building data warehouses. It covers a wide range of topics, from data modeling and data integration to data warehousing architecture and management. It is a valuable resource for anyone who wants to learn more about the overall process of building data warehouses, including ETL.
This book provides a comprehensive overview of data integration and ETL for data warehousing. It covers a wide range of topics, from data modeling and data extraction to data transformation and data loading. It is a valuable resource for anyone who wants to learn more about the overall process of data integration and ETL for data warehousing.
Effective ETL relies heavily on good data governance and master data management. This book provides a thorough understanding of these crucial concepts. It is valuable for professionals who need to understand the broader data landscape and how to ensure data quality and consistency within their ETL processes.
This is a foundational text for data warehousing and dimensional modeling, which are highly relevant to ETL. It provides comprehensive guidance on designing dimensional databases that are easy to understand and provide fast query response. While not solely focused on ETL, it offers essential context and principles for anyone involved in the 'Load' phase and overall data warehouse design. It is a classic and widely used reference in the field.
While not exclusively about ETL, this book provides a deep dive into the fundamental concepts of data systems, including batch and stream processing, which are integral to modern ETL pipelines. It helps in understanding the trade-offs and design choices behind various data processing technologies. It is essential for gaining a broader understanding of the landscape in which ETL operates and is highly recommended for architects and senior engineers.
This book provides a theoretical and practical overview of data integration. It covers a wide range of topics, from data modeling and data cleansing to data warehousing and data mining. It is a valuable resource for anyone who wants to learn more about the foundations of ETL.
This book focuses on implementing data engineering concepts, including ETL, using Python. It's a practical guide for those who want to build data pipelines with a popular programming language. It covers extracting, transforming, and loading data using Python libraries and tools. This book is valuable for hands-on learners and professionals using Python for ETL tasks.
Given the increasing importance of real-time data processing, understanding Kafka is beneficial for contemporary ETL. This book provides a comprehensive guide to Kafka, which is often used in modern data pipelines for streaming ETL. It covers Kafka's architecture, APIs, and best practices for building scalable and reliable data streams.
Apache Spark is a powerful engine for big data processing, commonly used in ETL workflows for large datasets. This book, written by one of Spark's creators, offers a comprehensive guide to using Spark for various data processing tasks. It's particularly relevant for those dealing with big data ETL challenges.
While this book focuses on data warehousing fundamentals, it provides a strong foundation for understanding the ETL process. It covers topics such as data modeling, data integration, and data quality. It is a valuable resource for anyone who wants to learn more about the foundations of ETL.
Data modeling is a foundational skill for ETL, as the target schema significantly impacts the transformation and loading processes. This book offers a clear and simple approach to data modeling, making it accessible for beginners. It helps in understanding how data should be structured for effective data warehousing and analytics.
Airflow is a popular platform for orchestrating complex data pipelines, including ETL workflows. This book provides a practical guide to using Airflow for building, scheduling, and monitoring data pipelines. It's highly relevant for data engineers and developers managing ETL processes in a production environment.
W.H. Inmon is a prominent figure in data warehousing, and this book focuses on data integration, a broader concept that encompasses ETL. It provides valuable insights into architecting data-driven systems and the importance of data integration. While theoretical in parts, it offers a strong conceptual foundation.
This book is specifically dedicated to the ETL process within a data warehousing context. It provides practical techniques and best practices for extracting, cleaning, transforming, and loading data. It's an excellent resource for understanding the intricacies of building robust ETL systems and is considered a key text for ETL developers and architects. This book is highly valuable as a reference tool for practitioners.

© 2016 - 2025 OpenCourser