AWS Glue - The Complete Masterclass

Data Soup

4.3

Based on 2,026 ratings


Learn the latest in AWS Glue - And learn to use it with other AWS resources.

In a world of growing data volumes and cloud adoption, core competency in a cloud ETL tool is necessary. AWS Glue comes with built-in Spark support, Data Quality, and data curation using Data Brew. Top technology, finance, and insurance companies such as JPMC, Vanguard, BCBS, Amazon, Capital One, Capgemini, and FINRA use AWS Glue to run their ETL on petabyte-scale data every day.

AWS Glue provides a serverless, scalable ETL solution in which scripts can be written in Python or Spark, and now with Ray as well. It also provides visual drag-and-drop options for creating ETL pipelines. As more and more companies migrate to the cloud, demand for this skill has exploded. By mastering AWS Glue, you can quickly become one of the most knowledgeable people in the job market.

This course will teach the basics of the AWS Glue Data Catalog, AWS Glue Studio, and associated AWS resources such as KMS, IAM roles, SNS, and S3. Once we've done that, we'll go through how to use Glue Data Quality, Glue Streaming, and Glue Data Brew ETL pipelines. All along the way you'll have multiple labs to create the resources and ETL pipelines using the AWS console and CloudFormation templates, which put you right into a real-world situation where you need to use your new skills to solve a real problem.

If you're ready to jump into the data engineering world of AWS Glue, this is the course for you.

What's inside

Learning objectives

Understanding the AWS Glue Data Catalog and creating AWS Glue databases, tables, and crawlers
Using AWS Glue Studio to create an ETL pipeline along with scheduled triggers, conditional triggers, and Glue workflows
Understanding and creating KMS, IAM role, SNS, S3, and other AWS resources associated with Glue
Understanding AWS Glue Data Quality and creating the associated Glue ETL pipeline
Understanding AWS Glue Data Brew and creating the recipe, project, and job to curate a dataset
Understanding AWS Glue Streaming: creating the stream using a Python shell job and loading the stream using Spark Streaming
Different ways an AWS Glue job can fail, and how to debug and fix the failures
Creating the AWS resources for an AWS Glue pipeline using the AWS console and CloudFormation
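
As a rough illustration of the catalog, crawler, and trigger objectives above, the sketch below builds a small CloudFormation template as a Python dictionary and prints it as JSON. It is not taken from the course labs. The logical IDs, resource names, S3 path, role ARN, and job name are all hypothetical placeholders, and the job itself is assumed to be defined elsewhere.

```python
import json


def build_glue_template(bucket_path, role_arn, job_name):
    """Return a CloudFormation template (as a dict) for a minimal Glue setup.

    All names below are illustrative placeholders, not course material.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            # A Data Catalog database that the crawler writes tables into.
            "DemoDatabase": {
                "Type": "AWS::Glue::Database",
                "Properties": {
                    "CatalogId": {"Ref": "AWS::AccountId"},
                    "DatabaseInput": {"Name": "demo_db"},
                },
            },
            # A crawler that scans an S3 prefix and infers table schemas.
            "DemoCrawler": {
                "Type": "AWS::Glue::Crawler",
                "Properties": {
                    "Name": "demo-crawler",
                    "Role": role_arn,
                    "DatabaseName": {"Ref": "DemoDatabase"},
                    "Targets": {"S3Targets": [{"Path": bucket_path}]},
                },
            },
            # A scheduled trigger that starts an existing job at 02:00 UTC daily.
            "NightlyTrigger": {
                "Type": "AWS::Glue::Trigger",
                "Properties": {
                    "Name": "demo-nightly-trigger",
                    "Type": "SCHEDULED",
                    "Schedule": "cron(0 2 * * ? *)",
                    "StartOnCreation": True,
                    "Actions": [{"JobName": job_name}],
                },
            },
        },
    }


template = build_glue_template(
    bucket_path="s3://example-bucket/raw/",  # hypothetical bucket
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",  # hypothetical role
    job_name="example-etl-job",  # hypothetical job defined elsewhere
)
print(json.dumps(template, indent=2))
```

The printed JSON could be saved to a file and deployed as a stack, assuming the referenced role and job already exist in the account.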

Syllabus

Introduction

Course Overview

Glue Pipeline Resources (Sections 2, 3 and 5) Overview

Glue Resources Setup Part 1 - IAM, KMS, SNS

We will make sure that the required buckets, files, CloudFormation templates, and IAM roles are in place before we create our first Glue job.
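
To make the IAM prerequisite a little more concrete, here is a minimal sketch of the two policy documents a Glue job role typically needs: a trust policy that lets the Glue service assume the role, and a permissions policy for one S3 bucket. This is an assumption about a typical setup, not the course's own template. The bucket name is a hypothetical placeholder, and the function names are made up for this example.

```python
import json

# AWS-managed policy commonly attached to Glue service roles.
GLUE_MANAGED_POLICY = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"


def glue_trust_policy():
    """Trust policy allowing the AWS Glue service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "glue.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def s3_read_write_policy(bucket_name):
    """Permissions policy scoped to a single bucket (name is a placeholder)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level access applies to keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
            {
                # Listing applies to the bucket itself, not its objects.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
        ],
    }


print(json.dumps(glue_trust_policy(), indent=2))
print(json.dumps(s3_read_write_policy("example-glue-bucket"), indent=2))
```

Keeping the bucket policy narrow and relying on the managed policy for the Glue-specific permissions is one reasonable split; a real setup may also need KMS and SNS permissions.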

Traffic lights

Read about what's good, what should give you pause, and possible dealbreakers.

Provides hands-on experience with AWS Glue Data Catalog, Studio, Data Quality, and Data Brew, which are essential components for modern data engineering workflows

Explores integration with other AWS resources like KMS, IAM Role, SNS, and S3, which is crucial for building secure and scalable ETL pipelines

Covers debugging techniques for common AWS Glue job failures, which is invaluable for maintaining reliable data pipelines in production environments

Uses CloudFormation templates to create AWS resources, which is a best practice for infrastructure as code and ensures consistency across environments

Requires familiarity with AWS services like S3, IAM, and CloudFormation, which may necessitate additional learning for those new to the AWS ecosystem

Focuses on using AWS Glue with Python and Spark, so learners without prior experience in these technologies may face a steeper learning curve

Reviews summary

Comprehensive AWS Glue hands-on masterclass

According to learners, this course provides a comprehensive and practical introduction to AWS Glue, covering key services and concepts essential for data engineering on AWS. Students particularly praise the numerous hands-on labs and practical demonstrations, finding them crucial for solidifying understanding and applying concepts effectively. The course is noted for its clear explanations and coverage of related AWS resources like IAM, S3, and CloudFormation, which helps integrate Glue into a broader AWS workflow. While some encountered minor challenges with environment setup, the overall consensus highlights the course's effectiveness in equipping learners with the skills needed to work with AWS Glue.

Instructor makes complex topics understandable.

"The instructor did a great job explaining complex topics in a clear and easy-to-understand manner."

"I found the explanations to be very clear and well-structured."

"Learning was easy because the concepts were broken down effectively."

Learn skills applicable to real-world jobs.

"This course focuses on practical skills that I can use immediately in my job."

"It felt like I was solving real-world data engineering problems."

"I learned how to apply AWS Glue effectively for typical use cases."

Includes Glue and related AWS services.

"It covers not just Glue but also relevant AWS services like S3, IAM, and CloudFormation which is very helpful."

"Appreciated learning how Glue integrates with other parts of AWS."

"The curriculum provides a good overview of the AWS ecosystem around Glue."

Practical exercises are key to learning.

"The course provides great hands-on labs that helped me understand the concepts by doing them."

"I really appreciated the labs, they made everything click and feel much more practical."

"Doing the exercises solidified my learning more than just watching lectures."

Initial setup can be challenging for some.

"I had some trouble setting up the environment correctly to run the labs."

"Getting the initial setup right required a bit of troubleshooting."

"Some steps in the environment setup could be more detailed or simplified."

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in AWS Glue - The Complete Masterclass with these activities:

Review AWS Fundamentals

Reviewing AWS fundamentals will provide a solid foundation for understanding AWS Glue and its integration with other AWS services.

Review the core AWS services like S3, IAM, and CloudWatch.
Understand basic cloud computing concepts.
Familiarize yourself with the AWS Management Console.

Brush up on Python and Spark

Practicing Python and Spark will enable you to write and debug AWS Glue ETL scripts more effectively.

Practice writing basic Python scripts.
Learn the fundamentals of Apache Spark.
Explore PySpark for data manipulation.

Build a Simple ETL Pipeline with AWS Glue

Building a simple ETL pipeline will allow you to apply the concepts learned in the course and gain hands-on experience with AWS Glue.

Set up an AWS account and configure the AWS CLI.
Create an S3 bucket to store your data.
Design and implement a basic ETL pipeline using AWS Glue Studio.
Monitor and debug your pipeline using AWS CloudWatch.
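
Before building the pipeline in Glue Studio, it can help to see the extract, transform, load shape in plain Python. The sketch below runs locally with only the standard library and does not use any Glue or Spark API. The sample data, column names, and cleaning rules are invented for illustration.

```python
import csv
import io
import json

# Invented sample input standing in for a CSV file in S3.
RAW_CSV = """order_id,amount,country
1,19.99,US
2,,DE
3,5.00,us
"""


def extract(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Drop rows with a missing amount, cast types, normalise country codes."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append(
            {
                "order_id": int(row["order_id"]),
                "amount": float(row["amount"]),
                "country": row["country"].upper(),
            }
        )
    return cleaned


def load(rows):
    """Serialise rows as JSON Lines, a common target format for data lakes."""
    return "\n".join(json.dumps(row) for row in rows)


print(load(transform(extract(RAW_CSV))))
```

In a Glue job the same three stages would read from and write to S3 or the Data Catalog, but the separation of concerns stays the same.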

Follow AWS Glue Tutorials

Following guided tutorials will provide step-by-step instructions and practical examples for using AWS Glue features.

Find official AWS Glue tutorials on the AWS website.
Work through tutorials on creating crawlers and ETL jobs.
Experiment with different data sources and transformations.

Document Your AWS Glue Projects

Documenting your projects will reinforce your understanding of AWS Glue and help you share your knowledge with others.

Create a README file for each AWS Glue project.
Describe the purpose, architecture, and implementation details of your project.
Include instructions on how to set up and run your project.

Contribute to AWS Glue Open Source Projects

Contributing to open source projects will allow you to collaborate with other developers and improve your AWS Glue skills.

Find AWS Glue-related open source projects on GitHub.
Identify issues or features that you can contribute to.
Submit pull requests with your code changes.

Optimize an AWS Glue ETL Job for Performance

Optimizing an ETL job will provide practical experience in improving the efficiency and scalability of AWS Glue pipelines.

Identify performance bottlenecks in an existing AWS Glue ETL job.
Experiment with different optimization techniques, such as partitioning and data filtering.
Measure the performance improvements and document your findings.
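
One way to picture the partitioning and filtering step: when data is laid out in Hive-style `name=value` folders, a job can skip whole folders instead of reading every file. The sketch below shows that idea on a list of object keys in plain Python. It illustrates the concept only and is not Glue's own pruning mechanism. The keys and the helper names `parse_partitions` and `prune` are hypothetical.

```python
def parse_partitions(key):
    """Extract Hive-style partition values (name=value) from an object key."""
    parts = {}
    for segment in key.split("/"):
        if "=" in segment:
            name, value = segment.split("=", 1)
            parts[name] = value
    return parts


def prune(keys, **wanted):
    """Keep only the keys whose partition values match every requested filter."""
    return [
        key
        for key in keys
        if all(parse_partitions(key).get(name) == value for name, value in wanted.items())
    ]


# Hypothetical object keys under a partitioned S3 prefix.
keys = [
    "sales/year=2023/month=12/part-0000.parquet",
    "sales/year=2024/month=01/part-0000.parquet",
    "sales/year=2024/month=02/part-0000.parquet",
]

selected = prune(keys, year="2024", month="01")
print(f"reading {len(selected)} of {len(keys)} files: {selected}")
```

Comparing the number of files read before and after such a filter is a simple way to document the improvement.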

Career center

Learners who complete AWS Glue - The Complete Masterclass will develop knowledge and skills that may be useful to these careers:

ETL Developer

An ETL developer specializes in designing, building, and maintaining ETL pipelines that extract, transform, and load data into data warehouses or data lakes. This course directly aligns with the work of an ETL developer; it provides hands-on exercises with AWS Glue, a serverless ETL service. The course covers creation of ETL pipelines using AWS Glue Studio and how to manage data quality. It also covers use cases for Glue Streaming. Anyone seeking to become an ETL developer will find this course useful due to its focus on cloud-based ETL processes as well as debugging and failure resolution.

Data Engineer

A data engineer designs, builds, and maintains the infrastructure that allows for the collection, storage, processing, and analysis of large datasets. This course is ideal for an aspiring data engineer. The course provides practical experience with AWS Glue, a serverless ETL service which is commonly used by data engineers to build data pipelines and data lakes. The course covers key aspects of data engineering such as setting up data catalogs and crawlers. In addition, it covers data quality and data curation, which are critical for any successful data engineering project. This course provides hands-on experience using AWS Glue and related resources, which helps build a solid foundation for data engineering.

Data Integration Specialist

A data integration specialist is responsible for moving data between different systems. This course is directly applicable to the role of a data integration specialist. It gives practical experience with AWS Glue, a tool designed for ETL processes. This course also covers the AWS Glue Data Catalog, AWS Glue Studio, and how to create and manage ETL pipelines, which are all essential aspects of data integration. The course further covers how to integrate AWS Glue with other AWS resources. A data integration specialist will find this course useful.

Cloud Data Architect

A cloud data architect designs and oversees the implementation of data management systems in cloud environments. Aspiring cloud data architects may find this course beneficial, as it provides detailed exposure to AWS Glue, a crucial service for building data pipelines on the cloud. The course covers key topics including AWS Glue Data Catalog, AWS Glue Studio, and data quality. It also teaches how to integrate AWS Glue with other AWS resources, which is essential for building complex cloud-based data solutions. Familiarity with data quality, data curation and different ways AWS Glue jobs can fail helps a cloud data architect create robust and reliable data systems.

Analytics Engineer

An analytics engineer focuses on transforming raw data into usable formats for analysis and reporting. This course will be valuable for any analytics engineer. It provides a thorough overview of AWS Glue, a tool used for data transformation and pipeline creation. The course covers important skills such as data cataloging, setting up ETL pipelines within AWS Glue Studio, data quality checks, and data curation. An analytics engineer can use this knowledge to help produce reliable data sources for their organization. The practical skills provided in this course are key for the work of an analytics engineer.

Cloud Solutions Engineer

A cloud solutions engineer designs and implements cloud-based solutions. This course will help those who seek to work as cloud solutions engineers by providing them with a solid grasp of AWS Glue, a core data integration service in AWS. The course covers the Data Catalog, ETL pipeline creation, data quality management, and integration of Glue with other AWS resources. This allows a cloud solutions engineer to design comprehensive and effective data workflows within AWS. This course will help a cloud solutions engineer broaden the data processing aspect of their role.

Cloud Engineer

A cloud engineer is responsible for the design, implementation, and maintenance of cloud computing infrastructure. This course may be useful for those looking to become a cloud engineer. It provides practical experience with AWS Glue, which is often used in cloud environments for data processing. The hands-on labs on creating AWS resources, setting up IAM roles, and working with CloudFormation templates also contribute to essential cloud engineering skills. The course provides a foundation in how data is managed and moved within AWS, which is a frequent concern for cloud engineers.

Solutions Architect

A solutions architect designs and plans complex IT solutions that meet business needs. This course may be useful for a solutions architect looking to enhance their skills. It covers using AWS Glue to build data pipelines, which are often an important part of data-intensive solutions, and using other AWS resources with Glue. The course also provides familiarity with data quality and data curation, which are important aspects of many solutions. A solutions architect needs to be able to recommend solutions that integrate well with data systems, and understanding Glue helps toward that goal.

Machine Learning Engineer

A machine learning engineer develops and deploys machine learning models. Aspiring machine learning engineers may find this course useful. It teaches how to use AWS Glue to prepare data for machine learning applications. It gives hands-on experience using AWS Glue Studio and data quality functions. Understanding these concepts is helpful in creating reliable datasets for building and training machine learning models. This course may be useful to a machine learning engineer seeking to expand their data engineering skills.

Data Analyst

A data analyst uses data to identify trends, provide insights, and support decision-making. This course may be helpful to a data analyst looking to learn more about how data is prepared, which happens before analysis. It provides an overview of how data is extracted, transformed and loaded and how it is managed using AWS Glue. The course also teaches AWS Glue Data Brew for data curation, and AWS Glue Data Quality features, both of which help a data analyst work with accurate and reliable data. The course provides direct experience with the processes that a data analyst should be familiar with to be successful.

Business Intelligence Developer

A business intelligence developer designs and creates systems and reports that help businesses make better decisions. This course may interest a business intelligence developer. It provides a foundation in how data is extracted, transformed, and loaded, particularly through AWS Glue, which is critical in building business intelligence solutions. The course also covers data quality and data curation using Data Brew, ensuring the accuracy of data used for reporting. A business intelligence developer can use the data pipelines created by Glue to populate their systems.

Database Administrator

A database administrator manages the performance, integrity, and security of databases. This course may be relevant for database administrators. It covers how AWS Glue manages data catalogs and crawlers to discover and organize data, which is a key function for database administration. The course also covers setting up database connections with the AWS Glue Data Catalog, which is helpful for a database administrator to understand. Though a database administrator may not use Glue directly, knowledge of how Glue functions helps them manage data efficiently.

Software Developer

A software developer designs, develops, and maintains software applications. This course may be helpful to a software developer who wants to expand their data processing skills using AWS Glue. The course covers how to create and manage data pipelines specifically for AWS cloud environments, with hands-on labs that build experience. The course also covers using Python for writing ETL scripts, which helps build a software developer's ability to create data-driven applications. Software developers seeking to incorporate data management into their work may find this course useful.

System Administrator

A system administrator manages and maintains computer systems, including cloud-based systems. This course may be useful for a system administrator who wants to build skills with cloud data services. It covers how AWS Glue is configured and maintained and gives a solid understanding of data pipelines, ETL processes, and the related AWS resources. It also provides information about how AWS Glue integrates with other AWS services, which is important for good cloud-based system administration. The course helps a system administrator better maintain cloud-based data systems.

Technical Project Manager

A technical project manager oversees the planning, execution, and delivery of technical projects. This course may be useful for technical project managers working on data-intensive projects. It provides an overview of AWS Glue and how it is used to set up data integration and ETL processes. This course introduces AWS Glue components including Data Catalog, AWS Glue Studio, Data Quality, and Data Brew. This background helps a technical project manager understand the work that data engineers and ETL developers do, which aids in better project planning and management.
