We may earn an affiliate commission when you visit our partners.

Mastering AWS Glue, QuickSight, Athena & Redshift Spectrum

Siddharth Mehta

PS:

  1. Please do NOT join the course if you do NOT have basic working knowledge of the AWS Console and AWS services. AWS beginners may struggle to understand some of the topics.

  2. The course explains all the labs. If you want to practice the labs, you will need an AWS account, and it may cost $$.

  3. Basic working knowledge of Redshift is recommended, but not a must.

  4. This course has been designed for intermediate and expert AWS Developers / Architects / Administrators.

  5. The course covers every feature that AWS has released since 2018 for AWS Glue, AWS QuickSight, AWS Athena, and Amazon Redshift Spectrum, and it is regularly updated with every new feature released for these services.

Serverless is the future of cloud computing, and AWS is continuously launching new services on the serverless paradigm. AWS launched Athena and QuickSight in Nov 2016, Redshift Spectrum in Apr 2017, and Glue in Aug 2017. Data and analytics on the AWS platform are evolving and gradually moving to a serverless model.

Businesses have always wanted to manage less infrastructure and deliver more solutions. Big data continuously pushes infrastructure boundaries. Having Serverless Storage, Serverless ETL, Serverless Analytics, and Serverless Reporting all on one cloud platform sounded too good to be true for a very long time. But now it's a reality on the AWS platform. AWS is the only cloud provider that has all the native serverless components for a true Serverless Data Lake Analytics solution.

It's not a secret that when a technology is new in the industry, professionals with expertise in new technologies command great salaries. Serverless is the future, Serverless is the industry demand, and Serverless is new. It's the perfect time and opportunity to jump into Serverless Analytics on AWS Platform.

In this course, we will learn the following:

1) We will start with Basics on Serverless Computing and Basics of Data Lake Architecture on AWS.

2) We will learn Schema Discovery, ETL, Scheduling, and Tools integration using the Serverless AWS Glue Engine built on a Spark environment.

3) We will learn to develop a centralized Data Catalog using the Serverless AWS Glue Engine.

4) We will learn to query the data lake using the Serverless Athena Engine built on top of Presto and Hive.

5) We will learn to bridge the data warehouse and data lake using the Serverless Amazon Redshift Spectrum Engine built on top of the Amazon Redshift platform.

6) We will learn to develop reports and dashboards, with a PowerPoint-like slideshow feature and mobile support, without building any report server, by using the Serverless Amazon QuickSight Reporting Engine.

7) We will finally learn how to source data from data warehouse, data lake, join data, apply row security, drill-down, drill-through and other data functions using the Serverless Amazon QuickSight Reporting Engines.
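
As a rough illustration of item 4 above (querying the data lake with Athena), the sketch below builds the parameters for Athena's StartQueryExecution API. It is not taken from the course. The database, table, and bucket names are hypothetical, and the `run_query` helper assumes `boto3` is installed and AWS credentials are configured. Running it against a real account may incur cost.

```python
def build_athena_query_request(database, table, output_s3_uri, limit=10):
    """Build keyword arguments for Athena's StartQueryExecution API."""
    if not output_s3_uri.startswith("s3://"):
        raise ValueError("output_s3_uri must be an s3:// URI")
    return {
        # Athena DML quotes identifiers with double quotes.
        "QueryString": f'SELECT * FROM "{table}" LIMIT {int(limit)}',
        # The database is looked up in the Glue Data Catalog.
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_s3_uri},
    }


def run_query(request, region_name="us-east-1"):
    """Submit the query. Needs boto3 and AWS credentials; may incur cost."""
    import boto3  # imported lazily so the builder works without boto3

    athena = boto3.client("athena", region_name=region_name)
    return athena.start_query_execution(**request)["QueryExecutionId"]


# All three names below are hypothetical placeholders.
request = build_athena_query_request(
    database="sales_lake",
    table="orders",
    output_s3_uri="s3://example-athena-results/",
)
print(request["QueryString"])
```

Keeping the request builder separate from the API call lets you inspect the query locally before anything is sent to AWS.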

This course understands your time is important, and so the course is designed to be laser-sharp on lecture timings, where all the trivial details are kept at a minimum and focus is kept on core content for experienced AWS Developers / Architects / Administrators. By the end of this course, you can feel assured and confident that you are future-proof for the next change and disruption sweeping the cloud industry.

I am very passionate about AWS Serverless computing on Data and Analytics platform, and am covering A-to-Z of all the topics discussed in this course.

So if you are excited and ready to get trained on the AWS Serverless Analytics platform, I am ready to welcome you to my class.

What's inside

Learning objectives

  • Confidently work with AWS serverless services to develop a data catalog, ETL, analytics, and reporting on a data lake
  • Develop deep knowledge of Glue, Athena, Redshift Spectrum, and QuickSight
  • Build a serverless data lake on AWS using structured and unstructured data
  • Architect serverless analytics solutions on the AWS cloud platform

Syllabus

Introduction

Instructor and Course Introduction

Pre-requisites - What you'll need for this course

Course Objectives

Course Content, Convention and Resources

AWS Serverless Analytics and Data Lake Basics

Section Agenda

Learn about the basics of Serverless Computing and which AWS Services fit into it

Learn basics of AWS Serverless Data Lake Architecture

Amazon S3 - Test-Data Setup

Set up sample data in S3 buckets that will be used throughout this course

Configure S3 Storage Analytics

Amazon Redshift - Cluster and Sample Data Setup

Introduction to Amazon Redshift

Develop Amazon Redshift Cluster

Install and setup SQL Client to work with Amazon Redshift

Load sample data in Redshift cluster

AWS Glue - Architecture and Setup

Learn AWS Glue Architecture with diagrams

Learn frequently used AWS Glue Terms and their meanings

Learn about different applications and features of AWS Glue

Learn internal architecture of AWS Glue

Learn about the cost economics of AWS Glue

Setup IAM Role and policies to use with AWS Glue

Learn about the networking concepts and settings required for AWS Glue

Configure network settings for AWS Glue

AWS Glue - Database Objects

Learn about the concept of Data Catalog in AWS Glue

Learn to develop databases in AWS Glue

Learn to develop tables in AWS Glue

Develop tables manually in AWS Glue

AWS Glue - Crawlers

Learn about the concept of Crawler in AWS Glue

Learn about the concept of classifiers in AWS Glue

Develop crawlers in AWS Glue - Lab 1

Develop crawlers in AWS Glue - Lab 2

Develop crawlers in AWS Glue - Lab 3

Develop crawlers in AWS Glue - Lab 4

Develop crawlers in AWS Glue - Lab 5

Develop crawlers in AWS Glue - Lab 6

Develop crawlers in AWS Glue - Lab 7

AWS Glue - ETL Jobs

Learn to develop serverless ETL jobs with AWS Glue

Learn about different ETL job properties in AWS Glue

Learn to develop serverless ETL jobs with AWS Glue with Redshift as data source

Learn to develop Python scripts and properties for serverless ETL jobs using AWS Glue

Learn about built-in ETL Transformations in AWS Glue

AWS Glue - Triggers

Learn about Triggers in AWS Glue

AWS Glue - Dev Ops Setup

Learn about AWS Glue Development Endpoints

Learn to install and setup Apache Zeppelin

Learn to install Git and setup Port Forwarding

Learn to integrate AWS Glue Development Endpoint with Apache Zeppelin Notebook

Learn monitoring options available for AWS Glue

Learn the latest features released for AWS Glue from 2018 to date

AWS Glue supports timeout values for ETL Jobs

AWS Glue supports reading from Amazon DynamoDB Tables

AWS Glue provides additional ETL Job metrics

AWS Glue supports data encryption at rest

AWS Glue supports connecting Sagemaker notebooks to dev endpoints

AWS Glue supports resource based policies and permissions

AWS Glue introduces Python Shell Jobs which can be used for custom transformations and other generic tasks in ETL jobs

Download Source code AWS Glue Data Catalog Client - Hive Metastore

14-Mar-2019 : AWS Glue enables running Apache Spark SQL Queries

AWS Glue supports additional options for memory-intensive jobs

AWS Glue crawlers support existing Data Catalog tables as sources

AWS Glue enables continuous logging for Spark ETL Jobs

AWS Glue supports scripts compatible with Python 3.6 in Shell Jobs

AWS Glue provides workflows to orchestrate ETL workloads

AWS Glue supports running ETL Jobs on Spark 2.4.3 with Python 3

AWS Glue supports additional options for memory intensive jobs

AWS Glue supports bookmarking Parquet and ORC Files using ETL Jobs

Launch AWS Glue, EMR and Aurora Serverless Clusters in Shared VPCs

AWS Glue provides FindMatches ML Transform

AWS Glue releases binaries of Glue ETL libraries for Glue Jobs

AWS Glue provides Apache Spark UI to monitor Glue ETL Jobs

AWS Glue provides ability to rewind Spark ETL Job bookmarks

AWS Glue supports FindMatches ML Transform on Spark 2.4.3 & Glue 1.0

AWS Glue supports bringing your own JDBC driver for Spark ETL Jobs

Glue adds new transforms - Purge, Transition and Merge

Glue supports reading & writing to DocumentDB & MongoDB Collection

AWS Glue supports new tables, update schema & partitions from Jobs

AWS Glue supports serverless streaming ETL

AWS Athena - Architecture and Setup

Good to know

Know what's good, what to watch for, and possible dealbreakers:
  • Delves into concepts essential to careers in data analytics: data catalog, ETL, analytics, and reporting using data lakes
  • Meets industry demand for serverless data analytics professionals through practical training
  • Focuses on intermediate to advanced topics, catering to experienced AWS Developers, Architects, and Administrators
  • Provides hands-on labs to reinforce understanding of AWS Glue, Athena, Redshift Spectrum, and QuickSight
  • Offers updated content on new features released by Amazon since 2018, ensuring currency with industry advancements
  • Requires prior experience with AWS Console and services, potentially posing a challenge for beginners

Reviews summary

Course with video and presenter quality issues

According to students, this course has some issues with the quality of the downloaded videos and the presenter's delivery. Learners say that the video quality is low at 360p and that the presenter is merely reading from AWS-extracted talking points.
Presenter is merely reading from AWS talking points
"Presenter is just reading from points extracted from AWS"
Low video quality at 360p
"Quality of downloaded video is 360p which is low quality"

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Mastering AWS Glue, QuickSight, Athena & Redshift Spectrum with these activities:
Compile and organize the materials from the course
Organize the materials from the course to make them easier to review later
  • Create a folder for the course materials
  • Download or save all of the course materials
  • Organize the materials into subfolders (e.g., by topic)
Follow tutorial to build a simple data pipeline
Build a simple data pipeline to understand the basic concept of AWS Glue
  • Find a tutorial on building a data pipeline using AWS Glue
  • Follow the tutorial step-by-step
  • Test the data pipeline
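
A common first step in such a pipeline is schema discovery with a crawler. The sketch below, which is an illustration rather than material from the course, builds the parameters for Glue's CreateCrawler API. The crawler name, role ARN, database, and S3 path are hypothetical, and the `create_and_start_crawler` helper assumes `boto3` is installed and AWS credentials are configured.

```python
def build_crawler_request(name, role_arn, database, s3_path, schedule=None):
    """Build keyword arguments for Glue's CreateCrawler API."""
    request = {
        "Name": name,
        "Role": role_arn,              # IAM role the crawler assumes
        "DatabaseName": database,      # Data Catalog database for new tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }
    if schedule is not None:
        # Cron-style expression, for example "cron(0 2 * * ? *)".
        request["Schedule"] = schedule
    return request


def create_and_start_crawler(request, region_name="us-east-1"):
    """Create and start the crawler. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    glue = boto3.client("glue", region_name=region_name)
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# Every value below is a hypothetical placeholder.
crawler_request = build_crawler_request(
    name="orders-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    database="sales_lake",
    s3_path="s3://example-data-lake/orders/",
)
print(crawler_request["Targets"])
```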
Practice writing ETL jobs using AWS Glue
Practice writing ETL jobs to solidify your understanding of AWS Glue
  • Create a new AWS Glue job
  • Write the ETL code using Python or Scala
  • Test the ETL job
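
The first and last steps above can be sketched with the Glue CreateJob and StartJobRun APIs. This is a minimal sketch under stated assumptions, not the course's own lab code. The job name, role ARN, script location, and Glue version value are hypothetical, and the `create_and_run_job` helper assumes `boto3` is installed and AWS credentials are configured. The ETL script itself is assumed to already exist in S3.

```python
def build_glue_job_request(name, role_arn, script_s3_uri, glue_version,
                           timeout_minutes=60, enable_bookmarks=True):
    """Build keyword arguments for Glue's CreateJob API (Spark ETL job)."""
    bookmark = ("job-bookmark-enable" if enable_bookmarks
                else "job-bookmark-disable")
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",                # Spark ETL job type
            "ScriptLocation": script_s3_uri,  # where the ETL script lives
            "PythonVersion": "3",
        },
        "GlueVersion": glue_version,
        "Timeout": timeout_minutes,           # job timeout, in minutes
        "DefaultArguments": {"--job-bookmark-option": bookmark},
    }


def create_and_run_job(request, region_name="us-east-1"):
    """Create the job and start one run. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    glue = boto3.client("glue", region_name=region_name)
    glue.create_job(**request)
    return glue.start_job_run(JobName=request["Name"])["JobRunId"]


# Every value below is a hypothetical placeholder.
job_request = build_glue_job_request(
    name="orders-etl",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    script_s3_uri="s3://example-scripts/orders_etl.py",
    glue_version="4.0",
)
print(job_request["Command"])
```

Setting a timeout and a bookmark option up front keeps a test run from running indefinitely or reprocessing data it has already seen.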
Attend an AWS Meetup or conference
Attend an AWS Meetup or conference to connect with other AWS professionals and learn about the latest AWS technologies
  • Find an AWS Meetup or conference in your area
  • Register for the event
  • Attend the event

Career center

Learners who complete Mastering AWS Glue, QuickSight, Athena & Redshift Spectrum will develop knowledge and skills that may be useful to these careers:
Data Analyst
Data Analysts are responsible for collecting, cleaning, and analyzing data to provide insights and recommendations to organizations. This course will teach you how to use AWS Glue, Athena, Redshift Spectrum, and QuickSight to build a data lake, analyze data, and create visualizations. These skills are essential for Data Analysts who work with large datasets and need to be able to quickly and efficiently extract meaningful insights from data.
Big Data Architect
Big Data Architects are responsible for designing and developing big data solutions. This course can be helpful for Big Data Architects who are looking to learn about the latest cloud-based data technologies. You will learn how to build a serverless data lake, analyze data with SQL and Python, and create visualizations in QuickSight. This knowledge can help you develop and manage big data solutions that are scalable, reliable, and secure.
Data Infrastructure Engineer
Data Infrastructure Engineers are responsible for designing and developing data infrastructure solutions. This course can be helpful for Data Infrastructure Engineers who are looking to learn about the latest cloud-based data technologies. You will learn how to build a serverless data lake, analyze data with SQL and Python, and create visualizations in QuickSight. This knowledge can help you design and develop data infrastructure solutions that are scalable, reliable, and secure.
Data Architect
Data Architects not only bridge the gap between business and data, but they also ensure that the collection, storage, and usage of data adhere to both IT best practices and the organization's regulations. This course will help you gain a comprehensive understanding of data lake architectures and the components involved in a serverless data lake solution. This knowledge can be used to understand and analyze different data requirements, design and maintain data pipelines, and develop data management solutions to support business needs. Additionally, you will learn how to manage data governance and security in a cloud-based environment.
Data Warehouse Architect
Data Warehouse Architects are responsible for designing, developing, and maintaining data warehouses. This course can be helpful for Data Warehouse Architects who are looking to learn about the latest cloud-based data technologies. You will learn how to build a serverless data lake, analyze data with SQL and Python, and create visualizations in QuickSight. This knowledge can help you to design and manage data warehouses that are scalable, reliable, and secure.
Data Integration Engineer
Data Integration Engineers are responsible for designing and developing data integration solutions. This course can be helpful for Data Integration Engineers who are looking to learn about the latest cloud-based data technologies. You will learn how to build a serverless data lake, analyze data with SQL and Python, and create visualizations in QuickSight. This knowledge can help you to design and develop data integration solutions that are scalable, reliable, and secure.
Data Quality Analyst
Data Quality Analysts are responsible for ensuring the quality of data. This course can be helpful for Data Quality Analysts who are looking to learn about the latest cloud-based data technologies. You will learn how to build a serverless data lake, analyze data with SQL and Python, and create visualizations in QuickSight. This knowledge can help you develop and implement data quality processes that are effective and efficient.
Database Developer
Database Developers are responsible for designing, developing, and maintaining databases. This course can be helpful for Database Developers who are looking to learn about the latest cloud-based data technologies. You will learn how to build a serverless data lake, analyze data with SQL and Python, and create visualizations in QuickSight. This knowledge can help you to develop and manage databases that are scalable, reliable, and secure.
Data Governance Analyst
Data Governance Analysts are responsible for developing and implementing data governance policies and procedures. This course can be helpful for Data Governance Analysts who are looking to learn about the latest cloud-based data technologies. You will learn how to build a serverless data lake, analyze data with SQL and Python, and create visualizations in QuickSight. This knowledge can help you develop and implement data governance policies and procedures that are effective and efficient.
Data Engineer
A Data Engineer is responsible for the design, construction, and maintenance of big data infrastructures. The field of Data Engineering is closely related to Data Science and Machine Learning. This course introduces students to the tools and techniques employed in this field, which can also help you perform some of the most common tasks required in this role. For example, this course will teach you to extract, transform, and load data from a data source and bring it into a data warehouse. You will also learn how to write SQL queries to analyze your data. This course covers a variety of topics that go far beyond the scope of what a Data Engineer typically does, but it can help you gain a solid foundation with essential skills.
Cloud Engineer
Cloud Engineers are responsible for designing, deploying, and managing cloud-based applications and infrastructure. This course will teach you how to use AWS Glue, Athena, Redshift Spectrum, and QuickSight to build a serverless data lake, analyze data, and create visualizations. These skills can be valuable for Cloud Engineers who are working on projects that involve data analysis and visualization.
Software Engineer
Software Engineers are responsible for designing, developing, and maintaining software applications. This course will introduce you to the fundamentals of cloud-based data technologies and how they can be used to build serverless applications. You will learn how to use AWS Glue, Athena, Redshift Spectrum, and QuickSight to build a data lake, analyze data, and create visualizations. These skills can be valuable for Software Engineers who are working on projects that involve data analysis and visualization.
Data Scientist
Data Scientists use scientific methods, processes, algorithms, and systems to extract knowledge and insights from data in various forms, both structured and unstructured. This course will introduce you to the fundamentals of data science by teaching you how to build a data lake, analyze data with SQL and Python, and create visualizations to communicate your findings. Completing this course can help you get started in this field and prepare for more advanced topics in data science, such as machine learning and artificial intelligence.
Business Analyst
A Business Analyst analyzes an organization or business domain to understand its needs and goals. This course will help you gain a better understanding of how data can be used to improve business outcomes. You will learn how to collect, analyze, and interpret data to identify trends and make recommendations to improve decision-making. This course can help you develop the skills and knowledge necessary to be successful as a Business Analyst, particularly in organizations that leverage data to make informed decisions.
Machine Learning Engineer
Machine Learning Engineers are responsible for designing and developing machine learning models. This course may be useful for Machine Learning Engineers who are looking to gain experience with cloud-based data technologies. You will learn how to build a serverless data lake, analyze data with SQL and Python, and create visualizations in QuickSight. This knowledge can help you develop and deploy machine learning models that are scalable, reliable, and accurate.

Similar courses

Here are nine courses similar to Mastering AWS Glue, QuickSight, Athena & Redshift Spectrum.
  • Data Engineering using AWS Data Analytics
  • Analyzing Data on AWS
  • Serverless Analytics on AWS
  • AWS: Data Analysis and Visualization
  • AWS Data Architect Bootcamp - 43 Services 500 FAQs 20+...
  • Amazon QuickSight Deep Dive
  • AWS Certified Data Engineer Associate 2024 - Hands On!
  • Getting Started with Data Analytics on AWS
  • Data Lake Mastery: The Key to Big Data & Data Engineering


© 2016 - 2024 OpenCourser