Durga Viswanatha Raju Gadiraju, Naga Bhuwaneshwar, and Kavitha Penmetsa

Data Engineering is all about building Data Pipelines to get data from multiple sources into Data Lakes or Data Warehouses and then from Data Lakes or Data Warehouses to downstream systems. As part of this course, I will walk you through how to build Data Engineering Pipelines using the AWS Data Analytics Stack. It includes services such as Glue, Elastic Map Reduce (EMR), Lambda Functions, Athena, Kinesis, and many more.


Here are the high-level steps which you will follow as part of the course.

  • Setup Development Environment

  • Getting Started with AWS

  • Storage - All about AWS s3 (Simple Storage Service)

  • User Level Security - Managing Users, Roles, and Policies using IAM

  • Infrastructure - AWS EC2 (Elastic Compute Cloud)

  • Data Ingestion using AWS Lambda Functions

  • Overview of AWS Glue Components

  • Setup Spark History Server for AWS Glue Jobs

  • Deep Dive into AWS Glue Catalog

  • Exploring AWS Glue Job APIs

  • AWS Glue Job Bookmarks

  • Development Life Cycle of Pyspark

  • Getting Started with AWS EMR

  • Deploying Spark Applications using AWS EMR

  • Streaming Pipeline using AWS Kinesis

  • Consuming Data from AWS s3 using Python boto3 ingested using AWS Kinesis

  • Populating GitHub Data to AWS Dynamodb

  • Overview of Amazon AWS Athena

  • Amazon AWS Athena using AWS CLI

  • Amazon AWS Athena using Python boto3

  • Getting Started with Amazon AWS Redshift

  • Copy Data from AWS s3 into AWS Redshift Tables

  • Develop Applications using AWS Redshift Cluster

  • AWS Redshift Tables with Distkeys and Sortkeys

  • AWS Redshift Federated Queries and Spectrum

Here are the details about what you will be learning as part of this course. We will cover, with hands-on practice, most of the commonly used services available under AWS Data Analytics.

Getting Started with AWS

As part of this section, you will be going through the details related to getting started with AWS.

  • Introduction - AWS Getting Started

  • Create s3 Bucket

  • Create IAM Group and User

  • Overview of Roles

  • Create and Attach Custom Policy

  • Configure and Validate AWS CLI

Storage - All about AWS s3 (Simple Storage Service)

All IT professionals who would like to work on AWS should be familiar with it. We will get into quite a few common features related to AWS s3 in this section.

  • Getting Started with AWS S3

  • Setup Data Set locally to upload to AWS s3

  • Adding AWS S3 Buckets and Managing Objects (files and folders) in AWS s3 buckets

  • Version Control for AWS S3 Buckets

  • Cross-Region Replication for AWS S3 Buckets

  • Overview of AWS S3 Storage Classes

  • Overview of AWS S3 Glacier

  • Managing AWS S3 using AWS CLI

User Level Security - Managing Users, Roles, and Policies using IAM

As part of this section, you will understand the details related to AWS IAM users, groups, roles as well as policies.

  • Creating AWS IAM Users

  • Logging into AWS Management Console using AWS IAM User

  • Validate Programmatic Access to AWS IAM User

  • AWS IAM Identity-based Policies

  • Managing AWS IAM Groups

  • Managing AWS IAM Roles

  • Overview of Custom Policies

  • Managing AWS IAM using AWS CLI

Infrastructure - AWS EC2 (Elastic Compute Cloud)

As part of this section, we will go through some of the basics related to AWS EC2.

  • Getting Started with AWS EC2

  • Create AWS EC2 Key Pair

  • Launch AWS EC2 Instance

  • Connecting to AWS EC2 Instance

Data Ingestion using AWS Lambda Functions

In this section, we will understand how we can develop and deploy Lambda functions using Python as a programming language. We will also see how to maintain a bookmark or checkpoint using s3.

  • Hello World using AWS Lambda

  • Setup Project for local development of AWS Lambda Functions

  • Deploy Project to AWS Lambda console

  • Develop download functionality using requests for AWS Lambda Functions

  • Using 3rd party libraries in AWS Lambda Functions

  • Validating AWS s3 access for local development of AWS Lambda Functions

  • Develop upload functionality to s3 using AWS Lambda Functions

  • Validating AWS Lambda Functions using AWS Lambda Console

  • Run AWS Lambda Functions using AWS Lambda Console

  • Validating files incrementally downloaded using AWS Lambda Functions

  • Reading and Writing Bookmark to s3 using AWS Lambda Functions

  • Maintaining Bookmark on s3 using AWS Lambda Functions

  • Review the incremental upload logic developed using AWS Lambda Functions

  • Deploying AWS Lambda Functions

  • Schedule AWS Lambda Functions using AWS Event Bridge

Overview of AWS Glue Components

In this section, we will get a broad overview of all important Glue Components such as Glue Crawler, Glue Databases, Glue Tables, etc. We will also understand how to validate Glue tables using AWS Athena. AWS Glue (especially Glue Catalog) is one of the key components in the realm of AWS Data Analytics Services.

  • Introduction - Overview of AWS Glue Components

  • Create AWS Glue Crawler and AWS Glue Catalog Database as well as Table

  • Analyze Data using AWS Athena

  • Creating AWS S3 Bucket and Role to create AWS Glue Catalog Tables using Crawler on the s3 location

  • Create and Run the AWS Glue Job to process data in AWS Glue Catalog Tables

  • Validate using AWS Glue Catalog Table and by running queries using AWS Athena

  • Create and Run AWS Glue Trigger

  • Create AWS Glue Workflow

  • Run AWS Glue Workflow and Validate

Setup Spark History Server for AWS Glue Jobs

AWS Glue uses Apache Spark under the hood to process data. It is important that we set up the Spark History Server for AWS Glue Jobs to troubleshoot any issues.

  • Introduction - Spark History Server for AWS Glue

  • Setup Spark History Server on AWS

  • Clone AWS Glue Samples repository

  • Build AWS Glue Spark UI Container

  • Update AWS IAM Policy Permissions

  • Start AWS Glue Spark UI Container

Deep Dive into AWS Glue Catalog

AWS Glue has several components, but the most important ones are AWS Glue Crawlers, Databases, and Catalog Tables. In this section, we will go through some of the most important and commonly used features of the AWS Glue Catalog.

  • Prerequisites for AWS Glue Catalog Tables

  • Steps for Creating AWS Glue Catalog Tables

  • Download Data Set to use to create AWS Glue Catalog Tables

  • Upload data to s3 to crawl using AWS Glue Crawler to create required AWS Glue Catalog Tables

  • Create AWS Glue Catalog Database - itvghlandingdb

  • Create AWS Glue Catalog Table - ghactivity

  • Running Queries using AWS Athena - ghactivity

  • Crawling Multiple Folders using AWS Glue Crawlers

  • Managing AWS Glue Catalog using AWS CLI

  • Managing AWS Glue Catalog using Python Boto3

Exploring AWS Glue Job APIs

Once we deploy AWS Glue jobs, we can manage them using AWS Glue Job APIs. In this section, we will get an overview of AWS Glue Job APIs to run and manage the jobs.
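As a taste of those APIs, here is a minimal sketch that starts a Glue job run via the boto3 `glue` client and polls it until it reaches a terminal state. The job name is hypothetical; in practice you would pass `boto3.client("glue")`:

```python
import time

def run_glue_job(glue, job_name, arguments=None, poll_seconds=30):
    """Start a Glue job run and poll until it reaches a terminal state."""
    run = glue.start_job_run(JobName=job_name, Arguments=arguments or {})
    run_id = run["JobRunId"]
    while True:
        state = glue.get_job_run(JobName=job_name,
                                 RunId=run_id)["JobRun"]["JobRunState"]
        if state in ("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"):
            return run_id, state
        time.sleep(poll_seconds)
```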

AWS Glue Job Bookmarks

In this section, we will go through the details related to AWS Glue Job Bookmarks.

  • Introduction to AWS Glue Job Bookmarks

  • Cleaning up the data to run AWS Glue Jobs

  • Overview of AWS Glue CLI and Commands

  • Run AWS Glue Job using AWS Glue Bookmark

  • Validate AWS Glue Bookmark using AWS CLI

  • Add new data to the landing zone to run AWS Glue Jobs using Bookmarks

  • Rerun AWS Glue Job using Bookmark

  • Validate AWS Glue Job Bookmark and Files for Incremental run

  • Recrawl the AWS Glue Catalog Table

Development Life Cycle of Pyspark

In this section, we will go through the development life cycle of a Pyspark application. We will use this application later while exploring EMR in detail.

  • Setup Virtual Environment and Install Pyspark

  • Getting Started with Pycharm

  • Passing Run Time Arguments

  • Accessing OS Environment Variables

  • Getting Started with Spark

  • Create Function for Spark Session

  • Setup Sample Data

  • Read data from files

  • Process data using Spark APIs

  • Write data to files

  • Validating Writing Data to Files

  • Productionizing the Code
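The run-time arguments, environment variables, and Spark Session steps above can be sketched as two small helpers. This is a sketch under stated assumptions: the variable names (ENVIRON, SRC_DIR, TGT_DIR) and default paths are hypothetical, and the lazy pyspark import keeps the config helper usable without Spark installed:

```python
import os
import sys

def get_runtime_config(args=None, environ=None):
    """Resolve environment and paths from run-time arguments and
    OS environment variables (names here are hypothetical)."""
    args = sys.argv[1:] if args is None else args
    environ = os.environ if environ is None else environ
    env = args[0] if args else environ.get("ENVIRON", "dev")
    return {
        "env": env,
        "src_dir": environ.get("SRC_DIR", "data/github/landing"),
        "tgt_dir": environ.get("TGT_DIR", "data/github/raw"),
    }

def get_spark_session(env, app_name):
    """Create a SparkSession; local master for dev, cluster defaults otherwise."""
    from pyspark.sql import SparkSession  # lazy import: only needed at run time
    builder = SparkSession.builder.appName(app_name)
    if env == "dev":
        builder = builder.master("local[*]")
    return builder.getOrCreate()
```

Keeping configuration out of the code this way is what later lets the same application run unchanged on an EMR cluster.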

Getting Started with AWS EMR (Elastic Map Reduce)

As part of this section, we will understand how to get started with AWS EMR Cluster. We will primarily focus on AWS EMR Web Console. Elastic Map Reduce is one of the key services in AWS Data Analytics Services which provides the capability to run applications that process large-scale data leveraging distributed computing frameworks such as Spark.

  • Planning for AWS EMR Cluster

Deploying Spark Applications using AWS EMR

We will be using the Spark Application we deployed earlier.

  • Deploying Applications using AWS EMR - Introduction

Streaming Pipeline using AWS Kinesis

We will use AWS Kinesis Firehose Agent and AWS Kinesis Delivery Stream to read the data from log files and ingest it into AWS s3.

  • Building Streaming Pipeline using AWS Kinesis Firehose Agent and Delivery Stream

  • Rotating Logs so that files are created frequently, to be eventually ingested using the AWS Kinesis Firehose Agent and Delivery Stream

  • Set up AWS Kinesis Firehose Agent to get data from logs into AWS Kinesis Delivery Stream

  • Create AWS Kinesis Firehose Delivery Stream

  • Planning the Pipeline to ingest data into s3 using AWS Kinesis Delivery Stream

  • Create AWS IAM Group and User for Streaming Pipelines using AWS Kinesis Components

  • Granting Permissions to

  • Start and Validate AWS Kinesis Firehose Agent

  • Conclusion - Building Simple Streaming Pipeline using AWS Kinesis Firehose
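The agent-based flow above is driven by a small JSON configuration file. A minimal sketch of a Kinesis Agent config (typically `/etc/aws-kinesis/agent.json`; the log path and delivery stream name below are hypothetical):

```json
{
  "cloudwatch.emitMetrics": false,
  "flows": [
    {
      "filePattern": "/opt/gen_logs/logs/access.log*",
      "deliveryStream": "logs-delivery-stream"
    }
  ]
}
```

Each rotated file matching `filePattern` is tailed by the agent and its records are pushed to the named Firehose delivery stream, which in turn lands them in s3.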

Consuming Data from AWS s3 using Python boto3 ingested using AWS Kinesis

We will understand how data ingested into AWS s3 can be processed using Python boto3.

  • Customizing AWS s3 folder using AWS Kinesis Delivery Stream

  • Create AWS IAM Policy to read from AWS s3 Bucket

  • Validate AWS s3 access using AWS CLI

  • Setup Python Virtual Environment to explore boto3

  • Validating access to AWS s3 using Python boto3

  • Read Content from AWS s3 object

  • Read multiple AWS s3 Objects

  • Get the number of AWS s3 Objects using Marker

  • Get the size of AWS s3 Objects using Marker
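The object count and size steps can be sketched as one paginated pass over the bucket. The course material uses the Marker with `list_objects`; this sketch uses the equivalent continuation token of `list_objects_v2` (in practice the client is `boto3.client("s3")`):

```python
def summarize_objects(s3, bucket, prefix=""):
    """Count objects and total size under a prefix, following the
    continuation token across pages of list_objects_v2."""
    count, size, token = 0, 0, None
    while True:
        kwargs = {"Bucket": bucket, "Prefix": prefix, "MaxKeys": 1000}
        if token:
            kwargs["ContinuationToken"] = token
        page = s3.list_objects_v2(**kwargs)
        for obj in page.get("Contents", []):
            count += 1
            size += obj["Size"]
        if not page.get("IsTruncated"):
            return count, size
        token = page["NextContinuationToken"]
```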

Populating GitHub Data to AWS Dynamodb

As part of this section, we will understand how we can populate data to AWS Dynamodb tables using Python as a programming language.

  • Install required libraries to get GitHub Data to AWS Dynamodb tables

  • Understanding GitHub APIs

  • Setting up GitHub API Token

  • Understanding GitHub Rate Limit

  • Create New Repository for since

  • Extracting Required Information using Python

  • Processing Data using Python

  • Grant Permissions to create AWS dynamodb tables using boto3

  • Create AWS Dynamodb Tables

  • AWS Dynamodb CRUD Operations

  • Populate AWS Dynamodb Table

  • AWS Dynamodb Batch Operations
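Batch operations against Dynamodb are constrained by the BatchWriteItem limit of 25 items per call. A minimal sketch of chunking items to that limit, plus the boto3 Table `batch_writer` (which handles batching and retrying unprocessed items for you); the table object would come from `boto3.resource("dynamodb").Table(...)`:

```python
def chunk_items(items, batch_size=25):
    """Split items into batches of at most 25, the BatchWriteItem limit."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

def batch_write(table, items):
    """Write items using the boto3 Table batch_writer context manager."""
    with table.batch_writer() as writer:
        for item in items:
            writer.put_item(Item=item)
```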

Overview of Amazon AWS Athena

As part of this section, we will understand how to get started with AWS Athena using the AWS Web console. We will also focus on basic DDL and DML or CRUD Operations using AWS Athena Query Editor.

  • Getting Started with Amazon AWS Athena

  • Quick Recap of AWS Glue Catalog Databases and Tables

  • Access AWS Glue Catalog Databases and Tables using AWS Athena Query Editor

  • Create a Database and Table using AWS Athena

  • Populate Data into Table using AWS Athena

  • Using CTAS to create tables using AWS Athena

  • Overview of Amazon AWS Athena Architecture

  • Amazon AWS Athena Resources and relationship with Hive

  • Create a Partitioned Table using AWS Athena

  • Develop Query for Partitioned Column

  • Insert into Partitioned Tables using AWS Athena

  • Validate Data Partitioning using AWS Athena

  • Drop AWS Athena Tables and Delete Data Files

  • Drop Partitioned Table using AWS Athena

  • Data Partitioning in AWS Athena using CTAS
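Data partitioning using CTAS boils down to a query with a few table properties. A minimal sketch that builds such a statement; the table, partition column, and s3 location names are hypothetical:

```python
def ctas_partitioned(new_table, source_table, partition_cols, s3_location):
    """Build an Athena CTAS statement writing Parquet, partitioned output."""
    parts = ", ".join(f"'{c}'" for c in partition_cols)
    return (
        f"CREATE TABLE {new_table} "
        f"WITH (format = 'PARQUET', external_location = '{s3_location}', "
        f"partitioned_by = ARRAY[{parts}]) "
        f"AS SELECT * FROM {source_table}"
    )
```

Note that Athena requires the partition columns to be the last columns in the SELECT list, in the same order as `partitioned_by`.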

Amazon AWS Athena using AWS CLI

As part of this section, we will understand how to interact with AWS Athena using AWS CLI Commands.

  • Amazon AWS Athena using AWS CLI - Introduction

  • Get help and list AWS Athena databases using AWS CLI

  • Managing AWS Athena Workgroups using AWS CLI

  • Run AWS Athena Queries using AWS CLI

  • Get AWS Athena Table Metadata using AWS CLI

  • Run AWS Athena Queries with a custom location using AWS CLI

  • Drop AWS Athena table using AWS CLI

  • Run CTAS under AWS Athena using AWS CLI

Amazon AWS Athena using Python boto3

As part of this section, we will understand how to interact with AWS Athena using Python boto3.

  • Amazon AWS Athena using Python boto3 - Introduction

  • Getting Started with Managing AWS Athena using Python boto3

  • List Amazon AWS Athena Databases using Python boto3

  • List Amazon AWS Athena Tables using Python boto3

  • Run Amazon AWS Athena Queries with boto3

  • Review AWS Athena Query Results using boto3

  • Persist Amazon AWS Athena Query Results in Custom Location using boto3

  • Processing AWS Athena Query Results using Pandas

  • Run CTAS against Amazon AWS Athena using Python boto3
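Because Athena queries are asynchronous, running one with boto3 follows a start/poll/fetch pattern. A minimal sketch (database name and output location are hypothetical; in practice the client is `boto3.client("athena")`):

```python
import time

def run_athena_query(athena, query, database, output_location, poll_seconds=1):
    """Submit a query, wait for a terminal state, and return the result rows."""
    qid = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]
    while True:
        state = athena.get_query_execution(
            QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"query {qid} finished with state {state}")
    return athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
```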

Getting Started with Amazon AWS Redshift

As part of this section, we will understand how to get started with AWS Redshift using the AWS Web console. We will also focus on basic DDL and DML or CRUD Operations using AWS Redshift Query Editor.

  • Getting Started with Amazon AWS Redshift - Introduction

  • Create AWS Redshift Cluster using Free Trial

  • Connecting to Database using AWS Redshift Query Editor

  • Get a list of tables querying information schema

  • Run Queries against AWS Redshift Tables using Query Editor

  • Create AWS Redshift Table using Primary Key

  • Insert Data into AWS Redshift Tables

  • Update Data in AWS Redshift Tables

  • Delete data from AWS Redshift tables

  • Redshift Saved Queries using Query Editor

  • Deleting AWS Redshift Cluster

  • Restore AWS Redshift Cluster from Snapshot

Copy Data from s3 into AWS Redshift Tables

As part of this section, we will go through the details about copying data from s3 into AWS Redshift tables using the AWS Redshift Copy command.

  • Copy Data from s3 to AWS Redshift - Introduction

  • Setup Data in s3 for AWS Redshift Copy

  • Copy Database and Table for AWS Redshift Copy Command

  • Create IAM User with full access on s3 for AWS Redshift Copy

  • Run Copy Command to copy data from s3 to AWS Redshift Table

  • Troubleshoot Errors related to AWS Redshift Copy Command

  • Run Copy Command to copy from s3 to AWS Redshift table

  • Validate using queries against AWS Redshift Table

  • Overview of AWS Redshift Copy Command

  • Create IAM Role for AWS Redshift to access s3

  • Copy Data from s3 to AWS Redshift table using IAM Role

  • Setup JSON Dataset in s3 for AWS Redshift Copy Command

  • Copy JSON Data from s3 to AWS Redshift table using IAM Role
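The Copy command itself is plain SQL run against the cluster. A minimal sketch of building one that authenticates via an IAM role; the table name, s3 path, and role ARN are hypothetical:

```python
def build_copy_command(table, s3_path, iam_role, fmt="CSV"):
    """Build a Redshift COPY statement reading from s3 via an IAM role.
    fmt is the FORMAT AS clause, e.g. "CSV" or "JSON 'auto'"."""
    return (
        f"COPY {table} FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        f"FORMAT AS {fmt}"
    )
```

The resulting string would then be executed over a psql or Python database connection to the cluster, as covered in the next section.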

Develop Applications using AWS Redshift Cluster

As part of this section, we will understand how to develop applications against databases and tables created as part of AWS Redshift Cluster.

  • Develop application using AWS Redshift Cluster - Introduction

  • Allocate Elastic IP for AWS Redshift Cluster

  • Enable Public Accessibility for AWS Redshift Cluster

  • Update Inbound Rules in Security Group to access AWS Redshift Cluster

  • Create Database and User in AWS Redshift Cluster

  • Connect to the database in AWS Redshift using psql

  • Change Owner on AWS Redshift Tables

  • Download AWS Redshift JDBC Jar file

  • Connect to AWS Redshift Databases using IDEs such as SQL Workbench

  • Setup Python Virtual Environment for AWS Redshift

  • Run Simple Query against AWS Redshift Database Table using Python

  • Truncate AWS Redshift Table using Python

  • Create IAM User to copy from s3 to AWS Redshift Tables

  • Validate Access of IAM User using Boto3

  • Run AWS Redshift Copy Command using Python

AWS Redshift Tables with Distkeys and Sortkeys

As part of this section, we will go through AWS Redshift-specific features such as distribution keys and sort keys to create AWS Redshift tables.
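A table DDL using these features might be sketched as follows; the table and column names are hypothetical, and a single-column compound sort key is shown for simplicity:

```python
def create_table_ddl(table, columns, distkey, sortkeys):
    """Build a CREATE TABLE statement with a DISTKEY and a compound SORTKEY.
    columns is a list of (name, type) pairs."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE TABLE {table} ({cols}) "
        f"DISTSTYLE KEY DISTKEY ({distkey}) "
        f"COMPOUND SORTKEY ({', '.join(sortkeys)})"
    )
```

Choosing the distribution key determines how rows are spread across cluster slices, which is why a multi-node cluster is created in this section to observe the effect.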

  • AWS Redshift Tables with Distkeys and Sortkeys - Introduction

  • Quick Review of AWS Redshift Architecture

  • Create multi-node AWS Redshift Cluster

  • Connect to AWS Redshift Cluster using Query Editor

  • Create AWS Redshift Database

  • Create AWS Redshift Database User

  • Create AWS Redshift Database Schema

  • Default Distribution Style of AWS Redshift Table

  • Grant Select Permissions on Catalog to AWS Redshift Database User

  • Update Search Path to query AWS Redshift system tables

  • Validate AWS Redshift table with Distkeys and Sortkeys

AWS Redshift Federated Queries and Spectrum

As part of this section, we will go through AWS Redshift Federated Queries and Spectrum.

  • AWS Redshift Federated Queries and Spectrum - Introduction

  • Overview of integrating AWS RDS and AWS Redshift for Federated Queries

  • Create IAM Role for AWS Redshift Cluster

  • Setup Postgres Database Server for AWS Redshift Federated Queries

  • Create tables in Postgres Database for AWS Redshift Federated Queries

  • Creating Secret using Secrets Manager for Postgres Database

  • Accessing Secret Details using Python Boto3

  • Reading JSON Data to Dataframe using Pandas

  • Write JSON Data to AWS Redshift Database Tables using Pandas

  • Create AWS IAM Policy for Secret and associate with Redshift Role

  • Create AWS Redshift Cluster using AWS IAM Role with permissions on secret

  • Create AWS Redshift External Schema to Postgres Database

  • Update AWS Redshift Cluster Network Settings for Federated Queries

  • Performing ETL using AWS Redshift Federated Queries

  • Clean up resources added for AWS Redshift Federated Queries

  • Grant Access on AWS Glue Data Catalog to AWS Redshift Cluster for Spectrum

  • Setup AWS Redshift Clusters to run queries using Spectrum

  • Quick Recap of AWS Glue Catalog Database and Tables for AWS Redshift Spectrum

  • Create External Schema using AWS Redshift Spectrum

  • Run Queries using AWS Redshift Spectrum

  • Cleanup the AWS Redshift Cluster


What's inside

Learning objectives

  • Data engineering leveraging services under AWS Data Analytics
  • AWS essentials such as S3, IAM, EC2, etc.
  • Understanding AWS S3 for cloud-based storage
  • Understanding details related to virtual machines on AWS known as EC2
  • Managing AWS IAM users, groups, roles, and policies for RBAC (Role-Based Access Control)
  • Managing tables using AWS Glue Catalog
  • Engineering batch data pipelines using AWS Glue Jobs
  • Orchestrating batch data pipelines using AWS Glue Workflows
  • Running queries using AWS Athena - serverless query engine service
  • Using AWS Elastic Map Reduce (EMR) clusters for building data pipelines
  • Using AWS Elastic Map Reduce (EMR) clusters for reports and dashboards
  • Data ingestion using AWS Lambda Functions
  • Scheduling using AWS Event Bridge
  • Engineering streaming pipelines using AWS Kinesis
  • Streaming web server logs using AWS Kinesis Firehose
  • Overview of data processing using AWS Athena
  • Running AWS Athena queries or commands using CLI
  • Running AWS Athena queries using Python boto3
  • Creating AWS Redshift clusters, creating tables, and performing CRUD operations
  • Copying data from S3 to AWS Redshift tables
  • Understanding distribution styles and creating tables using Distkeys
  • Running queries on external RDBMS tables using AWS Redshift Federated Queries
  • Running queries on Glue or Athena catalog tables using AWS Redshift Spectrum

Syllabus

Introduction to the course
Introduction to Data Engineering using AWS Analytics Services
Video Lectures and Reference Material
Taking the Udemy Course for new Udemy Users
Additional Costs for AWS Infrastructure for Hands-on Practice
Signup for AWS Account
Logging into AWS Account
Overview of AWS Billing Dashboard - Cost Explorer and Budgets
Setup Local Development Environment for AWS on Windows 10 or Windows 11
Setup Local Environment on Windows for AWS
Overview of Powershell on Windows 10 or Windows 11
Setup Ubuntu VM on Windows 10 or 11 using wsl
Setup Ubuntu VM on Windows 10 or 11 using wsl - Contd...
Setup Python venv and pip on Ubuntu
Setup AWS CLI on Windows and Ubuntu using Pip
Create AWS IAM User and Download Credentials
Configure AWS CLI on Windows
Create Python Virtual Environment for AWS Projects
Setup Boto3 as part of Python Virtual Environment
Setup Jupyter Lab and Validate boto3
Setup Local Development Environment for AWS on Mac
Setup Local Environment for AWS on Mac
Setup AWS CLI on Mac
Setup AWS IAM User to configure AWS CLI
Configure AWS CLI using IAM User Credentials
Setup Python Virtual Environment on Mac using Python 3
Setup Environment for Practice using Cloud9
Introduction to Cloud9
Setup Cloud9
Overview of Cloud9 IDE
Docker and AWS CLI on Cloud9
Cloud9 and EC2
Accessing Web Applications
Allocate and Assign Static IP
Changing Permissions using IAM Policies
Increasing Size of EBS Volume
Opening ports for Cloud9 Instance
Setup Jupyter lab on Cloud9 Instance
Open SSH Port for Cloud9 EC2 Instance
Connect to Cloud9 EC2 Instance using SSH
Understand how to get started by creating required AWS s3 bucket and granting permissions on bucket to AWS IAM Roles via IAM Policies.
Introduction - AWS Getting Started
[Instructions] Introduction - AWS Getting Started
Create AWS s3 Bucket using AWS Web Console
[Instructions] Create s3 Bucket
Create AWS IAM Group and User using AWS Web Console
[Instructions] Create IAM Group and User
Overview of AWS IAM Roles to grant permissions between AWS Services
[Instructions] Overview of Roles
Create and Attach AWS IAM Custom Policy using AWS Web Console
[Instructions and Code] Create and Attach Custom Policy
Configure and Validate AWS Command Line Interface to run AWS Commands
[Instructions and Code] Configure and Validate AWS CLI
Learn all the basic concepts of AWS s3 such as copying data as objects into s3 bucket, version control, overview of s3 tiers as well as managing objects in AWS s3 using AWS CLI.
Getting Started with AWS Simple Storage Service aka S3
[Instructions] Getting Started with AWS S3
Setup Data Set locally to upload into AWS s3
[Instructions] Setup Data Set locally to upload into AWS s3
Adding AWS S3 Buckets and Objects using AWS Web Console
[Instruction] Adding AWS s3 Buckets and Objects
Version Control of AWS S3 Objects or Files
[Instructions] Version Control in AWS S3
AWS S3 Cross-Region Replication for fault tolerance
[Instructions] AWS S3 Cross-Region Replication for fault tolerance
Overview of AWS S3 Storage Classes or Storage Tiers
[Instructions] Overview of AWS S3 Storage Classes or Storage Tiers
Overview of Glacier in AWS s3
[Instructions] Overview of Glacier in AWS s3
Managing AWS S3 buckets and objects using AWS CLI
[Instructions and Commands] Managing AWS S3 buckets and objects using AWS CLI
Managing Objects in AWS S3 using AWS CLI - Lab
[Instructions] Managing Objects in AWS S3 using AWS CLI - Lab
Ability to create Users, Groups and Roles using AWS IAM and also attach permissions via AWS IAM Policies. One will also learn how to create custom AWS IAM Policies.
Creating AWS IAM Users with Programmatic and Web Console Access
[Instructions] Creating IAM Users
Logging into AWS Management Console using AWS IAM User
[Instructions] Logging into AWS Management Console using IAM User
Validate Programmatic Access to AWS IAM User via AWS CLI
[Instructions and Commands] Validate Programmatic Access to IAM User
Getting Started with AWS IAM Identity-based Policies
[Instructions and Commands] IAM Identity-based Policies
Managing AWS IAM User Groups
[Instructions and Commands] Managing IAM Groups
Managing AWS IAM Roles for Service Level Access
[Instructions and Commands] Managing IAM Roles
Overview of AWS Custom Policies to grant permissions to Users, Groups, and Roles
[Instructions and Commands] Overview of Custom Policies
Managing AWS IAM Groups, Users, and Roles using AWS CLI
[Instructions and Commands] Managing IAM using AWS CLI
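A custom identity-based policy of the kind created above is just a JSON document. A minimal sketch in Python of a read-only s3 policy; the bucket name is supplied by the caller, and the action list is deliberately kept small:

```python
import json

def s3_read_policy(bucket):
    """Build a custom identity-based policy granting read-only
    access to a single s3 bucket and its objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",       # the bucket itself (ListBucket)
                    f"arn:aws:s3:::{bucket}/*",     # its objects (GetObject)
                ],
            }
        ],
    }

# json.dumps(s3_read_policy("demo-bucket")) is what you would paste into
# the IAM console or pass to create-policy via the AWS CLI.
```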
Understand some of the AWS EC2 key concepts with hands-on practice, such as how to create a Key Pair, set up an EC2 Instance using the key pair, update security groups, etc.
Getting Started with AWS Elastic Compute Cloud aka EC2
[Instructions] Getting Started with EC2
Create AWS EC2 Key Pair for SSH Access
[Instructions] Create EC2 Key Pair
Launch AWS EC2 Instance or Virtual Machine
[Instructions] Launch EC2 Instance
Connecting to AWS EC2 Instance or Virtual Machine using SSH
[Instructions and Commands] Connecting to EC2 Instance

Good to know

Know what's good, what to watch for, and possible dealbreakers
Explores AWS services, which are standard in data engineering
Taught by instructors recognized in the field of data engineering
Examines batch and streaming pipeline concepts, which are highly relevant to data engineering
Explores PySpark, Apache Spark, and Athena, which are all popular technologies used by data engineers
Develops foundational skills in AWS Analytics Services, such as s3, EC2, and IAM, which are essential technologies for data engineers
Requires prerequisite knowledge of basic computing concepts, which may be a caveat for some learners


Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Data Engineering using AWS Data Analytics with these activities:
Review AWS Fundamentals
Solidify your understanding of core AWS concepts to strengthen your proficiency in the course.
  • Review AWS documentation on core services such as S3, EC2, and Lambda.
  • Complete hands-on labs or tutorials to practice using these services.
Review Python programming basics and data structures
Ensuring that your Python programming basics and data structures are up to date will help you succeed in this course.
  • Review Python data types and operators
  • Practice working with lists, dictionaries, and sets
  • Understand the concept of object-oriented programming
  • Solve simple coding problems using Python
Create a diagram of the AWS Data Analytics stack
Visualizing the relationships and components of the AWS Data Analytics stack will enhance your understanding of how the services work together.
  • Choose a visual tool like Draw.io or Lucidchart
  • Identify the key services of the AWS Data Analytics stack like S3, Redshift, and Athena
  • Draw and label connections between the services
  • Describe the flow of data and processes within the stack
Join a study group or online forum dedicated to AWS Data Analytics
Engaging with other learners and experts in a study group or online forum will expand your knowledge, expose you to different perspectives, and provide opportunities for collaboration.
Practice setting up AWS IAM roles and policies
Setting up AWS IAM roles and policies correctly is crucial for security and access control in your AWS environment. Practicing these tasks will help reinforce your understanding and ensure that you can perform them confidently.
  • Create a new AWS IAM user with limited permissions
  • Create an AWS IAM role to grant specific permissions to the user
  • Attach the role to the user
  • Test the permissions of the user
  • Clean up the resources you created
Follow a tutorial on deploying a data pipeline using AWS Glue
Hands-on experience with deploying a data pipeline is invaluable. Following a tutorial will provide guidance and ensure that you complete all the necessary steps.
  • Find an online tutorial on deploying a data pipeline using AWS Glue
  • Gather the required resources and set up your AWS environment
  • Follow the tutorial steps to create a data pipeline
  • Test the data pipeline to ensure it's working correctly
  • Clean up the resources if you don't need them anymore
AWS Lambda Function Development
Deepen your understanding of AWS Lambda functions by building and deploying your own.
  • Create a simple AWS Lambda function using Python or Java.
  • Deploy your function to AWS and test its functionality.
  • Experiment with different event triggers and input data.
Develop a budgeting and cost optimization plan for an AWS Data Analytics project
Creating a budgeting and cost optimization plan will help you manage your AWS resources efficiently and avoid unexpected expenses.
  • Estimate the costs of your AWS Data Analytics project
  • Identify potential areas for cost optimization
  • Develop a cost optimization plan
  • Implement your cost optimization plan
  • Monitor and adjust your cost optimization plan as needed
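For the estimation step, a first pass can be a simple spreadsheet-style calculation: hourly rate times expected hours per month, summed across resources. The rates below are illustrative placeholders, not actual AWS prices:

```python
def monthly_cost(resources: dict) -> float:
    """Estimate a monthly bill from {name: (hourly_rate_usd, hours_per_month)}."""
    return round(sum(rate * hours for rate, hours in resources.values()), 2)

if __name__ == "__main__":
    # Illustrative rates only -- check current AWS pricing for real numbers.
    estimate = monthly_cost({
        "emr_cluster": (0.50, 8 * 22),   # dev cluster, 8 h/day, 22 workdays
        "glue_jobs":   (0.44, 30),       # roughly 1 hour of jobs per day
        "redshift":    (0.25, 730),      # one always-on node
    })
    print(f"Estimated monthly cost: ${estimate}")
```

A model like this also makes the optimization step concrete: the always-on line items dominate, which is why pausing or right-sizing idle clusters is usually the first saving to chase.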
Contribute to an open-source project related to AWS Data Analytics
Contributing to an open-source project not only helps the community, but also allows you to learn from experienced developers and gain valuable practical experience.
  • Identify an open-source project related to AWS Data Analytics
  • Review the project's documentation and codebase
  • Identify an area where you can contribute
  • Create a pull request with your changes
Organize and review your notes, assignments, and quizzes from the course
By organizing and reviewing your notes, assignments, and quizzes, you can reinforce your learning and identify areas where you may need additional support.
  • Gather your notes, assignments, and quizzes from the course
  • Create a system to organize your materials
  • Review your materials regularly
  • Identify key concepts and areas where you need more practice
  • Use your organized materials to prepare for upcoming assessments
Develop a Data Analytics Project using AWS Services
Put your learning into practice by building a comprehensive data analytics project leveraging AWS services.
  • Define the scope and objectives of your project.
  • Choose appropriate AWS services for data ingestion, processing, and visualization.
  • Implement your data pipeline using AWS services such as Glue, EMR, Athena, and Redshift.
  • Analyze and interpret your results to derive insights.
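For the analysis step, querying curated data with Athena typically comes down to a single API call. A sketch of the request parameters for boto3's `start_query_execution` (database, table, and bucket names are placeholders for your own project):

```python
def athena_query_request(query: str, database: str, output_s3: str) -> dict:
    """Build keyword arguments for boto3's athena.start_query_execution."""
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        # Athena writes result files to this S3 location
        "ResultConfiguration": {"OutputLocation": output_s3},
    }

if __name__ == "__main__":
    req = athena_query_request(
        "SELECT order_date, count(*) AS orders FROM orders GROUP BY order_date",
        "demo_db",
        "s3://demo-bucket/athena-results/",
    )
    # With boto3 and valid credentials (not run here):
    #   boto3.client("athena").start_query_execution(**req)
    print(req["QueryExecutionContext"]["Database"])
```

Because Athena queries data in place on S3, the same tables your Glue or EMR stage produces can be analyzed without loading a separate database.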

Career center

Learners who complete Data Engineering using AWS Data Analytics will develop knowledge and skills that may be useful to these careers:
Data Engineer
As a Data Engineer, you will be responsible for designing, building, and maintaining data pipelines. This course will help you build a strong foundation in AWS Data Analytics Services, which are essential for data engineering work. You will learn how to use AWS Glue, EMR, Lambda Functions, Athena, and other services to create and manage data pipelines. This course will also teach you how to use Python and other programming languages to develop data engineering applications.
Data Analyst
As a Data Analyst, you will be responsible for collecting, analyzing, and interpreting data to help businesses make informed decisions. This course will help you build a strong foundation in AWS Data Analytics Services, which are essential for data analysis work. You will learn how to use AWS Glue, EMR, Athena, and other services to collect, process, and analyze data. This course will also teach you how to use Python and other programming languages to develop data analysis applications.
Data Scientist
As a Data Scientist, you will be responsible for developing and applying statistical and machine learning models to data to help businesses solve problems and make better decisions. This course will help you build a strong foundation in AWS Data Analytics Services, which are essential for data science work. You will learn how to use AWS Glue, EMR, Athena, and other services to collect, process, and analyze data. This course will also teach you how to use Python and other programming languages to develop data science applications.
Cloud Architect
As a Cloud Architect, you will be responsible for designing and managing cloud computing solutions. This course will help you build a strong foundation in AWS Data Analytics Services, which are essential for cloud architecture work. You will learn how to use AWS Glue, EMR, Athena, and other services to build and manage data pipelines in the cloud. This course will also teach you how to use Python and other programming languages to develop cloud computing applications.
DevOps Engineer
As a DevOps Engineer, you will be responsible for bridging the gap between development and operations teams. This course will help you build a strong foundation in AWS Data Analytics Services, which are essential for DevOps work. You will learn how to use AWS Glue, EMR, Athena, and other services to build and manage data pipelines that can be deployed and managed by both development and operations teams.
Data Architect
As a Data Architect, you will be responsible for designing and managing the architecture of data systems. This course will help you build a strong foundation in AWS Data Analytics Services, which are essential for data architecture work. You will learn how to use AWS Glue, EMR, Athena, and other services to design and manage data pipelines that meet the needs of your business.
Business Intelligence Analyst
As a Business Intelligence Analyst, you will be responsible for helping businesses understand their data and make better decisions. This course will help you build a strong foundation in AWS Data Analytics Services, which are essential for business intelligence work. You will learn how to use AWS Glue, EMR, Athena, and other services to collect, process, and analyze data. This course will also teach you how to use Python and other programming languages to develop business intelligence applications.
Database Administrator
As a Database Administrator, you will be responsible for managing and maintaining databases. This course may be useful to you if you are interested in working with AWS data analytics services. You will learn how to use AWS Glue, EMR, Athena, and other services to manage and maintain data pipelines.
Software Engineer
As a Software Engineer, you will be responsible for developing and maintaining software applications. This course may be useful to you if you are interested in working with AWS data analytics services. You will learn how to use AWS Glue, EMR, Athena, and other services to develop and maintain data pipelines.
Systems Administrator
As a Systems Administrator, you will be responsible for managing and maintaining computer systems. This course may be useful to you if you are interested in working with AWS data analytics services. You will learn how to use AWS Glue, EMR, Athena, and other services to manage and maintain data pipelines.
Network Engineer
As a Network Engineer, you will be responsible for designing and managing computer networks. This course may be useful to you if you are interested in working with AWS data analytics services. You will learn how to use AWS Glue, EMR, Athena, and other services to manage and maintain data pipelines.
Security Analyst
As a Security Analyst, you will be responsible for protecting computer systems from security threats. This course may be useful to you if you are interested in securing AWS data analytics environments. Its coverage of IAM users, roles, and policies shows how access to data pipelines built with AWS Glue, EMR, Athena, and other services is controlled.
Project Manager
As a Project Manager, you will be responsible for planning and managing projects. This course may be useful to you if you are interested in working with AWS data analytics services. You will learn how to use AWS Glue, EMR, Athena, and other services to plan and manage data pipelines.
Product Manager
As a Product Manager, you will be responsible for planning and managing products. This course may be useful to you if you are interested in working with AWS data analytics services. You will learn how to use AWS Glue, EMR, Athena, and other services to plan and manage data pipelines.
Data Warehouse Architect
As a Data Warehouse Architect, you will be responsible for designing and managing data warehouses. This course may be useful to you if you are interested in working with AWS data analytics services. You will learn how to use AWS Glue, EMR, Athena, and other services to design and manage data warehouses.

Reading list

We've selected eight books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Data Engineering using AWS Data Analytics.
Provides a comprehensive overview of data science and big data computing, both key skills for data engineers.
Provides detailed conceptual information about AWS Data Analytics Services. Reading this book as a companion to the course will help understand concepts in depth and prepare for the AWS Data Analytics Certification.
Widely-used resource for learning Python for data analysis. It focuses on using Python libraries like Pandas, NumPy, and Jupyter, which are foundational for data manipulation and analysis in the course context.
Provides a collection of case studies that illustrate how data science can be used to solve real-world problems. Data science is a key skill for data engineers.
Provides a comprehensive reference for Apache Spark, a powerful distributed computing framework commonly used in big data analytics. Understanding Spark is useful for working with AWS EMR and other big data frameworks.
Provides a gentle introduction to data lakes. It covers the basics of data lakes, including their benefits, challenges, and use cases.

Similar courses

Here are nine courses similar to Data Engineering using AWS Data Analytics.
Master AWS Lambda Functions for Data Engineers using...
Analyzing Data on AWS
Mastering AWS Glue, QuickSight, Athena & Redshift Spectrum
Serverless Analytics on AWS
AWS Data Architect Bootcamp - 43 Services 500 FAQs 20+...
AWS Certified Data Engineer Associate 2024 - Hands On!
Getting Started with AWS Athena
AWS Lambda and API Gateway Basics - Build Serverless...
Build a Data Warehouse in AWS

© 2016 - 2024 OpenCourser