Sundog Education by Frank Kane, Stephane Maarek | AWS Certified Cloud Practitioner, Solutions Architect, Developer, Frank Kane, and Sundog Education Team

Updated for the latest SageMaker features, Generative AI (GPT / Bedrock), new AWS ML Services, and early access to MLA-C01 training. Happy learning.


Nervous about passing the AWS Certified Machine Learning - Specialty exam (MLS-C01)? You should be. There's no doubt it's one of the most difficult and coveted AWS certifications. A deep knowledge of AWS and SageMaker isn't enough to pass this one - you also need deep knowledge of machine learning, and the nuances of feature engineering and model tuning that generally aren't taught in books or classrooms. You just can't prepare enough for this one.

This certification prep course is taught by Frank Kane, who spent nine years working at Amazon itself in the field of machine learning. Frank took and passed this exam on the first try, and knows exactly what it takes for you to pass it yourself. Joining Frank in this course is Stephane Maarek, an AWS expert and popular AWS certification instructor on Udemy.

In addition to the 15-hour video course, a 30-minute quick assessment practice exam is included that consists of the same topics and style as the real exam. You'll also get four hands-on labs that allow you to practice what you've learned, and gain valuable experience in model tuning, feature engineering, and data engineering.

This course is structured into the four domains tested by this exam: data engineering, exploratory data analysis, modeling, and machine learning implementation and operations. Just some of the topics we'll cover include:

  • How generative AI and large language models (LLMs) work, including the Transformer architecture (GPT) and attention-based neural networks (masked self-attention)

  • Amazon's newest generative AI services: Bedrock, SageMaker JumpStart for Generative AI, CodeWhisperer, and SageMaker Foundation Models

  • S3 data lakes

  • AWS Glue and Glue ETL

  • Kinesis Data Streams, Firehose, and Video Streams

  • DynamoDB

  • Data Pipelines, AWS Batch, and Step Functions

  • Using scikit-learn

  • Data science basics

  • Athena and QuickSight

  • Elastic MapReduce (EMR)

  • Apache Spark and MLlib

  • Feature engineering (imputation, outliers, binning, transforms, encoding, and normalization)

  • Ground Truth

  • Deep Learning basics

  • Tuning neural networks and avoiding overfitting

  • Amazon SageMaker, including SageMaker Studio, SageMaker Model Monitor, SageMaker Autopilot, and SageMaker Debugger.

  • Regularization techniques

  • Evaluating machine learning models (precision, recall, F1, confusion matrix, etc.)

  • High-level ML services: Comprehend, Translate, Polly, Transcribe, Lex, Rekognition, and more

  • Building recommender systems with Amazon Personalize

  • Monitoring industrial equipment with Lookout and Monitron

  • Security best practices with machine learning on AWS

As an extra bonus, this course includes early access to training content for the recently announced AWS Certified Machine Learning Engineer - Associate exam (MLA-C01). This includes in-depth coverage of Amazon Bedrock, implementing Retrieval-Augmented Generation (RAG) Systems with Bedrock Knowledge Bases, content filtering with Bedrock Guardrails, and building LLM Agents (Agentic AI) using Bedrock Agents. This isn't just theory; a series of hands-on labs gives you practical experience with these new features.

Machine learning is an advanced certification, and it's best tackled by students who have already obtained associate-level certification in AWS and have some real-world industry experience. This exam is not intended for AWS beginners.

If there's a more comprehensive prep course for the AWS Certified Machine Learning - Specialty exam, we haven't seen it. Enroll now, and gain confidence as you walk into that testing center.

Instructor

My name is Stéphane Maarek. I am passionate about cloud computing, and I will be your instructor in this course. I teach AWS certification courses, with a focus on helping my students improve their professional proficiency in AWS.

I have already taught

With AWS becoming the centerpiece of today's modern IT architectures, I have decided it is time for students to learn how to be an AWS Machine Learning Professional. So, let’s kick start the course. You are in good hands.

Instructor

Hey, I'm Frank Kane, and I'm also instructing this course. I spent nine years working for Amazon from the inside as a senior engineer and senior manager, where my specialty was recommender systems and machine learning. As an instructor, I'm best known for my top-selling courses in "big data", data analytics, machine learning, Apache Spark, system design, technical management and career growth, and Elasticsearch.

I've been teaching on Udemy since 2015, where I've reached over 800,000 students all around the world.

I've worked hard to keep this course up to date with the latest developments in AWS machine learning, and to make sure you're prepared for the latest version of this exam. Let's dive in and get you ready.

This course also comes with:

  • Lifetime access to all future updates

  • A responsive instructor in the Q&A Section

  • Udemy Certificate of Completion Ready for Download

  • A 30 Day "No Questions Asked" Money Back Guarantee.

Join us in this course if you want to prepare for the AWS Machine Learning Certification and master the AWS platform.

What's inside

Learning objectives

  • What to expect on the AWS Certified Machine Learning Specialty exam
  • Amazon SageMaker's built-in machine learning algorithms (XGBoost, BlazingText, Object Detection, etc.)
  • Feature engineering techniques, including imputation, outliers, binning, and normalization
  • High-level ML services: Comprehend, Translate, Polly, Transcribe, Lex, Rekognition, and more
  • Data engineering with S3, Glue, Kinesis, and DynamoDB
  • Exploratory data analysis with scikit-learn, Athena, Apache Spark, and EMR
  • Deep learning and hyperparameter tuning of deep neural networks
  • Automatic model tuning and operations with SageMaker
  • L1 and L2 regularization
  • Applying security best practices to machine learning pipelines

Syllabus

Introduction

Get the most from this course - learn how to adjust the video playback speed, enable closed captions, and ensure good video streaming.

Course Introduction: What to Expect
Important note

Download the notebooks you'll need throughout the hands-on labs in this course.

Create data repositories for machine learning; Identify and implement a data-ingestion solution; Identify and implement a data-transformation solution.
Section Intro: Data Engineering
Set up an AWS Billing Alarm
Amazon S3 - Overview
Amazon S3 Storage Classes + Glacier
Amazon S3 Storage + Glacier - Hands On
Amazon S3 Lifecycle Rules (with S3 Analytics)
Amazon S3 Lifecycle Rules - Hands On
Amazon S3 - Bucket Policy
Amazon S3 - Bucket Policy Hands On
Amazon S3 - Encryption
Amazon S3 - Encryption Hands On
Amazon S3 - Default Encryption
Amazon S3 - VPC Endpoints
Kinesis Data Streams & Kinesis Data Firehose
Lab 1.1 - Kinesis Data Firehose
Kinesis Data Analytics
Lab 1.2 - Managed Apache Flink (Formerly Kinesis Data Analytics)
Kinesis Data Analytics and Lambda / Managed Service for Apache Flink
Kinesis Video Streams
Kinesis ML Summary
Glue Data Catalog & Crawlers
Lab 1.3 - Glue Data Catalog
Glue ETL
Lab 1.4 - Glue ETL
Glue Data Brew
Lab 1.5 - Glue Data Brew
Lab 1.6 - Athena
Lab 1 - Cleanup
AWS Data Stores in Machine Learning
AWS Data Pipelines
AWS Batch
AWS DMS - Database Migration Services
AWS Step Functions
Full Data Engineering Pipelines
Random things you need to know: AWS DataSync and MQTT
Data Engineering Summary

Reinforce your knowledge of some key points in this section.

Sanitize and prepare data for modeling; Perform feature engineering; Analyze and visualize data for machine learning
Section Intro: Data Analysis

High-level overview of how Jupyter notebooks, Pandas, NumPy, Matplotlib, Seaborn, and scikit-learn play a role in exploratory data analysis and preparing your training data for machine learning.

We'll walk through a Jupyter notebook that explores, cleans, and normalizes training data to build a real machine learning model to predict if mammogram results are benign or malignant.

We'll cover the differences between numerical, categorical, and ordinal data.

Topics covered include normal distributions, Poisson distributions, binomial distributions, Bernoulli distributions, and the difference between probability density functions and probability mass functions.
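To make the distinction between a probability mass function and a probability density function concrete, here is a minimal sketch in plain Python. The function names and the parameter values are our own choices for illustration, not material from the course notebooks.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials (discrete)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the average count is lam (discrete)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of a normal distribution at x (continuous); not itself a probability."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# A PMF sums to 1 over all discrete outcomes.
pmf_total = sum(binomial_pmf(k, 10, 0.3) for k in range(11))

# A Bernoulli trial is the binomial special case with n = 1.
bernoulli_success = binomial_pmf(1, 1, 0.3)

# A PDF value can exceed 1; only the area under it over an interval is a probability.
narrow_density = normal_pdf(0.0, sigma=0.1)
```

The last line is the point worth remembering: a density is a rate per unit of x, so a very narrow distribution can have a density well above 1 at its peak.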

We'll talk about how time series data consists of separate signals from trends, seasonality, and noise.
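As a rough illustration of that idea, the sketch below builds a synthetic series from a trend, a seasonal cycle, and noise, then recovers the trend with a moving average. It assumes an additive model and a hypothetical 12-step period; the numbers are made up.

```python
import math
import random

random.seed(0)
PERIOD = 12  # hypothetical: monthly data with a yearly cycle
n = 48

trend = [0.5 * t for t in range(n)]
seasonal = [10 * math.sin(2 * math.pi * t / PERIOD) for t in range(n)]
noise = [random.gauss(0, 1) for _ in range(n)]

# Additive model: each observation is the sum of the three signals.
series = [trend[t] + seasonal[t] + noise[t] for t in range(n)]

def moving_average(values, window):
    """Average each run of `window` consecutive points."""
    return [sum(values[s:s + window]) / window
            for s in range(len(values) - window + 1)]

# Averaging over exactly one full period cancels the seasonal swing,
# leaving an estimate of the trend (plus a little averaged noise).
estimated_trend = moving_average(series, PERIOD)
```

Each entry of `estimated_trend` summarizes the window starting at that index, so it lines up with the midpoint of that window rather than its first point.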

A quick overview of Amazon Athena, and how it can be used to query your unstructured, structured, or semi-structured data in S3 in a serverless setting.

High-level features of QuickSight, Amazon's data visualization product, including its new machine learning capabilities.

There are lots of visualization choices: bar and line graphs, heat maps, tree maps, pivot tables, and much more, all of which are offered by QuickSight. Let's talk about how to decide which kind of graph is most appropriate for illustrating different aspects of your data.

How Amazon EMR works, including how a Hadoop cluster's architecture works. What is HDFS and EMRFS? What are different usage modes for EMR? How does it scale? What can it do?

How Apache Spark has supplanted MapReduce; the architecture of Spark, and its capabilities, including Spark Streaming, MLlib, GraphX, and Spark SQL. How Spark integrates with AWS and Kinesis.

Zeppelin notebooks run on your EMR cluster to control Spark, but EMR notebooks can run outside of your cluster and control the provisioning of the cluster itself, too. We'll also discuss the security features available with EMR, and how to choose an instance type for the master, core, and task nodes of your cluster.

We'll introduce what the world of feature engineering is all about, and why it is so important to getting good results from your machine learning models. And, we'll dive into the "curse of dimensionality," and why more features usually aren't better.

A big part of feature engineering is dealing with missing data. We'll discuss various approaches, including mean imputation, dropping, and using machine learning for imputation including KNN, deep learning, and regression methods such as MICE.
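The two simplest approaches, dropping rows and filling with a summary statistic, can be sketched in a few lines. This is a minimal illustration on made-up rows; the `impute` and `drop_missing` helpers are hypothetical names, and the machine-learning-based methods (KNN, deep learning, MICE) are not shown.

```python
from statistics import mean, median

# Made-up rows; None marks a missing value.
rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 45, "income": None},
    {"age": 29, "income": 48000},
]

def impute(rows, column, strategy=mean):
    """Fill missing values in `column` with a statistic of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = strategy(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

def drop_missing(rows):
    """Keep only rows with no missing values at all."""
    return [r for r in rows if all(v is not None for v in r.values())]

# Mean imputation for age, median imputation for income.
imputed = impute(impute(rows, "age"), "income", strategy=median)

# Dropping is simpler but here throws away half of the data.
complete_only = drop_missing(rows)
```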

Training models with highly unbalanced data sets, such as in fraud detection, where very few observations are actual fraud, is a big problem. We'll talk about ways to address this from a feature engineering standpoint, including oversampling, undersampling, and SMOTE.
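Random oversampling and undersampling are simple enough to sketch with the standard library. The data set below is made up (5 fraud rows out of 100), and SMOTE is deliberately left out: rather than copying minority rows, it synthesizes new ones by interpolating between nearby minority samples.

```python
import random

random.seed(42)

# Hypothetical labelled data: (row_id, label), where 1 = fraud (rare), 0 = legitimate.
data = [(i, 1) for i in range(5)] + [(i, 0) for i in range(5, 100)]

minority = [row for row in data if row[1] == 1]
majority = [row for row in data if row[1] == 0]

# Random oversampling: draw minority rows with replacement up to the majority size.
oversampled = majority + random.choices(minority, k=len(majority))

# Random undersampling: keep only as many majority rows as there are minority rows.
undersampled = minority + random.sample(majority, k=len(minority))
```

Both results are balanced 50/50, but oversampling repeats the same few fraud rows many times, while undersampling discards most of the legitimate rows.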

We'll introduce how to compute variance and standard deviation, and how to identify outliers as a function of standard deviation and in box-and-whisker plots. We'll also give a shout-out to Amazon's Random Cut Forest algorithm.
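Here is a minimal sketch of both outlier rules on a made-up sample. The two-standard-deviation threshold and the 1.5 x IQR fences are common conventions that we assume here, not values taken from the course; Random Cut Forest is not shown.

```python
from statistics import mean, pstdev, quantiles

# Made-up sample with one extreme value.
values = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 95]

mu = mean(values)
sigma = pstdev(values)  # population standard deviation (square root of the variance)

# Rule 1: flag points more than `threshold` standard deviations from the mean.
threshold = 2
sigma_outliers = [v for v in values if abs(v - mu) > threshold * sigma]

# Rule 2 (box-and-whisker): flag points beyond 1.5 * IQR outside the quartiles.
q1, _, q3 = quantiles(values, n=4)
iqr = q3 - q1
iqr_outliers = [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]
```

Note that the extreme value inflates both the mean and the standard deviation, which is one reason the quartile-based rule is often more robust on small samples.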

We'll round out our tour of feature engineering with a discussion of binning numerical data, transforming data to create new features to discover sub-linear and super-linear patterns, one-hot encoding, scaling and normalization, and the importance of shuffling your training data.
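A compact sketch of binning, one-hot encoding, min-max scaling, and shuffling follows. The columns, bin edges, and category names are hypothetical, and real pipelines would more likely use Pandas or scikit-learn for these steps.

```python
import random

# Made-up feature columns.
ages = [22, 35, 47, 58, 63]
colors = ["red", "green", "blue", "green", "red"]

def bin_age(age):
    """Binning: replace a number with a coarse category (hypothetical bin edges)."""
    if age < 30:
        return "young"
    if age < 60:
        return "middle"
    return "senior"

binned = [bin_age(a) for a in ages]

# One-hot encoding: one 0/1 column per category, in a fixed order.
categories = sorted(set(colors))
one_hot = [[1 if c == cat else 0 for cat in categories] for c in colors]

# Min-max scaling: map the numeric range onto [0, 1].
lo, hi = min(ages), max(ages)
scaled = [(a - lo) / (hi - lo) for a in ages]

# Shuffling: randomize row order so training does not see an ordering artifact.
rows = list(zip(scaled, one_hot))
random.Random(0).shuffle(rows)
```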

Humans can be the most important tool for creating missing data, especially labels. We'll talk about how Amazon SageMaker Ground Truth manages human labeling tasks and optimizes them, as well as using unsupervised techniques such as Rekognition and Comprehend to fabricate features and labels from existing data.

As TF-IDF (Term Frequency - Inverse Document Frequency) may be new to you, we'll start by reviewing how TF-IDF works and how it fits into a search engine solution.
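For readers who want to see the arithmetic, here is a minimal sketch of TF-IDF used for search on a toy corpus. It assumes the plain log(N / df) form of inverse document frequency; libraries typically use smoothed variants, and the documents and query are made up.

```python
import math
from collections import Counter

# Toy corpus; the documents and the query are made up for illustration.
docs = {
    "d1": "the cat sat on the mat",
    "d2": "the dog chased the cat",
    "d3": "dogs and cats make good pets",
}
tokenized = {name: text.split() for name, text in docs.items()}

def tf(term, tokens):
    """Term frequency: share of the document's tokens equal to `term`."""
    return Counter(tokens)[term] / len(tokens)

def idf(term):
    """Inverse document frequency, log(N / df); 0 if the term never appears."""
    df = sum(1 for tokens in tokenized.values() if term in tokens)
    return math.log(len(tokenized) / df) if df else 0.0

def tfidf(term, name):
    return tf(term, tokenized[name]) * idf(term)

def search(query):
    """Rank document names by the summed TF-IDF of the query terms."""
    scores = {name: sum(tfidf(t, name) for t in query.split()) for name in tokenized}
    return sorted(scores, key=scores.get, reverse=True)

ranking = search("cat mat")
```

In document d1 the rare word "mat" outscores the frequent word "the" even though "the" appears twice, which is exactly the weighting a search engine wants.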

We'll walk through the process of creating a JupyterLab PySpark Notebook within an EMR Workspace, backed by an EMR EC2 cluster, within an EMR Studio environment. We'll use this notebook to pre-process real Wikipedia data, build a TF-IDF model around it, and use it for actual search.

Let's do a quick knowledge check of what you've learned in the exploratory data analysis domain. These aren't the sorts of questions you'll get on the exam; it's just for review purposes.

Understand ML and neural networks basics, not just within AWS.
Section Intro: Modeling

We'll cover the biological inspiration of deep learning, and how this translates to artificial neural networks.

We'll dive deep into activation functions, including linear, step, logistic / sigmoid, hyperbolic tangent, ReLU, Leaky ReLU, PReLU, Swish, and more - and how to choose between them.
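Several of these functions fit in one line each. The sketch below uses scalar inputs and plain Python for readability; the 0.01 slope for Leaky ReLU is a commonly used default that we assume here. PReLU has the same form as Leaky ReLU, except that the slope is learned during training.

```python
import math

def step(x: float) -> float:
    """Fires fully or not at all; its gradient is zero almost everywhere."""
    return 1.0 if x >= 0 else 0.0

def sigmoid(x: float) -> float:
    """Logistic function: squashes any input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x: float) -> float:
    """Hyperbolic tangent: squashes any input into (-1, 1), centered on 0."""
    return math.tanh(x)

def relu(x: float) -> float:
    """Rectified linear unit: passes positives through, zeroes out negatives."""
    return max(0.0, x)

def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """Like ReLU, but negatives keep a small slope `alpha` instead of dying."""
    return x if x > 0 else alpha * x

def swish(x: float) -> float:
    """x times sigmoid(x): a smooth curve that resembles ReLU for large |x|."""
    return x * sigmoid(x)
```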

Convolutional Neural Networks, or CNNs, are inspired by the human visual cortex and are useful for object recognition and other tasks. We'll cover how they work, some popular CNN architectures such as ResNet, and how CNNs are built in Keras and TensorFlow.

Recurrent Neural Networks, or RNNs, are well suited for problems involving sequences of data, such as predicting markets or machine translation. We'll cover how RNNs work, some popular variants including LSTM and GRU, and different applications of them.

Modern NLP with BERT and GPT, and Transfer Learning

A brief discussion of EMR's built-in support for deep learning with Apache MXNet, deep learning AMIs for EC2, and EC2 instance types appropriate for deep learning.

Hyperparameter tuning of deep neural networks is a complex subject. We'll talk about how deep neural nets are trained with gradient descent, and how your choice of learning rate and batch size affects your training. Sometimes it's counter-intuitive!
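The effect of the learning rate is easy to see on a toy problem. This sketch minimizes f(w) = w**2 with plain gradient descent; the function, starting point, and rates are arbitrary choices for illustration, and batch size is not modeled because there is no data set here.

```python
def gradient_descent(learning_rate, steps=50, start=5.0):
    """Minimize f(w) = w**2, whose gradient is 2*w, starting from `start`."""
    w = start
    for _ in range(steps):
        w -= learning_rate * 2 * w  # step against the gradient
    return w

too_small = gradient_descent(0.01)  # still far from the minimum at 0 after 50 steps
reasonable = gradient_descent(0.1)  # ends very close to 0
too_large = gradient_descent(1.1)   # overshoots further on every step and diverges
```

With a rate of 1.1, each update flips the sign of w and grows its magnitude, so training moves away from the minimum instead of toward it.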

Deep neural networks are prone to overfitting. We'll cover some simple regularization techniques to combat this, including dropout, early stopping, and simply using a smaller network.

What are L1 and L2 regularization, how do they differ, and how do you choose between them?
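One way to see the difference is to compare what each penalty does to a single weight. The sketch below is an illustration under stated assumptions: the shrink functions are the closed-form minimizers of 0.5 * (v - w)**2 plus the respective penalty on v, and the weights and lambda value are made up.

```python
def l1_penalty(weights, lam):
    """L1 term: lambda times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """L2 term: lambda times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

def l1_shrink(w, lam):
    """Minimizer of 0.5*(v - w)**2 + lam*|v|: soft-thresholding."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0  # small weights are set exactly to zero

def l2_shrink(w, lam):
    """Minimizer of 0.5*(v - w)**2 + lam*v**2: proportional shrinkage."""
    return w / (1 + 2 * lam)

weights = [3.0, -0.05, 0.5, 0.02]  # hypothetical weights
lam = 0.1

after_l1 = [l1_shrink(w, lam) for w in weights]
after_l2 = [l2_shrink(w, lam) for w in weights]
```

L1 zeroes out the two small weights, which is why it is associated with feature selection, while L2 keeps every weight but makes all of them smaller.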

What is the vanishing gradient problem, and what can be done to combat it? Also, what's gradient checking?
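Gradient checking compares a hand-derived gradient against a numerical estimate to catch implementation mistakes. A minimal sketch on a made-up one-variable function, assuming a central-difference estimate:

```python
def f(w: float) -> float:
    """Toy function standing in for a loss."""
    return w**3 + 2 * w

def analytic_grad(w: float) -> float:
    """Derivative of f worked out by hand: 3*w**2 + 2."""
    return 3 * w**2 + 2

def numeric_grad(func, w: float, eps: float = 1e-5) -> float:
    """Central-difference estimate of the derivative of `func` at w."""
    return (func(w + eps) - func(w - eps)) / (2 * eps)

w = 1.5
difference = abs(numeric_grad(f, w) - analytic_grad(w))
```

If `difference` is not tiny, the analytic gradient (or the code that computes it) is probably wrong.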

How to read and interpret various kinds of confusion matrices, allowing you to distinguish true and false positives and negatives from an overall accuracy metric.

We'll cover various ways to measure classifiers, including precision, recall, ROC curves, F1, RMSE, and AUC. We'll discuss how to interpret these metrics, and how to decide which one is relevant to the problem you're trying to solve.
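The classification metrics all derive from the four cells of a binary confusion matrix. Here is a minimal sketch with made-up labels and predictions; ROC curves, AUC, and RMSE are not shown.

```python
# Made-up ground truth and predictions (1 = positive class).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]

def confusion_counts(y_true, y_pred):
    """Return (true positives, false positives, false negatives, true negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

tp, fp, fn, tn = confusion_counts(y_true, y_pred)

precision = tp / (tp + fp)  # of everything flagged positive, how much was right
recall = tp / (tp + fn)     # of all actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
accuracy = (tp + tn) / len(y_true)
```

Precision matters most when false positives are costly, and recall matters most when missing a positive is costly, as in fraud detection.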

Two ensemble methods are bagging and boosting, and they solve very different problems.

Quiz: Deep Learning and Machine Learning
Use SageMaker to prepare data, build models, tune them, and deploy them.

The heart of AWS's machine learning offering is SageMaker. We'll cover what it does and its architecture at a high level, and how it's used together with ECR and S3.

The Linear Learner algorithm in SageMaker is a robust means of regression or classification in systems that can be described in a linear manner.

The XGBoost algorithm is winning a lot of Kaggle competitions lately; if you care about accuracy, it's a great choice. SageMaker includes the open source XGBoost algorithm; we'll cover what it does, how to use it, and how to tune it.

The Seq2Seq algorithm is commonly used for machine translation tasks. It is implemented as an RNN or CNN with attention under the hood.

DeepAR is a powerful RNN-based model for extrapolating time series, and sets of related time series, into the future.

BlazingText can operate in supervised mode to assign labels to sentences, or in Word2Vec mode to build an embedding layer of related words.

Object2Vec is a general mechanism for building embeddings of objects based on arbitrary pairs of data.

The Object Detection algorithm identifies objects in images, together with their bounding boxes.

Image Classification is used to identify what objects are in an image, but without data on where those objects are within the image.

Semantic Segmentation identifies objects within images at a per-pixel level, using segmentation masks.

Random Cut Forest is Amazon's algorithm for anomaly detection in a series of data.

Neural Topic Modeling is a neural network-based technique for clustering documents into a specific number of topics, in an unsupervised manner.

LDA is another topic modeling technique in SageMaker that does not rely on neural networks, but just looks at commonalities in the terms contained by documents.

KNN is a simple method for classification or regression by just analyzing the K observations most similar to a new observation.
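The idea is small enough to sketch directly. This is a plain-Python illustration with made-up points and Euclidean distance, not the SageMaker implementation, which adds sampling, dimensionality reduction, and indexing.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Made-up training data: (features, label) pairs forming two clusters.
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]

near_first_cluster = knn_classify(train, (1.1, 1.0))
near_second_cluster = knn_classify(train, (5.0, 4.8))
```

For regression, the same neighbors would be averaged instead of voted on.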

K-Means Clustering in SageMaker
Principal Component Analysis (PCA) in SageMaker

Factorization Machines are generally used for classification or regression of sparse data, for example in recommender systems.

IP Insights uses deep learning to identify anomalous behavior from IP addresses in your web log data.

Good to know

Know what's good, what to watch for, and possible dealbreakers
Comprehensive study of AWS machine learning tools, algorithms, and services
Strong reputation of instructors with significant industry experience
Hands-on labs and interactive materials provide practical experience
Taught by Frank Kane, a senior engineer with nine years at Amazon specializing in recommender systems and machine learning
Taught by Stephane Maarek, an AWS expert and popular instructor
Addresses knowledge gaps for learners with associate-level AWS certification and industry experience

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in AWS Certified Machine Learning Specialty 2024 - Hands On! with these activities:
Refresher: Go over basics of statistical distributions
Cover foundational probability concepts for statistics and machine learning.
  • Review statistical probability fundamentals
  • Study probability density functions, probability mass functions, and the central limit theorem
  • Examine specific distributions such as normal, Poisson, binomial, and Bernoulli
Follow an AWS tutorial on using Amazon SageMaker for machine learning
Gain hands-on experience with SageMaker's built-in algorithms, pre-trained models, and automation features.
  • Choose an appropriate tutorial
  • Follow the tutorial step-by-step
  • Experiment with different parameters and settings
Build an AWS EMR cluster with SageMaker Studio
Prepare machine learning data reliably at scale in a serverless or managed environment.
  • Create a SageMaker Studio notebook instance
  • Create an EMR cluster
  • Connect to the EMR cluster
  • Run a Spark job on the EMR cluster
Practice using Amazon Athena to query data in S3
Master querying large datasets, both structured and semi-structured, without provisioning or managing infrastructure.
  • Create an Athena data source
  • Create an Athena database and table
  • Query data in the Athena table
Develop a machine learning model using AWS SageMaker
Build, train, and deploy machine learning models with access to managed infrastructure, algorithms, and tools.
  • Define the problem and gather data
  • Choose an appropriate SageMaker algorithm
  • Train and evaluate the model
  • Deploy the model

Similar courses

Here are nine courses similar to AWS Certified Machine Learning Specialty 2024 - Hands On!.
  • AWS Certified AI Practitioner AIF-C01 - Hands On, In...
  • Amazon SageMaker
  • Generative AI Foundations for Cloud
  • Implementing and Operating AWS Machine Learning Solutions
  • Machine Learning on AWS Deep Dive
  • Building Machine Learning Pipelines on AWS
  • Introduction to Machine Learning
  • Deep Learning Using TensorFlow and Apache MXNet on Amazon...
  • Hands-on Machine Learning with AWS and NVIDIA

© 2016 - 2024 OpenCourser