- Meet your Co-Instructor: Noah Gift (1 minute)
- Overview of Big Data Platforms (1 minute)
- Getting Started with Hadoop (1 minute)
- Getting Started with Spark (1 minute)
- Introduction to Resilient Distributed Datasets (RDD) (2 minutes)
- Resilient Distributed Datasets (RDD) Demo (4 minutes)
- Introduction to Spark SQL (1 minute)
- PySpark Dataframe Demo: Part 1 (3 minutes)
- PySpark Dataframe Demo: Part 2 (7 minutes)
- 9 readings (Total 90 minutes)
- Welcome to Data Engineering Platforms with Python! (10 minutes)
- What is Apache Hadoop? (10 minutes)
- What is Apache Spark? (10 minutes)
- Use Apache Spark in Azure Databricks (optional) (10 minutes)
- Choosing between Hadoop and Spark (10 minutes)
- What are RDDs? (10 minutes)
- Getting Started: Creating RDDs with PySpark (10 minutes)
- Spark SQL, Dataframes and Datasets (10 minutes)
- PySpark and Spark SQL (10 minutes)
- 7 quizzes (Total 210 minutes)
- PySpark (30 minutes)
- Big Data Platforms (30 minutes)
- Apache Hadoop Concepts (30 minutes)
- Apache Spark Concepts (30 minutes)
- RDD Concepts (30 minutes)
- Spark SQL Concepts (30 minutes)
- PySpark Dataframe Concepts (30 minutes)
- 2 discussion prompts (Total 20 minutes)
- Meet and Greet (optional) (10 minutes)
- Let Us Know if Something's Not Working (10 minutes)
- 2 ungraded labs (Total 120 minutes)
- Practice: Creating RDDs with PySpark (60 minutes)
- Practice: Reading Data into Dataframes (60 minutes)
Module 2: Snowflake (4 hours)
- 8 videos (Total 27 minutes)
- What is Snowflake? (2 minutes, Preview module)
- Snowflake Layers (2 minutes)
- Snowflake Web UI (3 minutes)
- Navigating Snowflake (3 minutes)
- Creating a Table in Snowflake (5 minutes)
- Snowflake Warehouses (3 minutes)
- Writing to Snowflake (3 minutes)
- Reading from Snowflake (2 minutes)
- 5 readings (Total 50 minutes)
- Accessing Snowflake (10 minutes)
- Detailed View Inside Snowflake (10 minutes)
- Snowsight: The Snowflake Web Interface (10 minutes)
- Working with Warehouses (10 minutes)
- Python Connector Documentation (10 minutes)
- 6 quizzes (Total 180 minutes)
- Snowflake (30 minutes)
- Snowflake Architecture (30 minutes)
- Snowflake Layers (30 minutes)
- Navigating Snowflake (30 minutes)
- Creating a Table (30 minutes)
- Writing to Snowflake (30 minutes)
Module 3: Azure Databricks and MLflow (5 hours)
- 16 videos (Total 71 minutes)
- Accessing Databricks (0 minutes, Preview module)
- Spark Notebooks with Databricks (4 minutes)
- Using Data with Databricks (4 minutes)
- Working with Workspaces in Databricks (3 minutes)
- Advanced Capabilities of Databricks (1 minute)
- PySpark Introduction on Databricks (7 minutes)
- Exploring Databricks Azure Features (3 minutes)
- Using the DBFS to AutoML Workflow (4 minutes)
- Load, Register and Deploy ML Models (2 minutes)
- Databricks Model Registry (2 minutes)
- Model Serving on Databricks (2 minutes)
- What is MLOps? (12 minutes)
- Exploring Open-Source MLflow Frameworks (5 minutes)
- Running MLflow with Databricks (6 minutes)
- End to End Databricks MLflow (4 minutes)
- Databricks Autologging with MLflow (4 minutes)
- 7 readings (Total 70 minutes)
- What is Azure Databricks? (10 minutes)
- Introduction to Databricks Machine Learning (10 minutes)
- What is the Databricks File System (DBFS)? (10 minutes)
- Serverless Compute with Databricks (10 minutes)
- MLOps Workflow on Azure Databricks (10 minutes)
- Run MLflow Projects on Azure Databricks (10 minutes)
- Databricks Autologging (10 minutes)
- 4 quizzes (Total 120 minutes)
- Databricks (30 minutes)
- PySpark SQL (30 minutes)
- PySpark DataFrames (30 minutes)
- MLflow with Databricks (30 minutes)
- 1 ungraded lab (Total 60 minutes)
- ETL-Part-1: Keyword Extractor Tool to HashTag Tool (60 minutes)
Module 4: DataOps and Operations Methodologies (12 hours)
- 21 videos (Total 502 minutes)
- Kaizen Methodology for Data (4 minutes, Preview module)
- Introducing GitHub Codespaces (9 minutes)
- Compiling Python in GitHub Codespaces (18 minutes)
- Walking through SageMaker Studio Lab (28 minutes)
- Pytest Master Class (Optional) (166 minutes)
- What is DevOps? (2 minutes)
- DevOps Key Concepts (35 minutes)
- Continuous Integration Overview (32 minutes)
- Build an NLP in Cloud9 with Python (43 minutes)
- Build a Continuously Deployed Containerized FastAPI Microservice (43 minutes)
- Hugo Continuous Deploy on AWS (18 minutes)
- Container Based Continuous Delivery (8 minutes)
- What is DataOps? (1 minute)
- DataOps and MLOps with Snowflake (61 minutes)
- Building Cloud Pipelines with Step Functions and Lambda (16 minutes)
- What is a Data Lake? (2 minutes)
- Data Warehouse vs. Feature Store (2 minutes)
- Big Data Challenges (1 minute)
- Types of Big Data Processing (1 minute)
- Real-World Data Engineering Pipeline (2 minutes)
- Data Feedback Loop (0 minutes)
- 6 readings (Total 60 minutes)
- GitHub Codespaces Overview (10 minutes)
- Getting Started with Amazon SageMaker Studio Lab (10 minutes)
- Teaching MLOps at Scale with GitHub (Optional) (10 minutes)
- Getting Started with DevOps and Cloud Computing (10 minutes)
- Benefits of Serverless ETL Technologies (10 minutes)
- Next Steps (10 minutes)
- 4 quizzes (Total 120 minutes)
- DataOps and Operations Methodologies (30 minutes)
- Kaizen Methodology (30 minutes)
- DevOps (30 minutes)
- DataOps (30 minutes)
- 1 ungraded lab (Total 60 minutes)
- ETL-Part2: SQLite ETL Destination (60 minutes)