Noah Gift and Kennedy Behrman

In this course, you will:

  • Explore essential data engineering platforms (Hadoop, Spark, and Snowflake) and learn how to optimize and manage them
  • Delve into Databricks, a powerful platform for executing data analytics and machine learning tasks
  • Hone your Python data science skills with PySpark
  • Discover the key concepts of MLflow, an open-source platform for managing the end-to-end machine learning lifecycle, and learn how to integrate it with Databricks
  • Learn methodologies for improving your project management and workflow skills in data engineering, including Kaizen, DevOps, and DataOps best practices

This course is designed for learners who want to pursue or advance their careers in data science or data engineering, and for software developers or engineers who want to grow their data management skill set. With quizzes throughout to test your knowledge, this comprehensive course will guide you toward becoming a proficient data engineer, ready to tackle the challenges of today's data-driven world.

What's inside

Learning objectives

  • Optimize and manage Hadoop, Spark, and Snowflake platforms
  • Execute data analytics and machine learning tasks using Databricks
  • Enhance Python data science skills with PySpark
  • Manage the end-to-end machine learning lifecycle with MLflow
  • Apply Kaizen, DevOps, and DataOps methodologies to data engineering

Syllabus

Module 1: Overview and Introduction to PySpark (7 hours)
- 10 videos (Total 25 minutes)
- Meet your Co-Instructor: Kennedy Behrman (0 minutes, Preview module)

Traffic lights

Read about what's good, what should give you pause, and possible dealbreakers:

  • Develops foundational and advanced skills for those in data science and engineering fields
  • Taught by experienced professionals in the industry
  • Explores a range of essential data engineering platforms and technologies
  • Covers project management and workflow optimization practices
  • Requires some prior programming experience, limiting accessibility for absolute beginners

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Spark, Hadoop, and Snowflake for Data Engineering with these activities:
Review Concepts in PySpark Study Materials
Reviewing these topics at the start of the course will refresh key PySpark concepts and get you thinking.
  • Review big data platforms, such as Hadoop and Spark
  • Go over PySpark DataFrames
  • Examine RDDs, Spark SQL, and DataFrame concepts
Work Through PySpark Practice Exercises
These exercises will get you applying the concepts in practice and solidify your understanding of PySpark.
  • Load PySpark and set up the environment
  • Implement PySpark dataframe operations
  • Execute PySpark SQL queries
Join a Mentoring Program for Data Engineering
Enhance your understanding and practical skills by sharing your knowledge and guiding others in their data engineering journey.
  • Identify a mentoring program in data engineering
  • Apply and get matched with a mentee
  • Provide guidance and support to your mentee
Read "Spark: The Definitive Guide"
Gain a comprehensive understanding of Spark's architecture, programming model, and advanced techniques.
  • Obtain a copy of the book
  • Read through the relevant chapters
  • Take notes and highlight important concepts
Explore Snowflake Documentation and Tutorials
Exploring these resources on your own will give you deeper insight into Snowflake's features and functionality.
  • Familiarize yourself with Snowflake architecture and components
  • Review best practices for Snowflake data management
  • Experiment with Snowflake's scripting and programming capabilities
Run PySpark examples
Reinforce your understanding of PySpark by running the examples provided in the course materials.
  • Navigate to the PySpark examples directory.
  • Run the examples using the provided commands.
  • Observe the output and compare it to the expected results.
Follow Databricks Academy Tutorials
Supplement your learning by following structured tutorials from Databricks Academy to enhance your practical skills.
  • Identify relevant tutorials
  • Follow the tutorials step-by-step
  • Complete the exercises and quizzes
Create a Resource Collection on Snowflake
Build your knowledge base by gathering and organizing resources related to Snowflake's features and capabilities.
  • Identify relevant resources (e.g., documentation, tutorials, articles)
  • Organize the resources into a structured format
  • Share the resource collection with others
Attend a Data Analytics Workshop
Participate in a hands-on workshop to gain practical experience and deepen your understanding of data analytics concepts.
  • Identify a relevant workshop
  • Register and attend the workshop
  • Engage actively in the hands-on exercises
Spark SQL Practice Problems
Reinforce your understanding of Spark SQL syntax and operations by attempting to solve practice problems.
  • Access the practice problems
  • Attempt to solve the problems on your own
  • Review the solutions provided
Write a Blog Post on Data Engineering with Hadoop
Solidify your understanding by explaining concepts related to Hadoop and data engineering in a blog post.
  • Choose a specific topic within Hadoop and data engineering
  • Research and gather information
  • Write a well-structured and informative blog post
  • Publish and promote your blog post
Build a Data Pipeline Prototype in Databricks
Implementing the course concepts hands-on will strengthen your comprehension and prepare you for real-world work.
  • Establish a Databricks environment
  • Design and develop your data pipeline architecture
  • Implement data ingestion, transformation, and visualization
Build a Data Pipeline with PySpark
Apply your skills to a practical project involving data extraction, transformation, and loading using PySpark.
  • Define the project scope and objectives
  • Design the data pipeline architecture
  • Implement the pipeline using PySpark
  • Test and evaluate the pipeline
Contribute to an Open-Source Data Engineering Project
Enhance your skills and contribute to the data engineering community by participating in open-source projects.
  • Identify an open-source data engineering project
  • Explore the project's codebase and documentation
  • Identify an area where you can contribute
  • Submit a pull request with your contribution

© 2016 - 2025 OpenCourser