We may earn an affiliate commission when you visit our partners.

Spark and Data Lakes

Sean Murdock, Matt Swaffer, Ben Goldberg, Amanda Moran, and Valerie Scarlata

Learn Spark & Data Lakes with Udacity's online course. Learn about the big data ecosystem and the power of Apache Spark for data wrangling and transformation.

Prerequisite details

To optimize your success in this program, we've created a list of prerequisites and recommendations to help you prepare for the curriculum. Prior to enrolling, you should have the following knowledge:

  • Amazon Web Services basics
  • Database fundamentals
  • Intermediate Python
  • Intermediate SQL
  • Data modeling basics

You will also need to be able to communicate fluently and professionally in written and spoken English.

What's inside

Syllabus

In this course, you'll learn how Spark evaluates code and uses distributed computing to process and transform data. You'll work in the big data ecosystem to build data lakes and data lakehouses.
In this lesson, you will learn about the problems that Apache Spark is designed to solve. You'll also learn about the greater Big Data ecosystem and how Spark fits into it.
In this lesson, we'll dive into using Spark to wrangle, filter, and transform distributed data with PySpark and Spark SQL.
In this lesson, you will learn to use Spark and work with data lakes with Amazon Web Services using S3, AWS Glue, and AWS Glue Studio.
In this lesson, you'll work with lakehouse zones. You will build and configure these zones in AWS.
In this project, you'll work with sensor data that trains a machine learning model. You'll load S3 JSON data from a data lake into Athena tables using Spark and AWS Glue.
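
The syllabus mentions learning how Spark evaluates code. Spark evaluates lazily: transformations only record a plan, and nothing runs until an action asks for a result. The sketch below illustrates that idea in plain Python. It is an analogy, not the PySpark API; the `LazyDataset` class and its `map`, `filter`, and `collect` methods are hypothetical names chosen for illustration.

```python
# Pure-Python analogy for lazy evaluation. This is NOT the PySpark API;
# LazyDataset and its methods are hypothetical names for illustration.

class LazyDataset:
    def __init__(self, source, steps=()):
        self._source = source  # any iterable of records
        self._steps = steps    # recorded transformations, not yet run

    def map(self, fn):
        # Transformation: record the step and return a new dataset.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: record the step and return a new dataset.
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def collect(self):
        # Action: only now is the recorded plan applied to the data.
        result = []
        for record in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    record = fn(record)
                elif not fn(record):
                    keep = False
                    break
            if keep:
                result.append(record)
        return result


calls = []  # tracks when the mapping function actually runs


def double(x):
    calls.append(x)
    return x * 2


plan = LazyDataset(range(5)).map(double).filter(lambda x: x > 4)
calls_before_action = len(calls)  # still 0: building the plan ran nothing
output = plan.collect()           # the action triggers execution
```

Here `double` is not called while the plan is being built; it runs only once `collect()` is invoked. Real Spark goes further by optimizing the recorded plan and distributing the work across a cluster, which this sketch does not attempt.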

Good to know

Know what's good, what to watch for, and possible dealbreakers
Builds essential data science skills in Apache Spark
Focuses on practical applications of Apache Spark for big data wrangling and transformation
Provides hands-on experience through projects and labs
Taught by experienced instructors in the data science field
Course highlights the power of Spark for processing and transforming data in a distributed computing environment
Insufficient information available on potential prerequisites or recommendations for the program

Activities

Coming soon: we're preparing activities for Spark and Data Lakes. These are activities you can do before, during, or after a course.

Career center

Learners who complete Spark and Data Lakes will develop knowledge and skills that may be useful to these careers:
Data Scientist
Apache Spark is among the tools used by Data Scientists to build, train, and evaluate machine learning models. This course can help someone aiming for this career by providing a solid foundation in the core concepts of Apache Spark and how to use it to tackle a wide range of Big Data challenges.
Data Engineer
Structured Query Language (SQL) is widely used in Data Engineering, and this course can help one aiming to become a Data Engineer by providing a solid foundation in Apache Spark, including various fundamental Spark SQL concepts.
Software Engineer
Apache Spark is used in various software applications, especially those involving data at scale, making it a valuable skill for Software Engineers. This course, which teaches key concepts of Apache Spark and how to use it in distributed computing environments, can help someone targeting a career in Software Engineering.
Systems Engineer
Systems Engineers are involved in designing, deploying, and maintaining data systems, so a solid understanding of Apache Spark, which is widely used for large-scale data processing, can be an advantage. This course can provide a Systems Engineer with an introduction to this technology, helping them to be competitive in the job market.
Cloud Architect
Apache Spark is deployed on cloud platforms including AWS, and it is part of the skill set for Cloud Architects. This course can help someone aiming to become a Cloud Architect by offering an overview of Apache Spark and how to use it to build data lakes and data lakehouses on AWS.
Machine Learning Engineer
Machine Learning Engineers use Spark's machine learning library to prepare data for modeling, train models, and evaluate their performance. This course provides foundational knowledge of Apache Spark and how to use it for these tasks, which can be beneficial for someone pursuing a career in Machine Learning Engineering.
Business Analyst
Business Analysts leverage Apache Spark to analyze large datasets and derive insights to support decision-making. This course can help one seeking a career as a Business Analyst by introducing them to the fundamentals of Apache Spark and how to use it for data wrangling and transformation.
Data Analyst
Data Analysts use Apache Spark to explore, analyze, and interpret large datasets. This course can provide someone pursuing a career as a Data Analyst with a solid foundation in Apache Spark and how to use it for data analysis tasks.
Software Developer
Software Developers use Apache Spark to build data-intensive applications. This course can benefit someone aiming to become a Software Developer by providing them with an introduction to Apache Spark and its applications in software development.
Data Warehouse Engineer
Data Warehouse Engineers design, build, and maintain data warehouses, which often involve Apache Spark. This course can provide someone aspiring to become a Data Warehouse Engineer with an introduction to Apache Spark and its applications in data warehousing.
Quantitative Analyst
Quantitative Analysts leverage Apache Spark to analyze large financial datasets and build models for risk assessment, trading strategies, and portfolio optimization. This course can provide someone pursuing a career as a Quantitative Analyst with an introduction to the fundamentals of Apache Spark and how to use it for these tasks.
DevOps Engineer
DevOps Engineers collaborate in the development and operation of software systems, and Apache Spark is often used in these systems. This course can provide someone pursuing a career as a DevOps Engineer with an introduction to Apache Spark and how to use it in software development and operations.
Infrastructure Engineer
Infrastructure Engineers provide support for data systems, which may include Apache Spark. This course can benefit someone aiming to become an Infrastructure Engineer by giving them an understanding of Apache Spark and how to use it in distributed computing environments.
Data Architect
Data Architects design and manage data systems, and a solid understanding of Apache Spark is beneficial in this role. This course can provide someone aiming to become a Data Architect with an introduction to Apache Spark and how to use it for large-scale data processing.
Database Administrator
Database Administrators (DBAs) are responsible for managing and maintaining databases, which may feed or be queried by Apache Spark workloads. This course can provide someone pursuing a career as a DBA with a basic understanding of Apache Spark.

Reading list

We've selected ten books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Spark and Data Lakes.
Provides a comprehensive introduction to Apache Spark, covering its core concepts, features, and applications. It is written by some of the original creators of Spark, ensuring its accuracy and relevance to the course.
Is the definitive guide to Apache Spark, written by its original creators. It provides a comprehensive overview of Spark, its architecture, and its applications. It is an excellent resource for both beginners and experienced Spark users.
Focuses on advanced analytics with Spark, covering topics such as machine learning, graph processing, and data exploration. It provides practical examples and exercises, extending the course's coverage of Spark's capabilities.
A comprehensive guide to data lake implementation and management, providing best practices and industry insights.
Provides a comprehensive overview of the Hadoop ecosystem, including HDFS, MapReduce, and YARN. While not directly focused on Spark, it offers a valuable foundation for understanding the context in which Spark operates.
A beginner-friendly introduction to data lakes, covering their benefits, challenges, and best practices.
While not specific to Spark or data lakes, this book provides valuable insights into the business applications of data analysis and modeling.

Similar courses

Here are nine courses similar to Spark and Data Lakes.
  • Data lakes and Lakehouses with Spark and Azure Databricks
  • Introduction to Big Data with Spark and Hadoop
  • Apache Spark 2.0 with Java -Learn Spark from a Big Data...
  • Scala and Spark for Big Data and Machine Learning
  • Apache Spark for Data Engineering and Machine Learning
  • Getting Started with Apache Spark on Databricks
  • Apache Spark 3 Fundamentals
  • Introduction to Data Engineering
  • Spark and Python for Big Data with PySpark

© 2016 - 2024 OpenCourser