We may earn an affiliate commission when you visit our partners.
Yoav Freund

In data science, data is called "big" if it cannot fit into the memory of a single standard laptop or workstation.

The analysis of big datasets requires using a cluster of tens, hundreds or thousands of computers. Effectively using such clusters requires distributed file systems, such as the Hadoop Distributed File System (HDFS), and corresponding computational models, such as Hadoop, MapReduce and Spark.

In this course, part of the Data Science MicroMasters program, you will learn what the bottlenecks are in massively parallel computation and how to use Spark to minimize these bottlenecks.

You will learn how to perform supervised and unsupervised machine learning on massive datasets using the Machine Learning Library (MLlib).

In this course, as in the other ones in this MicroMasters program, you will gain hands-on experience using PySpark within the Jupyter notebooks environment.
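
As an illustration of the MapReduce computational model mentioned above, here is a minimal single-machine sketch in plain Python. It is not taken from the course materials and does not use Hadoop or Spark; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample text are hypothetical. On a real cluster, each chunk would live on a different machine and the shuffle step would move data over the network.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(pairs):
    """Group values by key, standing in for the shuffle between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(chunks):
    """Run map, shuffle, and reduce over a list of text chunks."""
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle_phase(mapped))


# Hypothetical input: each string stands in for a block stored on a different node.
chunks = [
    "spark reads big data",
    "big data needs a cluster",
    "Spark runs on a cluster",
]
counts = word_count(chunks)
print(counts)
```

The map step runs independently on each chunk, so it parallelizes easily. The shuffle step is where data has to be regrouped by key, which is why it tends to be a bottleneck in cluster settings.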

What's inside

Learning objectives

  • Programming Spark using PySpark
  • Identifying the computational tradeoffs in a Spark application
  • Performing data loading and cleaning using Spark and Parquet
  • Modeling data through statistical and machine learning methods

Good to know

Know what's good, what to watch for, and possible dealbreakers
Provides a strong foundation for learners interested in data science
Taught by Yoav Freund, recognized expert in machine learning and data science
Leverages PySpark, a powerful library for data analysis and machine learning
Provides hands-on experience through Jupyter notebooks, an industry-standard environment for data science
May require prior programming experience or willingness to learn
Assumes familiarity with basic statistics and linear algebra

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Big Data Analytics Using Spark with these activities:
Review Spark: The Definitive Guide
Spark: The Definitive Guide provides additional in-depth knowledge of Spark and can help reinforce key concepts in this course.
  • Read Chapters 1-3
Review Spark memory management
Reviewing Spark memory management will improve retention and reinforce concepts learned in this course.
  • Read Apache Spark documentation on Memory Management
Create a Spark Application
Creating a Spark application will put into practice the concepts learned in this course and help improve practical understanding and skills.
  • Plan and design a Spark application
  • Implement the Spark application in PySpark
Volunteer on a Spark Project
Volunteering on a Spark project will provide hands-on experience and opportunities to contribute to the Spark community while reinforcing concepts learned in this course.
  • Find a volunteer opportunity through Spark meetups or online platforms
  • Contribute to Spark projects or initiatives

Career center

Learners who complete Big Data Analytics Using Spark will develop knowledge and skills that may be useful to these careers:
Data Analyst
A Data Analyst collects, cleans, and analyzes data to identify trends and patterns. This course can be a good starting point for someone who wishes to become a Data Analyst, as it teaches the skills needed to acquire, process, and analyze big datasets. Through the course, you will learn how to use Spark and PySpark, popular tools for working with big data. In addition, you will learn how to use statistical and machine learning algorithms to model data and identify patterns, which are essential skills for a Data Analyst.
Machine Learning Engineer
A Machine Learning Engineer designs, builds, and deploys machine learning models. This course can be helpful for a Machine Learning Engineer who wants to work with big data. The course will teach you how to use Spark, a popular tool for working with big data, and how to use it to train and deploy machine learning models. In addition, you will learn how to use statistical and machine learning algorithms to model data, which is essential for building machine learning models.
Data Scientist
A Data Scientist uses data to solve business problems. This course can be helpful for a Data Scientist who wants to work with big data. The course will teach you how to use Spark, a popular tool for working with big data, and how to use it to acquire, process, and analyze big datasets. In addition, you will learn how to use statistical and machine learning algorithms to model data, which is essential for solving business problems using data.
Data Engineer
A Data Engineer designs, builds, and maintains data pipelines. This course can be very helpful for a Data Engineer who wants to work with big data. The course will teach you how to use Spark, a popular tool for working with big data, and how to use it to load, clean, and transform data. In addition, you will learn how to use statistical and machine learning algorithms to model data, which can be useful for building data pipelines for machine learning applications.
Business Analyst
A Business Analyst uses data to analyze business problems and make recommendations. This course can be helpful for a Business Analyst who wants to work with big data. The course will teach you how to use Spark, a popular tool for working with big data, and how to use it to acquire, process, and analyze big datasets. In addition, you will learn how to use statistical and machine learning algorithms to model data, which can be useful for building data-driven business recommendations.
Data Researcher
A Data Researcher uses data to conduct research and develop new knowledge. This course can be helpful for a Data Researcher who wants to work with big data. The course will teach you how to use Spark, a popular tool for working with big data, and how to use it to conduct research. In addition, you will learn how to use statistical and machine learning algorithms to model data, which can be useful for drawing meaningful conclusions from data.
Quantitative Analyst
A Quantitative Analyst uses mathematical and statistical methods to analyze data and make predictions. This course can be helpful for a Quantitative Analyst who wants to work with big data. The course will teach you how to use Spark, a popular tool for working with big data, and how to use it to analyze big datasets. In addition, you will learn how to use statistical and machine learning algorithms to model data and make predictions.
Statistician
A Statistician collects, analyzes, and interprets data. This course may be helpful for a Statistician who wants to work with big data. The course will teach you how to use Spark, a popular tool for working with big data, and how to use it to analyze big datasets. In addition, you will learn how to use statistical and machine learning algorithms to model data, which is essential for drawing meaningful conclusions from data.
Database Administrator
A Database Administrator designs, builds, and maintains databases. This course can be helpful for a Database Administrator who wants to work with big data. The course will teach you how to use Spark, a popular tool for working with big data, and how to use it to design and build databases. In addition, you will learn how to use statistical and machine learning algorithms to model data, which can be useful for building scalable and efficient databases.
Data Architect
A Data Architect designs and builds data systems. This course can be helpful for a Data Architect who wants to work with big data. The course will teach you how to use Spark, a popular tool for working with big data, and how to use it to design and build data systems. In addition, you will learn how to use statistical and machine learning algorithms to model data, which can be useful for building scalable and efficient data systems.
Software Engineer
A Software Engineer designs, develops, and maintains software applications. This course may be helpful for a Software Engineer who wants to work with big data. The course will teach you how to use Spark, a popular tool for working with big data, and how to use it to develop software applications. In addition, you will learn how to use statistical and machine learning algorithms to model data, which can be useful for building data-driven software applications.
Operations Research Analyst
An Operations Research Analyst uses data to analyze and solve business problems. This course may be helpful for an Operations Research Analyst who wants to work with big data. The course will teach you how to use Spark, a popular tool for working with big data, and how to use it to analyze business data. In addition, you will learn how to use statistical and machine learning algorithms to model data, which can be useful for building data-driven solutions to business problems.
Financial Analyst
A Financial Analyst uses data to make investment decisions. This course may be helpful for a Financial Analyst who wants to work with big data. The course will teach you how to use Spark, a popular tool for working with big data, and how to use it to analyze financial data. In addition, you will learn how to use statistical and machine learning algorithms to model data, which can be useful for making informed investment decisions.
Marketing Analyst
A Marketing Analyst uses data to analyze marketing campaigns and make recommendations. This course may be helpful for a Marketing Analyst who wants to work with big data. The course will teach you how to use Spark, a popular tool for working with big data, and how to use it to analyze marketing data. In addition, you will learn how to use statistical and machine learning algorithms to model data, which can be useful for building data-driven marketing campaigns.
Risk Analyst
A Risk Analyst uses data to analyze and manage risk. This course may be helpful for a Risk Analyst who wants to work with big data. The course will teach you how to use Spark, a popular tool for working with big data, and how to use it to analyze risk data. In addition, you will learn how to use statistical and machine learning algorithms to model data, which can be useful for building risk models.

Reading list

We've selected ten books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Big Data Analytics Using Spark.
Is the definitive guide to Apache Spark. It covers everything from the basics of Spark to advanced topics such as machine learning and graph processing. It is a must-read for anyone who wants to use Spark for big data analysis.
Provides a comprehensive overview of Apache Spark, including its architecture, programming model, and use cases. It is a valuable resource for anyone who wants to learn more about Spark and how to use it for big data analysis.
Provides a comprehensive overview of reinforcement learning, including its challenges and opportunities. It also covers best practices for designing and implementing reinforcement learning solutions.
Provides a practical introduction to machine learning with Apache Spark. It covers the basics of machine learning, and it shows how to use Spark to build and train machine learning models.
Provides a comprehensive overview of big data analytics, including its challenges and opportunities. It also covers best practices for designing and implementing big data analytics solutions.
Provides a comprehensive overview of data mining, including its challenges and opportunities. It also covers best practices for designing and implementing data mining solutions.
Provides a comprehensive overview of deep learning, including its challenges and opportunities. It also covers best practices for designing and implementing deep learning solutions.
Provides a practical introduction to deep learning with Python. It covers the basics of deep learning, and it shows how to use Python to build and train deep learning models.
Provides a practical introduction to data science for business professionals. It covers the basics of data mining and data-analytic thinking, and it shows how to use these techniques to solve business problems.
Provides a practical introduction to predictive analytics. It covers the basics of predictive analytics, and it shows how to use predictive analytics to solve business problems.

Similar courses

Here are nine courses similar to Big Data Analytics Using Spark.
  • Big Data Fundamentals
  • Apache Spark with Scala - Hands On with Big Data!
  • Introduction to Big Data with Spark and Hadoop
  • Apache Spark 2.0 with Java -Learn Spark from a Big Data...
  • Big Data, Hadoop, and Spark Basics
  • Big Data Essentials: HDFS, MapReduce and Spark RDD
  • Big Data Computing with Spark
  • Developing Spark Applications Using Scala & Cloudera
  • Apache Spark Fundamentals

© 2016 - 2024 OpenCourser