Big Data Analytics Using Spark

In data science, data is called "big" if it cannot fit into the memory of a single standard laptop or workstation.

Analyzing big datasets requires a cluster of tens, hundreds, or thousands of computers. Using such clusters effectively requires distributed file systems, such as the Hadoop Distributed File System (HDFS), and corresponding computational models, such as Hadoop, MapReduce, and Spark.

In this course, part of the Data Science MicroMasters program, you will learn what the bottlenecks are in massively parallel computation and how to use Spark to minimize them.

You will learn how to perform supervised and unsupervised machine learning on massive datasets using the Machine Learning Library (MLlib).

In this course, as in the others in this MicroMasters program, you will gain hands-on experience using PySpark within the Jupyter notebook environment.

What you'll learn

  • Programming Spark using PySpark
  • Identifying the computational tradeoffs in a Spark application
  • Performing data loading and cleaning using Spark and Parquet
  • Modeling data through statistical and machine learning methods

OpenCourser is an affiliate partner of edX and may earn a commission when you buy through our links.

Rating: 1.0 based on 1 rating
Length: 10 weeks
Effort: 10 weeks, 9–12 hours per week
Starts: On Demand (start anytime)
Cost: $350
From: The University of California, San Diego (UC San DiegoX) via edX
Instructor: Yoav Freund
Download Videos: On all desktop and mobile devices
Language: English
Subjects: Programming, Data Science
Tags: Computer Science, Data Analysis & Statistics

What people are saying

The content is OK, but there are too many problems submitting the assignments, and nobody from the course staff gives you any answers on time.

Do not recommend paying for it.

I also complained to Ed, but got no answer at all.

Careers

An overview of related careers and their average salaries in the US. Bars indicate income percentile.

AD, Data Science $47k

Associate Data Science Supervisor $55k

Science writer / data analyst $63k

Genomic Data Science Programmer $75k

Volunteer Director of Data Science $78k

Expert Data Science Supervisor $79k

Supervisor 1 Data Science Supervisor $91k

Guest Director of Data Science $101k

Data Science Architect $105k

Head of Data Science $131k

Assistant Director 1 of Data Science $133k

Owner Director of Data Science $149k
