Big Data Analytics Using Spark
In data science, data is called "big" if it cannot fit into the memory of a single standard laptop or workstation.
The analysis of big datasets requires a cluster of tens, hundreds, or thousands of computers. Using such clusters effectively requires distributed file systems, such as the Hadoop Distributed File System (HDFS), and corresponding computational models, such as Hadoop, MapReduce, and Spark.
In this course, part of the Data Science MicroMasters program, you will learn what the bottlenecks are in massively parallel computation and how to use Spark to minimize them.
You will learn how to perform supervised and unsupervised machine learning on massive datasets using the Machine Learning Library (MLlib).
In this course, as in the others in this MicroMasters program, you will gain hands-on experience using PySpark within the Jupyter notebook environment.
What you'll learn
- Programming Spark using PySpark
- Identifying the computational tradeoffs in a Spark application
- Performing data loading and cleaning using Spark and Parquet
- Modeling data through statistical and machine learning methods
Rating | 1.0★ based on 1 rating |
---|---|
Length | 10 weeks |
Effort | 10 weeks, 9–12 hours per week |
Starts | On Demand (Start anytime) |
Cost | $350 |
From | The University of California, San Diego (UC San DiegoX) via edX |
Instructor | Yoav Freund |
Download Videos | On all desktop and mobile devices |
Language | English |
Subjects | Programming, Data Science |
Tags | Computer Science, Data Analysis & Statistics |
What people are saying
The content is OK, but there are too many problems submitting the assignments, and nobody from the course staff gives you any answers on time.
I do not recommend paying for it.
I also complained to Ed, but got no answer at all.
Careers
An overview of related careers and their average salaries in the US. Bars indicate income percentile.
AD, Data Science $47k
Associate Data Science Supervisor $55k
Science writer / data analyst $63k
Genomic Data Science Programmer $75k
Volunteer Director of Data Science $78k
Expert Data Science Supervisor $79k
Supervisor 1 Data Science Supervisor $91k
Guest Director of Data Science $101k
Data Science Architect $105k
Head of Data Science $131k
Assistant Director 1 of Data Science $133k
Owner Director of Data Science $149k