Big Data Analysis with Apache Spark

Organizations use their data to support and influence decisions and build data-intensive products and services, such as recommendation, prediction, and diagnostic systems. The collection of skills required by organizations to support these functions has been grouped under the term ‘data science’.
This statistics and data analysis course will attempt to articulate the expected output of data scientists and then teach students how to use PySpark (part of Spark) to deliver against these expectations. The course assignments include log mining, textual entity recognition, and collaborative filtering exercises that teach students how to manipulate data sets using parallel processing with PySpark.
This course covers advanced undergraduate-level material. It requires a programming background and experience with Python (or the ability to learn it quickly). All exercises use PySpark (the Python API for Spark), and previous experience with Spark, equivalent to Introduction to Apache Spark, is required.

OpenCourser is an affiliate partner of edX.

Rating: Not enough ratings
Length: 4 weeks
Effort: 5-10 hours per week
Starts: On demand
Cost: $0
From: Berkeley via edX
Instructor: Anthony D. Joseph
Free: Limited content
Language: English
Subjects: Programming, Data Science
Tags: Computer Science, Data Analysis & Statistics

Careers

An overview of related careers and their average salaries in the US. Bars indicate income percentile (33rd - 99th).

Assistant Big Data Engineer $40k

Talend - Big Data $72k

Big Data Engineer 1 $77k

Big Data Program $94k

Big Data Analyst $95k

Big Data Engineer $97k

Big Data Trainer $100k

Big Data Engineer 2 $112k

Big Data Architect $121k

Senior Sales Engineer - Big Data $146k

Big Data Architect/Engineer $162k

Big Data Solutions Engineer $169k
