Distributed Machine Learning with Apache Spark

Machine learning aims to extract knowledge from data, relying on fundamental concepts in computer science, statistics, probability, and optimization. Learning algorithms enable a wide range of applications, from everyday tasks such as product recommendations and spam filtering to cutting-edge applications like self-driving cars and personalized medicine. In the age of ‘big data’, with datasets rapidly growing in size and complexity and cloud computing becoming more pervasive, machine learning techniques are fast becoming a core component of large-scale data processing pipelines.

This statistics and data analysis course introduces the underlying statistical and algorithmic principles required to develop scalable real-world machine learning pipelines. We present an integrated view of data processing by highlighting the various components of these pipelines, including exploratory data analysis, feature extraction, supervised learning, and model evaluation. You will gain hands-on experience applying these principles using Spark, a cluster computing system well-suited for large-scale machine learning tasks, and its packages spark.ml and spark.mllib. You will implement distributed algorithms for fundamental statistical models (linear regression, logistic regression, principal component analysis) while tackling key problems from domains such as online advertising and cognitive neuroscience.

OpenCourser is an affiliate partner of edX.

Rating: Not enough ratings
Length: 4 weeks
Effort: 5-10 hours per week
Starts: On demand
Cost: $0
From: Berkeley via edX
Instructors: Ameet Talwalkar, Jon Bates
Free: Limited content
Language: English
Subjects: Programming, Data Science
Tags: Computer Science, Data Analysis & Statistics

Careers

An overview of related careers and their average salaries in the US.

Research Scientist-Machine Learning: $55k
Cloud Architect - Azure / Machine Learning: $75k
Watson Machine Learning Engineer: $81k
Machine Learning Software Developer: $103k
Software Engineer (Machine Learning): $116k
Applied Scientist, Machine Learning: $130k
Autonomy and Machine Learning Solutions Architect: $131k
Applied Scientist - Machine Learning -...: $136k
Research Scientist (Machine Learning): $147k
Machine Learning Engineer 2: $161k
Machine Learning Scientist Manager: $170k
Machine Learning Scientist, Personalization: $213k
