Big Data Analysis with Scala and Spark

This course is a part of Functional Programming in Scala, a 5-course Specialization series from Coursera.

Manipulating big data distributed over a cluster using functional concepts is rampant in industry, and is arguably one of the first widespread industrial uses of functional ideas. This is evidenced by the popularity of MapReduce and Hadoop, and most recently Apache Spark, a fast, in-memory distributed collections framework written in Scala. In this course, we'll see how the data parallel paradigm can be extended to the distributed case, using Spark throughout. We'll cover Spark's programming model in detail, being careful to understand how and when it differs from familiar programming models, like shared-memory parallel collections or sequential Scala collections. Through hands-on examples in Spark and Scala, we'll learn when important issues related to distribution like latency and network communication should be considered and how they can be addressed effectively for improved performance.

Learning Outcomes. By the end of this course you will be able to:

  • read data from persistent storage and load it into Apache Spark,
  • manipulate data with Spark and Scala,
  • express algorithms for data analysis in a functional style,
  • recognize how to avoid shuffles and recomputation in Spark.

Recommended background: You should have at least one year of programming experience. Proficiency with Java or C# is ideal, but experience with other languages such as C/C++, Python, JavaScript or Ruby is also sufficient. You should have some familiarity using the command line. This course is intended to be taken after Parallel Programming: https://www.coursera.org/learn/parprog1.

OpenCourser is an affiliate partner of Coursera.

Coursera & École polytechnique fédérale de Lausanne

Rating 4.6 based on 368 ratings
Length 5 weeks
Starts Sep 23 (3 weeks ago)
Cost $79
From École polytechnique fédérale de Lausanne via Coursera
Instructor Dr. Heather Miller
Download Videos On all desktop and mobile devices
Language English
Subjects Programming
Tags Computer Science Algorithms

What people are saying

According to other learners, here's what you need to know

introduction to spark in 14 reviews

Excellent introduction to Spark.

Great introduction to Spark accessed through Scala.

Great introduction to Spark.

Concepts covered here are very helpful though and it is a useful introduction to Spark.

I enjoyed the exercises and thought it gave a good introduction to Spark.

A great introduction to Spark !!!

Good introduction to Spark.

very good course in 10 reviews

Super course, well done Heather. It is indeed a very good course, but the 2nd assignment was tough.

Very good course!

Very good course; it is a must for anyone starting out in Spark using Scala. Thanks a lot, it really did help me. Nice introduction to Spark with details about how Spark works internally.

Very good course, really enjoyed it. The sessions were clearly explained and focused.

Very good course, but it needs more details and examples.

Kudos to Professor Miller, we love you :-) Very good course.

Very good course for a great start in Spark.

big data in 10 reviews

It was a super interesting course. Dear Heather, your course on big data with Scala is the very first online course I have participated in. I enjoy the way you explain the material and get real aesthetic pleasure from it.

Very good for Scala beginners and students who are entering the world of Big Data. The material of the fourth week is quite dense; this could be split over two weeks (including splitting it into two exercises).

You cannot complete them just by following the course material, forcing you to waste quite a lot of time either: (1) learning from other sources; (2) looking for answers on the forum; or (3) brute forcing an answer till rage quitting :) Another bad point: the course is supposed to be focused on Spark & big data analysis but it has 1-2 lectures (around 40-60 mins) pretty much devoted to showing some SQL.

It is not a general Big Data course, nor is it an easy one.

The course gave me insight into the world of big data batch processing and how Spark solves it.

Good as an introduction to Spark and big data.

All the knowledge I gained will be a great help on my way into the world of Big Data. Good, but it should offer more practice with small programs. The instructor is great, as is the material.

really enjoyed in 9 reviews

I really enjoyed the course, especially the first 3 weeks.

Really enjoyed this course.

I really enjoyed going through the course, and I learned a lot.

I learned a lot and I really enjoyed the course.

I really enjoyed coding the assignments.

I really enjoyed this course!

Great course, I really enjoyed learning!

spark with scala in 7 reviews

Very complete and accurate about Spark with Scala.

I think a specialization on applications of Spark with Scala covering AI, graph and text processing would be interesting.

A nice course for learning the basics of Spark with Scala; however, it is missing topics like broadcast variables and what tasks/executors are in Spark.

It is very good course material for Spark with Scala.

As always, Coursera delivered another top-quality course on Spark with Scala.

The exercises are a little off topic. Great introductory course on Spark with Scala.

dr. miller in 7 reviews

Thank you EPFL, Dr. Miller and Coursera for providing such opportunity for me.

Great course, thanks. Dr. Miller's lectures are clear and concise.

Dr. Miller apparently did a very good job.

Thanks to Dr. Miller for such a great course.

It was amazing how Dr. Miller used concepts that were meticulously built up in the earlier courses, such as evaluation strategy, functional collections, reactive programming, and associativity, to describe the core of Spark in only four units.

Thank you, Dr. Miller. The course needs to be updated. Really great course; it helped me get into the new area fast.

Careers

An overview of related careers and their average salaries in the US. Bars indicate income percentile.

Volunteer Big Data Engineer $48k

Informatica PowerCenter with Big Data $69k

Oracle Big Data Appliance $76k

Corporate Technology- Scala/Spark/Hadoop Engineer $76k

Big data developer with AWS $78k

Senior Big Data Engineer 2 $93k

Big Data Architect Consultant $132k

Big Data Specialist $149k

Big Data Practice Architect $162k

Big Data Architect Lead $177k

Principal Big Data Architect $180k

Big Data Enterprise Architect $202k

Reviews

Sorted by most helpful reviews first

Guest says:

This is a nice introduction to Spark. It's worth noting that even though it's part of a "Specialization" you can take this course individually if you're already familiar with Scala, which is what I did. Otherwise I imagine the other courses in this series are excellent too.
