Big Data Analysis with Scala and Spark

Functional Programming in Scala

Manipulating big data distributed over a cluster using functional concepts is rampant in industry, and is arguably one of the first widespread industrial uses of functional ideas. This is evidenced by the popularity of MapReduce and Hadoop, and most recently Apache Spark, a fast, in-memory distributed collections framework written in Scala. In this course, we'll see how the data parallel paradigm can be extended to the distributed case, using Spark throughout. We'll cover Spark's programming model in detail, being careful to understand how and when it differs from familiar programming models, like shared-memory parallel collections or sequential Scala collections. Through hands-on examples in Spark and Scala, we'll learn when important issues related to distribution, like latency and network communication, should be considered and how they can be addressed effectively for improved performance.

Learning Outcomes. By the end of this course you will be able to:
- read data from persistent storage and load it into Apache Spark,
- manipulate data with Spark and Scala,
- express algorithms for data analysis in a functional style,
- recognize how to avoid shuffles and recomputation in Spark.

Recommended background: You should have at least one year of programming experience. Proficiency with Java or C# is ideal, but experience with other languages such as C/C++, Python, JavaScript or Ruby is also sufficient. You should have some familiarity using the command line. This course is intended to be taken after Parallel Programming:

OpenCourser is an affiliate partner of Coursera and may earn a commission when you buy through our links.

Rating 4.6 based on 415 ratings
Length 5 weeks
Starts Mar 8 (11 weeks ago)
Cost $79
From École polytechnique fédérale de Lausanne via Coursera
Instructors Dr. Heather Miller, Prof. Heather Miller
Download Videos On all desktop and mobile devices
Language English
Subjects Programming
Tags Computer Science Algorithms

What people are saying

introduction to spark

excellent introduction to Spark.

Great introduction to Spark accessed through Scala.

Great introduction to Spark.

Concepts covered here are very helpful though and it is a useful introduction to Spark.

I enjoyed the exercises and thought it gave a good introduction to Spark.

A great introduction to Spark !!!

Good introduction to Spark.

This course is a great introduction to Spark.

Good stuff; compared to the other similar course in PySpark, this one gave me a lot more understanding of how things work in Spark at the low level. Generally a really good introduction to Spark.

Great introduction to spark.

Great introduction to Spark and its data structures.

Good course. I loved this course - it was a great introduction to Spark.

Very good introduction to spark.

Great Introduction to spark.

very good course

Super course, well done Heather. It is indeed a very good course, but the 2nd assignment was tough.

Very good course!

Very good course; it is a must for anyone who is starting in Spark using Scala. Thanks a lot, it really did help me. Nice introduction to Spark with details about how Spark works internally.

Very good course, really enjoyed it. The sessions were clearly explained and focused.

Very good course, but it needs more details and examples.

Kudos to Professor Miller, we love you :-) Very good course.

Very good course for a great start in Spark.

Very good course for learning and practicing Spark.

Very good course for college students who have completed their studies and want to start working professionally.

big data

It was a super interesting course. Dear Heather, your course on big data with Scala is the very first online course I have participated in. I enjoy the way you explain the material and get real aesthetic pleasure from it.

Very good for Scala beginners and students who are entering the world of Big Data. The material of the fourth week is quite dense; this could be split over two weeks (including splitting it into two exercises).

you cannot complete them just by following the course material, forcing you to waste quite a lot of time either: (1) learning from other sources; (2) looking for answers on the forum; or (3) brute forcing an answer till rage quitting :) Another bad point: the course is supposed to be focused on Spark & big data analysis but it has 1-2 lectures (around 40-60 mins) pretty much devoted to showing some SQL.

It is not a general Big Data course, nor is it an easy one.

The course gave me insight into the world of big data batch processing and how Spark solves it.

Good as an introduction to Spark and big data.

All the knowledge I gained will be very helpful on my way into the world of Big Data. Good, but give more practice with small programs. The instructor is great, as is the material.

Great course about Big Data analysis It was my first exposure to Big Data frameworks and I learned a lot about the problems trying to be solved and the power of Spark.

Very well explained, a very good teacher. Helpful for anyone who wants to start with the basics of Spark. The lecture is well organized and excellent. Best course for Big Data learning in the world. Really good material, well explained with many examples. Maybe more information or precision should be added to the assignments, but good material and explanations. Liked the course.

This course allows me to learn so many things about data analysis and Big Data modeling.

This is an outstanding course to learn Big Data with Scala and Spark. Incredible tutorial!

spark with scala

Very complete and accurate about Spark with Scala.

I think a specialization on applications of spark with scala covering AI, graph and text processing would be interesting.

A nice course to start with learning basics of Spark with Scala, however it has missing things like broadcast variables, what are tasks/executors in Spark etc.

It is very good course material for Spark with scala.

As always, Coursera delivered another top-quality course on Spark with Scala.

The exercises are a little off topic. Great introductory course on Spark with Scala.

well explained

It was an awesome and well explained course.

Concepts are very well explained. This course helped me form a basic understanding of Spark and how to use it to analyze large-scale datasets.

Really well explained and planned.

I really liked the assignments in this course and all the content was well explained.

Excellent quality of content. A great subject, well explained, with solid weekly assignments, makes this course a stellar learning experience.

Great course, well explained, instant value!

The theory is very clear and well explained. The practical assignments are a little bit ambiguous but they are overall very good and challenging.

really enjoyed

I really enjoyed the course, especially the first 3 weeks.

Really enjoyed this course.

I really enjoyed going through the course, and I learned a lot.

I learned a lot and I really enjoyed the course.

I really enjoyed coding the assignments.

I really enjoyed this course!

Great course, I really enjoyed learning!

recommend this course

Still, I would recommend this course.

Due to this course I understood basic Spark concepts well enough to begin understanding pipelines built on top of Spark such as ADAM. I would highly recommend this course to everyone.

I highly recommend this course!

The tasks were fun and like those you would find in the wild. I already applied some of the skills I learned here at work and successfully implemented a simple recommendation engine that will go to production next week. Highly recommend this course!

dr. miller

Thank you EPFL, Dr. Miller and Coursera for providing such opportunity for me.

Great course, thanks. Dr. Miller's lectures are clear and concise.

Dr. Miller apparently did a very good job.

Thanks to Dr. Miller for such a great course.

It was amazing how Dr. Miller used concepts that were meticulously built up in the earlier courses, such as evaluation strategy, functional collections, reactive programming, and associativity, to describe the core of Spark in only four units.

Thank you, Dr. Miller. The course needs to be updated. Really great course; it helped me get into the new area fast.

heather miller

Thank you, Dr. Heather Miller and the EPFL team, along with the Coursera team, for this course.

Thanks, Heather Miller, for such a cool class!

Many thanks to the course instructor Heather Miller for creating a very detailed and up-to-date course on Spark.

The subject was not covered to the same level of detail that the other subjects in the course were given. These are minor points. Well done, Dr Heather Miller!

Excellent explanations by Heather Miller.

video lectures

Awesome course content. It was a pleasure to follow the video lectures and solve the assignments.

Awesome video lectures by the instructor.

Accompanied by awful practice lessons: - code templates are written with little to no style, even file reading is done in 3 different ways in all 3 lessons; - grader output is very confusing and almost useless; - unit tests, very useful to avoid some common caveats, were present in the first lesson but disappear completely in the last one. Probably following Spark's programming guide is a better time investment, even if it misses some "humanity" of video lectures. Amazing lectures, and challenging tasks to do on the way.

Course Assignments consumed more time than anticipated, as they required the knowledge from upcoming week's video lectures.

Excellent, such a beautiful course design for Big Data developers. The video lectures are good but the code assignments are worse; it seems like they were written by students instead of the professor or something.

I'd done some prior work with Hadoop/Pig in the past and more recently with Spark (mainly DataFrames/GraphFrames) - this was really useful to round out my understanding of RDDs and optimisation. The assignment guidance in the code comments could be more complete to save having to refer back to the site (and maybe reference specific video lectures with the hints).

insight into

Especially the exercise on PCA didn't really seem to provide that much insight into the data or illustrate the usefulness of the algorithm (especially when compared to the parallel programming exercise which had a great use for PCA).

Great, short course, which gives great insight into Spark and ad-hoc data processing on Hadoop-ish clusters.

Very good introduction to RDDs and DataFrames/Dataset along with valuable insight into performance considerations.

previous courses

Unlike previous courses where I had to wrestle with algorithms and only learn the subject as a side effect, the assignments in this course directly addressed the subject.

It's sometimes hard to keep my concentration (compared to previous courses of the specialization).

The previous courses had difficult assignments and you had to think about how you wanted to do something, here the problem was using the Spark API and understanding things which weren't explained in the lectures.

The exercises were below the standard of previous courses.


An overview of related careers and their average salaries in the US.

Volunteer Big Data Engineer $48k

Informatica PowerCenter with Big Data $69k

Oracle Big Data Appliance $76k

Corporate Technology- Scala/Spark/Hadoop Engineer $76k

Big data developer with AWS $78k

Senior Big Data Engineer 2 $93k

Big Data Architect Consultant $132k

Big Data Specialist $149k

Big Data Practice Architect $162k

Big Data Architect Lead $177k

Principal Big Data Architect $180k

Big Data Enterprise Architect $202k


Guest says:

This is a nice introduction to Spark. It's worth noting that even though it's part of a "Specialization" you can take this course individually if you're already familiar with Scala, which is what I did. Otherwise I imagine the other courses in this series are excellent too.
