Apache Spark with Scala - Hands On with Big Data!

New. Completely updated and re-recorded for Spark 3, IntelliJ, Structured Streaming, and a stronger focus on the Dataset API.

"Big data" analysis is a hot and highly valuable skill – and this course will teach you the hottest technology in big data: Apache Spark. Employers including Amazon, eBay, NASA JPL, and Yahoo all use Spark to quickly extract meaning from massive data sets across a fault-tolerant Hadoop cluster. You'll learn those same techniques, using your own Windows system right at home. It's easier than you might think, and you'll be learning from an ex-engineer and senior manager from Amazon and IMDb.

Spark works best when using the Scala programming language, and this course includes a crash course in Scala to get you up to speed quickly. For those more familiar with Python, however, a Python version of this class is also available: "Taming Big Data with Apache Spark and Python - Hands On".

Learn and master the art of framing data analysis problems as Spark problems through over 20 hands-on examples, and then scale them up to run on cloud computing services in this course.

  • Learn the concepts of Spark's Resilient Distributed Datasets, DataFrames, and Datasets.

  • Get a crash course in the Scala programming language

  • Develop and run Spark jobs quickly using Scala, IntelliJ, and SBT

  • Translate complex analysis problems into iterative or multi-stage Spark scripts

  • Scale up to larger data sets using Amazon's Elastic MapReduce service

  • Understand how Hadoop YARN distributes Spark across computing clusters

  • Practice using other Spark technologies, like Spark SQL, DataFrames, Datasets, Spark Streaming, Machine Learning, and GraphX

By the end of this course, you'll be running code that analyzes gigabytes worth of information – in the cloud – in a matter of minutes. 

We'll have some fun along the way. You'll get warmed up with some simple examples of using Spark to analyze movie ratings data and text in a book. Once you've got the basics under your belt, we'll move to some more complex and interesting tasks. We'll use a million movie ratings to find movies that are similar to each other, and you may even discover some new movies you like in the process. We'll analyze a social graph of superheroes, and learn who the most "popular" superhero is – and develop a system to find "degrees of separation" between superheroes. Are all Marvel superheroes within a few degrees of being connected to Spider-Man? You'll find the answer.
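
The "degrees of separation" task described above is a shortest-path question on a graph. As a rough illustration only, here is a minimal sketch assuming a breadth-first search over a small in-memory adjacency map in plain Scala 2.13. It is not the course's Spark implementation, and the object name, hero names, and connections are made up for the example.

```scala
import scala.collection.immutable.Queue

object DegreesOfSeparationSketch {
  // Hypothetical co-appearance graph; every name and edge here is invented.
  val graph: Map[String, Set[String]] = Map(
    "SPIDER-MAN" -> Set("HULK", "THOR"),
    "HULK"       -> Set("SPIDER-MAN", "WASP"),
    "THOR"       -> Set("SPIDER-MAN"),
    "WASP"       -> Set("HULK"),
    "LONER"      -> Set.empty[String]
  )

  // Breadth-first search: returns the number of hops from start to target,
  // or None when no path connects the two.
  def degrees(start: String, target: String): Option[Int] = {
    @annotation.tailrec
    def loop(frontier: Queue[(String, Int)], seen: Set[String]): Option[Int] =
      frontier.dequeueOption match {
        case None => None // frontier exhausted without reaching the target
        case Some(((hero, dist), rest)) =>
          if (hero == target) Some(dist)
          else {
            // Only enqueue neighbours that have not been discovered yet.
            val next = graph.getOrElse(hero, Set.empty[String]).diff(seen)
            loop(rest.enqueueAll(next.toSeq.map(n => (n, dist + 1))), seen ++ next)
          }
      }
    loop(Queue((start, 0)), Set(start))
  }

  def main(args: Array[String]): Unit =
    println(degrees("SPIDER-MAN", "WASP"))
}
```

Because breadth-first search visits heroes in order of distance, the first time the target is dequeued its distance is the smallest possible. A distributed version has to spread the frontier across a cluster, which this single-machine sketch does not attempt.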

This course is very hands-on; you'll spend most of your time following along with the instructor as we write, analyze, and run real code together – both on your own system, and in the cloud using Amazon's Elastic MapReduce service. Over 8 hours of video content is included, with over 20 real examples of increasing complexity you can build, run, and study yourself. Move through them at your own pace, on your own schedule. The course wraps up with an overview of other Spark-based technologies, including Spark SQL, Spark Streaming, and GraphX.

Enroll now, and enjoy the course.

"I studied Spark for the first time using Frank's course "Apache Spark 2 with Scala - Hands On with Big Data". It was a great starting point for me, gaining knowledge in Scala and most importantly practical examples of Spark applications. It gave me an understanding of all the relevant Spark core concepts, RDDs, Dataframes & Datasets, Spark Streaming, AWS EMR. Within a few months of completion, I used the knowledge gained from the course to propose in my current company to work primarily on Spark applications. Since then I have continued to work with Spark. I would highly recommend any of Frank's courses as he simplifies concepts well and his teaching manner is easy to follow and continue with." - Joey Faherty

OpenCourser is an affiliate partner of Udemy.

Rating: 4.4 (based on 1,702 ratings)
Length: 9 total hours
Starts: On Demand (start anytime)
Cost: $12
From: Udemy
Instructors: Sundog Education by Frank Kane, Frank Kane
Download Videos: Only via the Udemy mobile app
Language: English
Subjects: Data Science, Business
Tags: Data Science, Business, Development, Data & Analytics

What people are saying

easy to follow

His English is perfect, it was really easy to follow up with the course and make everything so clear.

Here's what I consider to be the positives of the course: pros -well-structured as an intro course(for example, showing hard way (ex: map/reduce) then easy way (ex: DataSets) to do some things) -Videos and instructions are easy to follow.

Excellent looks fine The instructor knows the subject very well, and he presents each subject clearly and made each of the class easy to follow.

Excellent Course nice one Complete and useful, easy to follow.

The course is perfect - clean, straight and easy to follow.

Good teach, great examples Easy to follow, nice to watch.

the steps are simple and easy to follow.

Trying it yourself helps to lock in the information easy to follow steps, thank you!

Excellent course for a good introduction to Spark with a lot of easy to follow hands-on exercises.

Perfect course if you want to start with Scala and Spark just so easy to follow!!!

Easy to follow.

Thorough coverage of Spark with easy to follow examples.

The presenter is dynamic and direct, the lectures are easy to follow and direct.

He delivers this content at a steady pace, that is easy to follow, and pretty easy to digest.

big data

good information...first time learning about Apache spark........platform is very impressive for what tools in big data that it can accomplish.

Frank is always quite precise on big data topics.

I've learning a bunch of stuff about Scala and big data.

I would highly recommend this course to anyone who wants to switch their career in the world of Big data.

Coming from the AWS big data specialty course, and I like this teacher very much.

I'm working in a Big Data department for some months and I already knew some lessons, but you can always learn something new.

Additionally, it would be helpful to have a bit more content orienting folks to the wider big data universe.

Went into enough detail that I could begin working on a contract at a company just starting to get involved in Big Data and I was able to help them out.

I am from database background and wanted to put my steps in big data world and I found this course nicely organized which is very helpful for a beginner like me who came from a different background of IT industry.

Provides a really nice understanding of how spark works, especially with RDDs manipulation which seems to be a more flexible but not so trivial approach when dealing with big data.

Gives a ton of confidence to get started and do wonderful things on Big data.

Frank Kane is an excellent instructor, provides a clear and comprehensive explanation to all topics and hands-on exercises, definitely this course will be a reference material for my upcoming big data projects.

Excellent course.The only thing I had to pause and refer back is to understand the big picture in Big data and how all the various components connect to each other (obviously it is a different topic) but a small overview would have definitely helped.

Thank you for this awesome experience for the new guys to big data like me.

step by step

It is helpful but I use macOS This was an excellent intro to the course, very good to follow step by step the installation process, explaining everything and walking through the first scala job!

This course deserves 5 stars in my opinion and so I rated it :) Great course for starting spark with scala. I already tried what they told me, to change the scala versions, and I keep getting this in the exercises: "Error: the main class was not found or loaded". I already reviewed step by step and apparently it is fine. In addition to that, the instructor makes the videos very repetitive; besides that, they have not answered me satisfactorily what to do about the main class not being found or loaded, and there are no exercises in which we see how to run the program in a more practical way. The course walks you through all of Spark and associated libraries in a very clear way.

It was very detailed and step by step instructions..

The course needs to be updated to latest formats absolute perfect step by step guide for those looking to learn Spark with Scala.

very good and step by step instruction to install the required things Excellent boot camp on Scala.

Various examples and step by step instructions to solve different problems will definitely put you on the best track to start your big data career or even a new page or stage of your career.

Very organized, step by step topics with sufficient examples to get the feel of the language and Spark environment.

Very nice presentation, good explanation (step by step).

It energizes you while following step by step Frank is an excellent teacher and his knowledge on the subject deep.

Very detailed explanation of the concepts with step by step instructions along with loads of examples to try.

:) Excellent course on the subject Good Learning step by step guidance was provided Easy to follow the course.

Good step by step Installation process explanation to understand easily.

Very clear and step by step instruction to follow.

Step by step information is very clear.

so far so good

This course has made me more confident and comfortable with scala and Spark for my job So far so good =) The presenter is very well organized and to the point.

So far so good.

So far So Good!

I loved it So far so good It goes at a nice pace so far.

So far so good I've taken some of Franks classes before and I highly recommend them.

So far so good.. .hoping same for future lectures as well.

Excellent course Instructions to setup scala and running the first sample app is very clear Good Great Really thorough instructions, but not boring for somebody with programming experience 6 Very straightforward So far so good overview for software installation is very nice.

missing explanation and examples So far so good.

Lecture was good and helpful So far so good!

Yes Yes, so far so good good it was gooooooooooood I'm enjoying it.

real world

It's clear to me that person who wrote this code didn't have any real world experience with production code and working in a team of Scala developers.

Hope use it to real world problems!!!

-One betterment area would be still more real world examples.

It will be better if provide some scenarios/use cases of the real world projects.

Real world example is the main beauty of this course.

But, request to add/create a course for working in real world projects.

The materials touch upon all the features of spark, while solving real world problems using each one of them.

He really uses a lot of real world examples and applications to build up your skills.

I would recommend his classes to anyone who wishes to quickly gain knowledge about these topics from real world application perspective.

In the real world, I imagine there'll be a lot of data cleansing, data loading, etc issues which haven't been covered in the course yet.

The way the presenter explains real world problems and how analytical data is extracted from it is very convincing.

I would like to get more real world examples.I think this is too basic.

Needs more algorithms analysis tbh In real world, how do i build a Spark System?

i felt like i should be thought whats going on real world.... hadoop on windows is something was not expected here.

highly recommend

Highly recommended Excellent.

Excellent learning tool with very good sample code examples and exercises...highly recommend I really liked this course.

I highly recommend to anyone.

I highly recommend this.

I highly recommend it!

Highly recommend.

I highly recommend Frank and his courses.

Highly recommended!

rather than

The only slight criticism is it would be nice to have your course slides as I find it easier to go back over them to find a previous point rather than go through all the media sections to find what I was looking for.

Amazing explanations and hands-on i really like his methodology, very clear Course content are good, but the way code explained, it would be better to tell those computation with some example like on copy and pencil, so that the whole flow be in mind rather than just narrating the code.

Use of for loops rather than map, reduce, folds, etc. is decidedly not functional and should be avoided rather than encouraged.

Write helper functions for simple cases rather than.

The whole reason for doing an online course rather than just watching videos is the interactive component, and the interactive component in the form of feedback for exercises and questions is absent here.

It spends too much time on RDDs rather than Spark SQL and also the streaming part is almost only dedicated to DStreams instead of structured streaming.

This is due to the nature of Scala though (as it's notoriously difficult to learn) & subject matter difficulty rather than an issue with this excellent lecturer.

Covered all the basic things for Spark Very do-able exercises that are focused on how to use Spark rather than forcing the student to spend loads of time thinking about some contrived problem.

I like the material it is enriching but from time to time if there's code writing rather than just looking at a already written code.

Because the major focus here is to work on examples rather than understanding why and how spark should be used.

As someone who is eager to get coding I wish some of the earlier examples were more "code along with me" rather than "read this code", but now that the course is ramping up with coding problems it's a lot more along the lines of what I was expecting.

I'd like to see a more functional style of programming using recursion rather than iteration for traversing graphs.

I personally feel it would have been much better if there had been some graded exercises rather than serving everything on the plate.

looking forward

And I am looking forward to Spark SQL, dataframes and datasets since most of my work is in the data science area.

Looking forward to learning to use this.

Looking forward for more great tutorials from you ..Thanks Covers a lot of useful topics.

Looking forward to more courses :) Very good presentation and materials.

I'm looking forward getting to the other half of the course.

Looking forward to complete the course and do more advance courses.

Looking forward to the next lesson :-) He explained all the concepts well.

Pretty Good examples till now and the way it is being taught is very nice Clear, well explained, great examples and code Excellent teacher The basics of each concept is covered well with easy examples to follow, looking forward for the rest of the topics.

Looking forward to learn SPARK in deep now.

I am looking forward to learning quite a lot in this course.

for example

For example, the 'Superhero Degrees of Separation' could have been dwelled on much more too was reduced to simple screen reading unfortunately.

For example, I had to google search and actually understand reduceByKey function before I could use it since I was not sure which was the accumulated value and which was the current value.

For example this course not explain some basic concept for programming in scala using OOP.

For example, find 10 most popular superheros, I have learn how to find most popular one, but how get first 10 or last 10 ?

For example, the GraphX video explaining BFS done with Pregel.

For example, shuffling was mentioned several times, but nowhere did Frank take the time to explain the semantics of shuffling, and the considerations thereof.

For example if I read a csv file, is it always distributed?

For example, I have never worked with accumulators nor Map syntax.

The course was awesome...simply awesome... One area of improvement : Show us more examples on how we can use the Spark UI for EMR monitoring...for example...we just touched upon the topic of increasing executor-memory if we get OOM errors..we could have actually shown that and other small examples for EMR cluster section, in action It would have been easy if the author explained the solution in SQL before writes the code in Spark.
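
Two of the reviewer questions above (which argument of reduceByKey is the "accumulated" value, and how to get the first or last 10 results rather than just the top one) can be illustrated without a cluster. The sketch below is not taken from the course. It assumes plain Scala 2.13 collections, with `groupMapReduce` standing in for Spark's `reduceByKey`, and the object name and hero counts are made up.

```scala
// Plain Scala 2.13+ collections, no Spark dependency; sample data is invented.
object ReduceByKeySketch {
  // Hypothetical (hero, co-appearance count) pairs, one per comic issue.
  val appearances: Seq[(String, Int)] = Seq(
    ("SPIDER-MAN", 3), ("HULK", 2), ("SPIDER-MAN", 4), ("THOR", 1), ("HULK", 4)
  )

  // Same shape as reduceByKey((x, y) => x + y): the function receives two
  // values for the same key. Either one may already be a partial result, so
  // the function should be associative and commutative.
  val totals: Map[String, Int] =
    appearances.groupMapReduce(_._1)(_._2)((x, y) => x + y)

  // "First N" and "last N": sort by the count, then take N (N = 2 here).
  val mostPopular: Seq[(String, Int)] =
    totals.toSeq.sortBy { case (_, count) => -count }.take(2)
  val leastPopular: Seq[(String, Int)] =
    totals.toSeq.sortBy { case (_, count) => count }.take(2)

  def main(args: Array[String]): Unit = {
    println(mostPopular)
    println(leastPopular)
  }
}
```

On an actual pair RDD, the analogous calls would be `reduceByKey((x, y) => x + y)` followed by `sortBy(_._2, ascending = false).take(10)`. The point carries over: neither argument is reserved as "the accumulator", which is why the combining function needs to give the same answer regardless of grouping or order.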

apache spark and scala

The course is so instructive, and you can get a nice initial contact to Apache Spark and Scala programming language.

I would recommend this course to go through the medium to high complex course for Apache Spark and Scala course Liked a lot this course.

It's a great starting point for Apache Spark and Scala.

It is a good point for starting with Apache Spark and Scala I was looking for some more exercises with solutions.

till now

Till now its gr8 learning very nice!

So great learning till now and i am enjoying it.

there there only 2 up till now.

Very good explanation till now but having few confusion regarding tuples and spark functions.

Also, till now no reference has been made about sbt or maven for build purposes.

Good course till now, i like the exercises given Great Course Great Course !

Till now, i am able to understand about installation.

Learned a lot till now.

use cases

More general structure about Spark and how to think about different use cases would be helpful, not only 1-2 examples, which seemed random to me.

I really liked the explanation and appropriate examples for each of the use cases.

I found the lack of real-world use cases to also be a bit of a let down.

Starting to see how I can use all this knowledge at problems I have at work due to the interesting use cases presented.

Easy to follow and practical use cases superb course ..and way of explain everything it's great Done Excellent!


An overview of related careers and their average salaries in the US.

Volunteer Big Data Engineer $48k

Data Scientist - Big Data $68k

Big Data and AWS Data Lake $73k

Big Data Developer (Streaming Data) $77k

Big data developer with AWS $78k

Research Scientist Big Data $94k

Big Data Developer Consultant $98k

Big Data Engineer 6 $107k

Big data and ETL specialist $121k

Big Data Specialist $149k

Principal Big Data Architect $180k

Senior Big Data Sales $181k
