Spark and Python for Big Data with PySpark

Learn the latest big data technology, Spark, and learn to use it with one of the most popular programming languages, Python.

One of the most valuable technology skills is the ability to analyze huge data sets, and this course is specifically designed to bring you up to speed on one of the best technologies for this task, Apache Spark. The top technology companies like Google, Facebook, Netflix, Airbnb, Amazon, NASA, and more are all using Spark to solve their big data problems.

Spark can perform up to 100x faster than Hadoop MapReduce, which has caused an explosion in demand for this skill. Because the Spark 2.0 DataFrame framework is so new, you now have the ability to quickly become one of the most knowledgeable people in the job market.

This course teaches the basics with a crash course in Python, then moves on to using Spark DataFrames with the latest Spark 2.0 syntax. Once we've done that, we'll go through how to use the MLlib machine learning library with the DataFrame syntax and Spark. All along the way you'll have exercises and mock consulting projects that put you right into a real-world situation where you need to use your new skills to solve a real problem.

We also cover the latest Spark technologies, like Spark SQL, Spark Streaming, and advanced models like Gradient Boosted Trees. After you complete this course you will feel comfortable putting Spark and PySpark on your resume. This course also has a full 30-day money-back guarantee and comes with a LinkedIn Certificate of Completion.

If you're ready to jump into the world of Python, Spark, and Big Data, this is the course for you.

OpenCourser is an affiliate partner of Udemy and may earn a commission when you buy through our links.

Rating: 4.3 (based on 965 ratings)
Length: 10.5 total hours
Starts: On demand (start anytime)
Cost: $19
From: Udemy
Instructor: Jose Portilla
Download Videos: Only via the Udemy mobile app
Language: English
Tags: Development, Software Engineering

What people are saying

machine learning

It's tough to follow. It's not for big data. It is for machine learning.

I was hoping for something better from you, like a chance to deal with real big data, but it was just in the name of the course. Very disappointed with this course.

This is by far the best course I took in machine learning with PySpark.

I have already learned a lot halfway into this machine learning course with PySpark.

The course helps give an introduction to Spark DataFrames and machine learning techniques using MLlib. The explanation is good, but a few concepts I was interested in are missing, like: 1) RDD.

Clear lectures, best suited for anyone who would like to start learning machine learning and Spark.

Trying to learn how to use Spark, I was going through PySpark on Tutorials Point and trying to understand from multiple sources.

Good intro to Spark DataFrames and some machine learning examples.

This course is really awesome for getting a good start with PySpark and machine learning.

Provided a complete overview of implementing machine learning with PySpark, along with practical examples.

I think it's a good course for beginners to get to know Spark and its usage for machine learning.

The title of the course is misleading. It should be called something like "A short introduction to machine learning with a little bit of pyspark".

The course is good, but not for me. I feel that to use machine learning effectively, one should have strong knowledge of advanced statistics; otherwise this course will not be that useful.

Quick recap and easy explanations.

Wonderful, really explained all the concepts in the simplest manner.

To be honest, this is one of the best courses I've ever watched on Data Science & Machine Learning.

Finally, different machine learning algorithms are presented, so you will be able to gain complete knowledge of ML & DS by taking on several projects to practice more.

Clear, simple explanations. So far, great content.

I'm just diving into all the topics covered by the course: machine learning algorithms, NLP, etc.

This course overlaps WAY too heavily with the "Python for Data Science and Machine Learning Bootcamp" course!!!

big data

I'm working on a big data project for my capstone project in school, and I had no idea how to use PySpark.

Ask me for my opinion again at the end.

The name of the course is big data, but so far I haven't found any. Linear, logistic, trees: I have already completed these in your other courses with the exact same basic examples.

Fantastic course for beginners like me, and I got what I expected from this course.

Examples used are not big data.

Very good, step-by-step process to understand how to use PySpark.

Very good presentation.

But for working on big data, other things should also be taught, like how it works in cluster mode.

Good Spark course.

Maybe it is too basic for me, and it is focused on data architects and not on data scientists like me, but I want to give it another chance. Let's see the next lesson.

The course is excellent and has provided a very good foundation on ML using PySpark.

The reason we learn Spark is to use it for big data.

It doesn't say too much about Spark itself (especially what is going on under the hood) and nothing at all about big data.

PySpark is good, but the connection to big data is missing!

I'll just mention some "even better ifs".

The course misses the 'Big Data' part of the title.

However, I am surprised that a few important topics that were introduced by the instructor early in the course, such as 'Master/Slave Architecture', were not dealt with using an example involving big data.

I feel capable of tackling big data projects after completing this course!

I was looking for courses to be able to implement machine learning in big data.

It is a very interesting course on a very hot topic: cloud computing with big data.

I was expecting to learn more about big data than just explanations from the PySpark documentation.

Buy the course :)

Once again, Jose Portilla hits a home run with his comprehensible instruction in Spark and Python for Big Data with PySpark.

easy to understand

Very intuitive and easy to understand.

The instructor explains in an engaging, easy-to-understand way.

Very easy to follow.

Easy to understand if you already have some python and sql knowledge.

easy to understand.

Easy to understand material.

Overall, the course was explained well and easy to understand, and I went through the projects with ease.

This class gave me much more in-depth knowledge about PySpark, with easy-to-understand lectures and examples.

Easy to understand and to follow the code.

Easy to understand, starts from the beginning, easy to focus on the pieces that interest you the most.

Easy to understand.

Thorough, easy to understand.

Always very organized and easy to understand.

Great instructor, easy to understand.

Superb and clear explanations make it very easy to understand.

very well explained

Every topic is very well explained.

I love Portilla.

Very well explained introduction.

So good and brief!

Topics very well explained.

Nonetheless, the referred book is a really good one, the course content is clear, well organized, well documented, very well explained.

The course is structured, very well explained and documented; all credit to the author.

Very well explained, and I recommend it.

It was a very nice course.

It is very well explained step by step, so anybody can go through it.

Love it so far. Very clear explanations.

This course helped me build a basic knowledge of Spark by going through diverse use cases and algorithms.

Very well explained.

Very well explained, with very sharp clarity.

real world

Maybe just some more lectures about Spark Streaming and integration with Kafka would complete the course and bring it to 5 stars.

It lacks a real world example working with a database like MySQL or MongoDB.

It is out of date; I spent hours trying to set up my environment.

Jose has failed to explain this topic for real world scenarios, such as Spark failure and recovery mechanisms.

What other options could there be in a real world scenario?

The projects are very practical and there is a good amount of experience with various real world data sets.

Personally, I would have wished to see more of how to use this in the real world: how to build clusters, use HDFS/S3, etc., to fully get to more realistic production use cases.

You haven't covered how to set up a Spark cluster (distributed environment), register workers, etc., and you haven't taught how to load data from SQL/NoSQL databases. In the real world, in most cases we load data from some sort of SQL/NoSQL database instead of just loading it from flat files like JSON, CSV, or text files.

With this class I was able to rapidly apply the knowledge to a real world scenario.

However the "real world" applications are quite oversimplified.

My favourite part was the natural language processing section. It really helped me overcome an obstacle I have in a real world consulting project, and the example (spam detection) was a very good real-world scenario.

The way this course is made is really useful for learning and for applying all these methods in the real world.

If you want to learn and improve yourself in the real world, this course is awesome.

step by step

Awesome experience with this step by step course.

Goes through detailed environment setup and syntax step by step. But it needs real big data practice.

He explains step by step without missing things, which really makes me feel I am learning the subject.

It would be wonderful if the course could be updated with Structured Streaming topics as and when the stable release is out.

The videos are really good and I was able to follow along step by step.

Explanation of RDDs.

Very good step-by-step explanation of how to set up Spark and the required components.

Great course, good speed of teaching, step by step examples in jupyter notebook, good quality of course materials.

set up

The only issues I have are that it is a bit dated now and something about environment set up for a local VM was not quite as straightforward as the lecture on it.

Course set up is very okay.

Good and clear quick overview, condensed, good one!

What I'm missing here is how to set up a cluster and take advantage of working in a cluster using Spark (which I believe is actually the true strength of Spark).

I had some experience with Spark before.

Goes over the basics of PySpark and MLlib enough to give you some familiarity with some different types of ML algorithms and how they are set up in Spark with Python.

I had enrolled for this course maybe a year back and hadn't taken it because I got scared with the complicated AWS set up that Jose covers towards the end of the Python bootcamp course.

The environment set up was certainly muddy.

Before this course I literally spent days trying to set up pyspark.

When I tried to set up on EC2.

The course is well structured and very informative from its first lecture.

I am learning a lot.

I like the instructions given on AWS EC2 set up.

consulting projects

Awesome course with practical and motivated consulting projects.

* Data manipulation and feature engineering - I feel like more time could have been spent on how to change / manipulate the raw data, and it would have been useful to do this in one of the consulting projects.

I liked the 'consulting projects' which provided real-world exercises.

I really found the consulting projects interesting.

I never felt bored and the consulting projects were definitely a lot of fun as they felt applicable to the real world.

- Need more depth on what BinaryClassification vs MultiClassClassification is
- Really need to cover Naive Bayes
- Like the consulting projects, very well done!

* Great consulting projects.

jupyter notebook

The course was good until the part you missed or didn't explain: how to import files on AWS and work on them in Jupyter Notebook.

Always great courses from this instructor.

He is using very old software versions, so once you cannot install the basic tools (Jupyter Notebook), you also cannot install the other tools (PySpark) that work together with them.

For a beginner Python coder like myself, the instructor made Spark and Databricks easy to transition to from the world of Python 3 and Jupyter Notebook.

This is really great: you can open the code shown in the course in a Jupyter notebook and see all the explanations.

Configuring PySpark with IPython and Jupyter Notebook is a bit challenging.

till now

Till now, yes.

The teacher till now is quite well prepared and offers a wide spectrum of possibilities to deal with Spark and Python.

I didn't find this course useful.

Excellent delivery by the instructor.

The content of the course is very good till now, and the trainer is very good at delivering the content properly to the audience.

Nice lecture till now.

I was not sure of how to start, where to start.

It's very clear till now.

Clear explanation, well articulated, no verbal crutches on the part of the lecturer.

Right level and contents, going great.

Video is blurry.

So far so good.

Vague concepts -> clear concepts.

Till now I am able to follow :)

Very well explained.

Very friendly introduction.

This is a very good course about Python / Big Data.

The content is clear, interesting, precise, and concise.

looking forward

I am liking this course and looking forward to getting some good, hands-on content.

Awesome material and better explanation!

Thanks Jose... I enjoyed learning the Python crash course and am looking forward to learning the Spark modules.

Really looking forward to the rest of the training.

I am new to Spark and have experience in Python ML. Looking forward to playing with Spark.

Quality and explanation are good and easy to understand.

Yes, good as of now.

Super interesting, clearly scoped and presented, looking forward to the next lecture!!!

data scientist

Data source from the lecture.

Pedagogical.

Well presented.

The course content is very rich in detail and it is a good start for every data scientist or data engineer.

I highly suggest this course for anyone seeking to become a data scientist or data engineer!

Discuss how a data scientist can visualize Spark DataFrames inline within the Jupyter or Databricks Notebook.

How would a data scientist impute missing data based on data in other columns using Spark?

This course is terrific and exactly the right course for a data scientist with experience in ML on other tools and platforms who needs to shift to Spark.

Careers

An overview of related careers and their average salaries in the US.

Volunteer Big Data Engineer $48k

Data Scientist - Big Data $68k

Big Data and AWS Data Lake $73k

Big Data Developer (Streaming Data) $77k

Big data developer with AWS $78k

Research Scientist Big Data $94k

Big Data Developer Consultant $98k

Big Data Engineer 6 $107k

Big data and ETL specialist $121k

Big Data Specialist $149k

Principal Big Data Architect $180k

Senior Big Data Sales $181k
