We may earn an affiliate commission when you visit our partners.
Jose Portilla

Learn Apache Spark, a leading big data technology, and how to use it with Python, one of the most popular programming languages.

One of the most valuable technology skills is the ability to analyze huge data sets, and this course is specifically designed to bring you up to speed on one of the best technologies for this task, Apache Spark. The top technology companies like Google, Facebook, Netflix, Airbnb, Amazon, NASA, and more are all using Spark to solve their big data problems.

Spark can run up to 100x faster than Hadoop MapReduce on some in-memory workloads, which has caused an explosion in demand for this skill. Because the Spark 2.0 DataFrame framework is so new, you can quickly become one of the most knowledgeable people in the job market.

This course teaches the basics with a crash course in Python, then moves on to using Spark DataFrames with the latest Spark 2.0 syntax. After that, we'll go through how to use the MLlib machine learning library with the DataFrame syntax and Spark. Along the way you'll have exercises and Mock Consulting Projects that put you in a real-world situation where you need to use your new skills to solve a real problem.

We also cover the latest Spark technologies, like Spark SQL and Spark Streaming, and advanced models like Gradient Boosted Trees. After you complete this course you will feel comfortable putting Spark and PySpark on your resume. This course also has a full 30-day money-back guarantee and comes with a LinkedIn Certificate of Completion.

If you're ready to jump into the world of Python, Spark, and Big Data, this is the course for you.

What's inside

Learning objectives

  • Use Python and Spark together to analyze big data
  • Learn how to use the new Spark 2.0 DataFrame syntax
  • Work on consulting projects that mimic real-world situations!
  • Classify customer churn with logistic regression
  • Use Spark with random forests for classification
  • Learn how to use Spark's gradient boosted trees
  • Use Spark's MLlib to create powerful machine learning models
  • Learn about the Databricks platform!
  • Get set up on Amazon Web Services EC2 for big data analysis
  • Learn how to use the AWS Elastic MapReduce service!
  • Learn how to leverage the power of Linux with a Spark environment!
  • Create a spam filter using Spark and natural language processing!
  • Use Spark Streaming to analyze tweets in real time!

Syllabus

Welcome to the course!
Introduction
Course Overview
Frequently Asked Questions
What is Spark? Why Python?
Learn how to set up Python and Spark on your system!

Let's explain the set-up for the course!

Note on Installation Sections
Installation Option 3: Databricks Platform
Recommended Setup
Databricks Setup
Installation Option 1: VirtualBox Setup with Ubuntu

Let's walk through the local installation of Ubuntu

Local Installation VirtualBox Part 2
Setting up PySpark
Installation Option 2: AWS EC2

Let's show you how to use Amazon Web Services' EC2 Instances for Spark!

Creating the EC2 Instance
SSH with Mac or Linux
Installations on EC2
Installation Option 4: AWS EMR
AWS EMR Setup
Quickly get up to speed with Python
Introduction to Python Crash Course
Jupyter Notebook Overview
Python Crash Course Part One
Python Crash Course Part Two
Python Crash Course Part Three
Python Crash Course Exercises
Python Crash Course Exercise Solutions
Learn how to work with Spark DataFrames in Python!
Introduction to Spark DataFrames

Learn the basics of Spark DataFrames!

Spark DataFrame Basics Part Two

Learn some basic operations with Spark 2.0

Groupby and Aggregate Operations
Missing Data
Dates and Timestamps
Get some practice with Spark DataFrames!
DataFrame Project Exercise
DataFrame Project Exercise Solutions
Learn about Machine Learning and MLlib
Introduction to Machine Learning and ISLR
Machine Learning with Spark and Python with MLlib
Learn the basics of Linear Regression with Python and Spark!
Linear Regression Theory and Reading
Linear Regression Documentation Example
Regression Evaluation
Linear Regression Example Code Along
Linear Regression Consulting Project
Linear Regression Consulting Project Solutions
Learn how to use Logistic Regression for Classification!
Logistic Regression Theory and Reading
Logistic Regression Example Code Along
Logistic Regression Code Along
Logistic Regression Consulting Project
Logistic Regression Consulting Project Solutions
Learn how to utilize Decision Trees and Random Forests in Spark with Python!
Tree Methods Theory and Reading
Tree Methods Documentation Examples
Decision Trees and Random Forest Code Along Examples
Random Forest - Classification Consulting Project
Random Forest Classification Consulting Project Solutions
Learn how to use K-means to cluster unlabeled data!
K-means Clustering Theory and Reading
KMeans Clustering Documentation Example
Clustering Example Code Along
Clustering Consulting Project
Clustering Consulting Project Solutions
Learn how to use Spark's built-in collaborative filtering!
Introduction to Recommender Systems
Recommender System - Code Along Project
Learn how to use Python and Spark for Natural Language Processing!
Introduction to Natural Language Processing
NLP Tools Part One
NLP Tools Part Two
Natural Language Processing Code Along Project
Learn how to use Spark to work with streaming data.
Introduction to Streaming with Spark!
Spark Streaming Documentation Example
Spark Streaming Twitter Project - Part One
Spark Streaming Twitter Project - Part Two
Spark Streaming Twitter Project - Part Three
Get special offers on other courses!
Bonus Lecture:

Good to know

Know what's good, what to watch for, and possible dealbreakers
Suits professionals who want to combine Python with Big Data analysis for career advancement
May be less suited for beginners who lack prior programming or data analysis experience
Provides hands-on experience through exercises and Mock Consulting Projects
Covers advanced Spark technologies like Spark SQL, Spark Streaming, and Gradient Boosted Trees
Teaches the latest Spark 2.0 DataFrame syntax, keeping learners up-to-date with current industry practices
Leverages real-life scenarios through Consulting Projects, simulating professional problem-solving

Reviews summary

Decent pyspark introduction

According to students, Spark and Python for Big Data with PySpark is a decent way to get started with features of PySpark. While this course may be outdated, learners say it still offers a solid foundation and can be especially helpful for those new to PySpark.
Students who are new to PySpark have found this course to be especially helpful.
"This was a really great way to get an introduction to some interesting features of pyspark"
Some students feel the course content is outdated.
"it is a bit dated now"
"something about environment set up for a local VM was not quite as straightforward as the lecture on it"
"I believe this is also due to some things being outdated"

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Spark and Python for Big Data with PySpark with these activities:
Apache Spark and Python Tutorial by Databricks
Follow this tutorial to reinforce your understanding of Apache Spark and Python, the main technologies used in this course.
  • Follow the tutorial's instructions and complete all the exercises.
Join a Spark and Python Study Group
Find a study group or start one yourself to discuss course material, work on projects together, and quiz each other.
  • Find or start a study group with fellow students.
  • Meet regularly to discuss course material.
  • Work on projects together.
  • Quiz each other on key concepts.
Spark and Python Coding Exercises
Solve coding exercises to improve your proficiency in using Spark and Python for data analysis.
  • Find online coding exercises or create your own.
  • Solve the exercises using Spark and Python.
  • Review your solutions and identify areas for improvement.
Learning Spark: Lightning-Fast Data Analytics
Read this book to gain a comprehensive understanding of Apache Spark and its applications in data analytics.
  • Read the book thoroughly.
  • Take notes and highlight important concepts.
  • Complete the exercises and projects in the book.
Spark and Python Data Analysis Project
Develop a data analysis project using Spark and Python to solidify your understanding of the technologies and their applications.
  • Define the scope and objectives of your project.
  • Gather and prepare your data.
  • Develop your Spark and Python code.
  • Analyze your results and draw conclusions.
  • Write a report or presentation on your project.
Build a Spark and Python Application
Take your learning to the next level by building a real-world application using Spark and Python.
  • Identify a problem or opportunity that can be addressed with Spark and Python.
  • Design and develop your application.
  • Test and deploy your application.
  • Monitor and maintain your application.

Career center

Learners who complete Spark and Python for Big Data with PySpark will develop knowledge and skills that may be useful to these careers:
Data Scientist
Data Scientists are responsible for collecting, cleaning, and analyzing data. They work with large datasets to identify trends, patterns, and insights. Data Scientists may use Spark and Python to analyze big data and build machine learning models. This course can help you develop the skills and knowledge you need to become a successful Data Scientist. You will learn how to use Spark DataFrames, MLlib, and other tools to analyze big data. You will also learn how to use Python to automate and streamline your work.
Machine Learning Engineer
Machine Learning Engineers are responsible for developing and deploying machine learning models. They work with large datasets to train and test machine learning models. Machine Learning Engineers may use Spark and Python to analyze big data and build machine learning models. This course can help you develop the skills and knowledge you need to become a successful Machine Learning Engineer. You will learn how to use Spark DataFrames, MLlib, and other tools to build machine learning models. You will also learn how to use Python to automate and streamline your work.
Data Management Analyst
Data Management Analysts are responsible for managing data assets. They work with large datasets to ensure that data is clean, accurate, and accessible. Data Management Analysts may use Spark and Python to analyze big data and build data management systems. This course can help you develop the skills and knowledge you need to become a successful Data Management Analyst. You will learn how to use Spark DataFrames, MLlib, and other tools to analyze big data. You will also learn how to use Python to automate and streamline your work.
Data Analyst
Data Analysts play a key role in many industries, including finance, healthcare, technology, and retail. They work with large datasets to identify trends, patterns, and insights. Data Analysts may use Spark and Python to analyze big data, a skill that is in high demand in the tech sector. This course can help you develop the skills and knowledge you need to become a successful Data Analyst. You will learn how to use Spark DataFrames, MLlib, and other tools to analyze big data. You will also learn how to use Python to automate and streamline your work.
Database Administrator
Database Administrators are responsible for managing databases. They work with large datasets to ensure that databases are running smoothly and efficiently. Database Administrators may use Spark and Python to analyze big data and build database systems. This course can help you develop the skills and knowledge you need to become a successful Database Administrator. You will learn how to use Spark DataFrames, MLlib, and other tools to analyze big data. You will also learn how to use Python to automate and streamline your work.
Data Engineer
Data Engineers are responsible for building and maintaining data pipelines. They work with large datasets to ensure that data is clean, accurate, and accessible. Data Engineers may use Spark and Python to analyze big data and build machine learning models. This course can help you develop the skills and knowledge you need to become a successful Data Engineer. You will learn how to use Spark DataFrames, MLlib, and other tools to build data pipelines. You will also learn how to use Python to automate and streamline your work.
Quantitative Analyst
Quantitative Analysts use mathematical and statistical methods to analyze financial data. They work with large datasets to identify trends, patterns, and insights. Quantitative Analysts may use Spark and Python to analyze big data and build financial models. This course can help you develop the skills and knowledge you need to become a successful Quantitative Analyst. You will learn how to use Spark DataFrames, MLlib, and other tools to analyze financial data. You will also learn how to use Python to automate and streamline your work.
Operations Research Analyst
Operations Research Analysts use mathematical and statistical methods to solve business problems. They work with large datasets to identify trends, patterns, and insights. Operations Research Analysts may use Spark and Python to analyze big data and build optimization models. This course can help you develop the skills and knowledge you need to become a successful Operations Research Analyst. You will learn how to use Spark DataFrames, MLlib, and other tools to analyze big data. You will also learn how to use Python to automate and streamline your work.
Statistician
Statisticians use mathematical and statistical methods to analyze data. They work with large datasets to identify trends, patterns, and insights. Statisticians may use Spark and Python to analyze big data and build statistical models. This course can help you develop the skills and knowledge you need to become a successful Statistician. You will learn how to use Spark DataFrames, MLlib, and other tools to analyze big data. You will also learn how to use Python to automate and streamline your work.
Software Engineer
Software Engineers design, develop, and maintain software systems. They work with a variety of programming languages and technologies, including Python and Spark. Software Engineers may use Spark and Python to analyze big data and build machine learning models. This course can help you develop the skills and knowledge you need to become a successful Software Engineer. You will learn how to use Spark DataFrames, MLlib, and other tools to build software systems. You will also learn how to use Python to automate and streamline your work.
Financial Analyst
Financial Analysts use financial data to make investment decisions. They work with large datasets to identify trends, patterns, and insights. Financial Analysts may use Spark and Python to analyze big data and build financial models. This course can help you develop the skills and knowledge you need to become a successful Financial Analyst. You will learn how to use Spark DataFrames, MLlib, and other tools to analyze financial data. You will also learn how to use Python to automate and streamline your work.
Business Analyst
Business Analysts work with businesses to identify and solve problems. They use data and analysis to help businesses make better decisions. Business Analysts may use Spark and Python to analyze big data and identify business opportunities. This course can help you develop the skills and knowledge you need to become a successful Business Analyst. You will learn how to use Spark DataFrames, MLlib, and other tools to analyze big data. You will also learn how to use Python to automate and streamline your work.
Market Researcher
Market Researchers collect and analyze data about consumer behavior. They use data to identify trends, patterns, and insights. Market Researchers may use Spark and Python to analyze big data and identify marketing opportunities. This course can help you develop the skills and knowledge you need to become a successful Market Researcher. You will learn how to use Spark DataFrames, MLlib, and other tools to analyze big data. You will also learn how to use Python to automate and streamline your work.
Information Security Analyst
Information Security Analysts are responsible for protecting data from unauthorized access. They work with large datasets to identify security threats and vulnerabilities. Information Security Analysts may use Spark and Python to analyze big data and build security systems. This course can help you develop the skills and knowledge you need to become a successful Information Security Analyst. You will learn how to use Spark DataFrames, MLlib, and other tools to analyze big data. You will also learn how to use Python to automate and streamline your work.
Actuary
Actuaries use mathematical and statistical methods to assess risk. They work with large datasets to identify trends, patterns, and insights. Actuaries may use Spark and Python to analyze big data and build risk models. This course can help you develop the skills and knowledge you need to become a successful Actuary. You will learn how to use Spark DataFrames, MLlib, and other tools to analyze big data. You will also learn how to use Python to automate and streamline your work.

Reading list

We've selected 11 books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Spark and Python for Big Data with PySpark.
Provides a comprehensive overview of Spark, covering its core concepts, APIs, and use cases. It is a valuable resource for anyone looking to learn more about Spark and its applications in big data analytics.
Provides a comprehensive overview of statistical learning. It covers a wide range of topics, including supervised learning, unsupervised learning, and deep learning. It is a valuable resource for anyone looking to learn more about statistical learning.
Provides a comprehensive overview of statistical learning. It covers a wide range of topics, including supervised learning, unsupervised learning, and deep learning. It is a valuable resource for anyone looking to learn more about statistical learning.
Provides a comprehensive overview of deep learning. It covers a wide range of topics, including neural networks, convolutional neural networks, and recurrent neural networks. It is a valuable resource for anyone looking to learn more about deep learning.
Provides a comprehensive overview of data science from scratch. It covers a wide range of topics, including data loading, transformation, and analysis. It is a valuable resource for anyone looking to learn more about data science.
Provides a comprehensive overview of natural language processing with deep learning. It covers a wide range of topics, including text processing, machine learning, and deep learning. It is a valuable resource for anyone looking to learn more about natural language processing with deep learning.
Provides a comprehensive overview of speech and language processing. It covers a wide range of topics, including speech recognition, natural language processing, and machine learning. It is a valuable resource for anyone looking to learn more about speech and language processing.
Provides a comprehensive overview of computer vision. It covers a wide range of topics, including image processing, object recognition, and scene understanding. It is a valuable resource for anyone looking to learn more about computer vision.
Provides a comprehensive overview of deep learning with Python. It covers a wide range of topics, including neural networks, convolutional neural networks, and recurrent neural networks. It is a valuable resource for anyone looking to learn more about deep learning with Python.
Provides a comprehensive overview of natural language processing with Python. It covers a wide range of topics, including text processing, machine learning, and deep learning. It is a valuable resource for anyone looking to learn more about natural language processing with Python.
Provides a comprehensive overview of Python for data analysis. It covers a wide range of topics, including data loading, transformation, and analysis. It is a valuable resource for anyone looking to learn more about Python for data analysis.

Similar courses

Here are nine courses similar to Spark and Python for Big Data with PySpark.
Scala and Spark for Big Data and Machine Learning
Most relevant
Use the Apache Spark Structured Streaming API with MongoDB
Scalable Machine Learning on Big Data using Apache Spark
Apache Spark 2.0 with Java -Learn Spark from a Big Data...
Complete Tensorflow 2 and Keras Deep Learning Bootcamp
Data Engineering Essentials using SQL, Python, and PySpark
Big Data Analysis with Scala and Spark (Scala 2 version)
Big Data Analysis with Scala and Spark
Apache Spark with Scala - Hands On with Big Data!

© 2016 - 2024 OpenCourser