We may earn an affiliate commission when you visit our partners.
Tao W., James Lee, Level Up, and Jiarui Zhou

What is this course about:

This course covers the fundamentals of Apache Spark with Java and teaches you what you need to know to develop Spark applications with Java. By the end of this course, you will have gained in-depth knowledge of Apache Spark, along with general big data analysis and manipulation skills, to help your company adopt Apache Spark for building big data processing pipelines and data analytics applications.

This course covers 10+ hands-on big data examples. You will learn how to frame data analysis problems as Spark problems. Together we will work through examples such as aggregating NASA Apache web logs from different sources; exploring price trends in California real estate data; writing Spark applications to find the median salary of developers in different countries from the Stack Overflow survey data; and developing a system to analyze how maker spaces are distributed across different regions in the United Kingdom. And much, much more.
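As a rough illustration of framing a question as a key/value problem, the sketch below computes a median salary per country in plain Java, with no Spark dependency. The class `MedianSalarySketch`, the `Response` type, and the sample values in the usage note are hypothetical and are not taken from the course's source code. In a Spark job the same two steps would typically map onto pair-RDD operations such as `groupByKey` and `mapValues`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for a pair-RDD style job; this is NOT Spark code.
class MedianSalarySketch {

    // Hypothetical record of one survey response: a country key and a salary value.
    static final class Response {
        final String country;
        final double salary;

        Response(String country, double salary) {
            this.country = country;
            this.salary = salary;
        }
    }

    // Step 1 (analogous to groupByKey): collect all salaries under their country.
    static Map<String, List<Double>> groupByCountry(List<Response> responses) {
        Map<String, List<Double>> grouped = new TreeMap<>();
        for (Response r : responses) {
            grouped.computeIfAbsent(r.country, k -> new ArrayList<>()).add(r.salary);
        }
        return grouped;
    }

    // Median of a non-empty list; the input list is copied, not modified.
    static double median(List<Double> values) {
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int n = sorted.size();
        int mid = n / 2;
        return n % 2 == 1
                ? sorted.get(mid)
                : (sorted.get(mid - 1) + sorted.get(mid)) / 2.0;
    }

    // Step 2 (analogous to mapValues): reduce each country's salaries to a median.
    static Map<String, Double> medianByCountry(List<Response> responses) {
        Map<String, Double> result = new TreeMap<>();
        groupByCountry(responses)
                .forEach((country, salaries) -> result.put(country, median(salaries)));
        return result;
    }
}
```

For example, calling `MedianSalarySketch.medianByCountry` with made-up responses of 50000 and 70000 for "UK" yields a "UK" median of 60000.0. Grouping every value per key is simple to reason about, but on a real cluster it moves all values for a key to one place, which is one reason partitioning and aggregation strategy matter.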

What you will learn from this course:

In particular, you will learn:

  • An overview of the architecture of Apache Spark.

  • Develop Apache Spark 2.0 applications with Java using RDD transformations and actions and Spark SQL.

  • Work with Apache Spark's primary abstraction, resilient distributed datasets (RDDs), to process and analyze large data sets.

  • Deep dive into advanced techniques to optimize and tune Apache Spark jobs by partitioning, caching and persisting RDDs.

  • Scale up Spark applications on a Hadoop YARN cluster through Amazon's Elastic MapReduce service.

  • Analyze structured and semi-structured data using Datasets and DataFrames, and develop a thorough understanding of Spark SQL.

  • Share information across different nodes on an Apache Spark cluster using broadcast variables and accumulators.

  • Best practices of working with Apache Spark in the field.

  • Big data ecosystem overview.

Why should we learn Apache Spark:

Apache Spark lets us build cutting-edge big data applications. It is also one of the most compelling technologies of the last decade in terms of its disruption of the big data world.

Spark provides in-memory cluster computing which greatly boosts the speed of iterative algorithms and interactive data mining tasks.
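To make the benefit for iterative work concrete, here is a plain-Java analogy, not Spark code. A dataset that is rebuilt on every pass is compared with one that is computed once and kept in memory, which is the idea behind calling `cache()` or `persist()` on an RDD that several actions reuse. The class `CachingSketch` and its helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Plain-Java analogy for caching a dataset that an iterative job reuses.
class CachingSketch {

    // Counts how many times the "expensive" dataset is rebuilt.
    static int loads = 0;

    // Hypothetical expensive source; stands in for re-reading and re-parsing input.
    static List<Integer> loadNumbers() {
        loads++;
        List<Integer> data = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            data.add(i);
        }
        return data;
    }

    // Makes `iterations` passes over the dataset and sums every value in each pass.
    static long sumOverIterations(Supplier<List<Integer>> source, int iterations) {
        long total = 0;
        for (int pass = 0; pass < iterations; pass++) {
            for (int value : source.get()) {
                total += value;
            }
        }
        return total;
    }

    // Wraps a supplier so its result is computed once and then served from memory.
    static <T> Supplier<T> cached(Supplier<T> source) {
        return new Supplier<T>() {
            private T value;
            private boolean computed = false;

            @Override
            public T get() {
                if (!computed) {
                    value = source.get();
                    computed = true;
                }
                return value;
            }
        };
    }
}
```

Passing `CachingSketch::loadNumbers` directly rebuilds the data on every pass, while passing `CachingSketch.cached(CachingSketch::loadNumbers)` builds it once. Both return the same total; only the number of loads differs.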

Apache Spark is the next-generation processing engine for big data.

Many companies are adopting Apache Spark to extract meaning from massive data sets. Today you have access to that same big data technology right on your desktop.

Apache Spark is becoming a must-have tool for big data engineers and data scientists.

About the author:

Since 2015, James has been helping his company adopt Apache Spark for building its big data processing pipeline and data analytics applications.

James' company has gained massive benefits by adopting Apache Spark in production. In this course, he shares his years of knowledge and best practices from working with Spark in the field.

Why choose this course?

This course is very hands-on. James has put a lot of effort into providing not only the theory but also real-life examples of developing Spark applications that you can try out on your own laptop.

James has uploaded all the source code to GitHub, and you will be able to follow along on Windows, macOS, or Linux.

At the end of this course, James is confident that you will have gained in-depth knowledge of Spark and general big data analysis and data manipulation skills. You'll be able to develop Spark applications that analyze gigabytes of data, both on your laptop and in the cloud using Amazon's Elastic MapReduce service.

30-day Money-back Guarantee.

You will get a 30-day money-back guarantee from Udemy for this course.

If you are not satisfied, simply ask for a refund within 30 days. You will get a full refund, no questions asked.

Are you ready to take your big data analysis skills and career to the next level? Take this course now.

You will go from zero to Spark hero in 4 hours.

What's inside

Learning objectives

  • An overview of the architecture of Apache Spark.
  • Work with Apache Spark's primary abstraction, resilient distributed datasets (RDDs), to process and analyze large data sets.
  • Develop Apache Spark 2.0 applications using RDD transformations and actions and Spark SQL.
  • Scale up Spark applications on a Hadoop YARN cluster through Amazon's Elastic MapReduce service.
  • Analyze structured and semi-structured data using Datasets and DataFrames, and develop a thorough understanding of Spark SQL.
  • Share information across different nodes on an Apache Spark cluster using broadcast variables and accumulators.
  • Advanced techniques to optimize and tune Apache Spark jobs by partitioning, caching and persisting RDDs.
  • Best practices of working with Apache Spark in the field.

Syllabus

Get Started with Apache Spark
Course Overview
How to Take this Course and How to Get Support
Text Lecture: How to Take this Course and How to Get Support
Introduction to Spark
Slides
Java 9 Warning
Install Java and Git
Source Code
Set up Spark project with IntelliJ IDEA
Set up Spark project with Eclipse
Text lecture: Set up Spark project with Eclipse
Run our first Spark job
Troubleshooting: running Hadoop on Windows
RDD
RDD Basics
Create RDDs
Text Lecture: Create RDDs
Map and Filter Transformation
Solution to Airports by Latitude Problem
FlatMap Transformation
Text Lectures: flatMap Transformation
Set Operation
Sampling With Replacement and Sampling Without Replacement
Solution for the Same Hosts Problem
Actions
Solution to Sum of Numbers Problem
Important Aspects about RDD
Summary of RDD Operations
Caching and Persistence
Spark Architecture and Components
Spark Architecture
Spark Components
Pair RDD
Introduction to Pair RDD
Create Pair RDDs
Filter and MapValue Transformations on Pair RDD
Reduce By Key Aggregation
Sample solution for the Average House problem
Group By Key Transformation
Sort By Key Transformation
Sample Solution for the Sorted Word Count Problem
Data Partitioning
Join Operations
Extra Learning Material: How are Big Companies using Apache Spark
Advanced Spark Topic
Accumulators
Text Lecture: Accumulators
Solution to StackOverflow Survey Follow-up Problem
Broadcast Variables
Spark SQL
Introduction to Spark SQL
Spark SQL in Action
Spark SQL practice: House Price Problem
Spark SQL Joins
Strongly Typed Dataset
Use Dataset or RDD
Dataset and RDD Conversion
Performance Tuning of Spark SQL
Extra Learning Material: Avoid These Mistakes While Writing Apache Spark Program
Running Spark in a Cluster
Introduction to Running Spark in a Cluster
Package Spark Application and Use spark-submit
Run Spark Application on Amazon EMR (Elastic MapReduce) cluster
Additional Learning Materials
Future Learning
Text Lecture: Future Learning
Coupons to Our Other Courses

Good to know

Know what's good, what to watch for, and possible dealbreakers
Provides thorough foundation in Apache Spark's RDDs, which are the core data structure for processing and analyzing large data sets in Spark applications
Covers use cases such as aggregating NASA Apache web logs, exploring price trends in California real estate data, finding median salaries of developers across countries, and analyzing distribution of maker spaces in the UK
Instructed by James Lee, who has years of experience in adopting Apache Spark for building big data processing pipelines and data analytics applications, and claims that students will gain in-depth knowledge about Spark and general big data analysis skills
Provides hands-on labs and interactive materials, and source code for all examples is available on GitHub
Develops foundational programming skills in Java for building Apache Spark applications
Includes modules on Spark SQL, Datasets, and DataFrames, which are essential for analyzing structured and semi-structured big data

Reviews summary

Good for beginners with Spark

Learners say this course offers engaging assignments well-suited for beginners looking to learn the basics of Apache Spark with Java.
Helpful demonstrations and examples.
"The video content overall was very good with lot of practical demonstrations."
Suitable for those new to Apache Spark.
"I am a beginner to Apache Spark and this was my first online tutorial video."
"I am able to grasp the starting bit of skills required for getting started with Apache Spark."

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Apache Spark 2.0 with Java -Learn Spark from a Big Data Guru with these activities:
Review your notes from previous courses on related topics
Reviewing your notes will help you to refresh your knowledge of related topics and to better prepare for this course.
  • Identify the previous courses you have taken that are related to this topic.
  • Gather your notes from those courses.
  • Review your notes to refresh your knowledge of the topic.
Join a study group or online forum for Apache Spark
Joining a study group or online forum will allow you to connect with other learners and get help with your studies.
  • Find a study group or online forum for Apache Spark.
  • Introduce yourself and ask questions.
  • Participate in discussions and help other learners.
Review Introduction to Apache Spark by Holden Karau and Andy Konwinski
This book will provide you with a solid foundation in Apache Spark and help you apply analytics on big data in a variety of languages including Java.
  • Read the first five chapters of the book.
  • Work through the examples in the book.
  • Complete the practice exercises at the end of each chapter.
Follow the Apache Spark Tutorial on the Databricks website
This tutorial will guide you through the basics of Apache Spark and help you get started with developing Spark applications.
  • Create a Databricks account.
  • Follow the steps in the tutorial to create your first Spark application.
  • Run the application and explore the results.
Complete the Apache Spark Hands-on Lab on Coursera
This lab will provide you with hands-on experience with Apache Spark and help you develop your skills in data analysis and processing.
  • Create a Coursera account.
  • Enroll in the Apache Spark Hands-on Lab.
  • Complete the lab exercises.
Mentor other students who are learning Apache Spark
Mentoring other students will help you to reinforce your own understanding of Apache Spark and to develop your leadership skills.
  • Identify a student who is learning Apache Spark.
  • Offer to mentor the student.
  • Meet with the student regularly to provide guidance and support.
Build a Spark application to analyze a real-world dataset
This project will allow you to apply your skills in Apache Spark to solve a real-world problem.
  • Identify a real-world dataset that you would like to analyze.
  • Design a Spark application to analyze the dataset.
  • Implement the Spark application.
  • Analyze the results of the application.
Contribute to the Apache Spark open-source project
Contributing to the Apache Spark open-source project will give you the opportunity to learn from and collaborate with other developers.
  • Find a bug or feature request in the Apache Spark issue tracker.
  • Fork the Apache Spark repository.
  • Make changes to the codebase.
  • Submit a pull request.

Career center

Learners who complete Apache Spark 2.0 with Java -Learn Spark from a Big Data Guru will develop knowledge and skills that may be useful to these careers:
Big Data Engineer
As a Big Data Engineer, you will design, build, and maintain big data systems. This course will help you develop the skills you need to be successful in this role, including data architecture, data engineering, and cloud computing.
Data Scientist
As a Data Scientist, you will use your knowledge of Spark and other big data tools to build predictive models and solve complex business problems. This course will help you develop the skills you need to be successful in this role, including data mining, machine learning, and statistical modeling.
Business Intelligence Analyst
As a Business Intelligence Analyst, you will use your knowledge of Spark and other big data tools to analyze data and provide insights to businesses. This course will help you develop the skills you need to be successful in this role, including data analysis, data visualization, and business intelligence.
Data Analyst
As a Data Analyst, you will use your knowledge of big data frameworks like Spark to extract insights from large datasets. You will then use those insights to improve decision-making and drive business outcomes. This course will help you develop the skills you need to be successful in this role, including data wrangling, data analysis, and data visualization.
Data Warehouse Engineer
As a Data Warehouse Engineer, you will design, build, and maintain data warehouses. This course will help you develop the skills you need to be successful in this role, including data modeling, data integration, and data warehousing.
Data Architect
As a Data Architect, you will design and build data architectures. This course will help you develop the skills you need to be successful in this role, including data modeling, data warehousing, and big data.
Software Engineer
As a Software Engineer, you may work on developing big data applications and systems. This course will help you develop the skills you need to be successful in this role, including Java programming, software design, and data structures.
Machine Learning Engineer
As a Machine Learning Engineer, you will develop and deploy machine learning models. This course may be useful for developing the skills you need to be successful in this role, including machine learning, data analysis, and software engineering.
Statistician
As a Statistician, you will use your knowledge of statistics and data analysis to solve problems. This course may be useful for developing the skills you need to be successful in this role, including data analysis, statistical modeling, and probability.
Operations Research Analyst
As an Operations Research Analyst, you will use your knowledge of mathematics and data analysis to solve problems in a variety of industries. This course may be useful for developing the skills you need to be successful in this role, including data analysis, optimization, and modeling.
Cloud Architect
As a Cloud Architect, you will design and build cloud-based solutions. This course may be useful for developing the skills you need to be successful in this role, including cloud computing, data architecture, and software engineering.
Actuary
As an Actuary, you will use your knowledge of mathematics, statistics, and finance to assess risk and make financial decisions. This course may be useful for developing the skills you need to be successful in this role, including data analysis, financial modeling, and probability.
Financial Analyst
As a Financial Analyst, you will use your knowledge of finance and data analysis to make investment decisions. This course may be useful for developing the skills you need to be successful in this role, including data analysis, financial modeling, and valuation.
Market Researcher
As a Market Researcher, you will use your knowledge of marketing and data analysis to conduct market research studies. This course may be useful for developing the skills you need to be successful in this role, including data analysis, survey design, and focus groups.
Database Administrator
As a Database Administrator, you will manage and maintain databases. This course may be useful for developing the skills you need to be successful in this role, including data management, database design, and database optimization.

Reading list

We've selected six books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Apache Spark 2.0 with Java -Learn Spark from a Big Data Guru.
Fits well with this course as it introduces learners to fundamental Spark concepts and covers key aspects such as RDDs, SQL, streaming and machine learning. Suitable for beginners and seasoned Spark users looking to expand their knowledge.
This book centers on Apache Spark, describing the Spark architecture, RDDs, Spark SQL, and more. Like the course, it approaches data analysis from a Spark perspective.
Covers more advanced topics than the course and would be most beneficial for those who are looking to specialize in big data analytics or data science.
Suitable for those with prior Spark knowledge, it delves into advanced topics like stream processing, graph analytics, and machine learning, providing insights and patterns for complex data analysis tasks.
While the course focuses on Spark's general capabilities, this book explores Spark's machine learning capabilities, providing hands-on examples and techniques for building and deploying machine learning models using Spark.
A comprehensive guide to the Apache Hadoop framework, which Spark is often deployed alongside (for example on YARN, as in this course). It will be an excellent foundation for those who are new to big data technologies.

Similar courses

Here are nine courses similar to Apache Spark 2.0 with Java -Learn Spark from a Big Data Guru.
Developing Spark Applications Using Scala & Cloudera
Introduction to Big Data with Spark and Hadoop
Getting Started with Apache Spark on Databricks
Data Engineering and Machine Learning using Spark
Apache Spark Fundamentals
Data Engineering with MS Azure Synapse Apache Spark Pools
Big Data, Hadoop, and Spark Basics
Spark and Data Lakes
Apache Spark for Data Engineering and Machine Learning

© 2016 - 2024 OpenCourser