Janani Ravi

The Spark 2.x releases introduce significantly different and upgraded features. This course focuses on these changes, in both theory and practice.

Spark is arguably the most popular engine for big data processing today, and the 2.x releases add several features that make Spark more powerful and easier to work with. In this course, Getting Started with Spark 2, you will get up and running with Spark 2 and understand the similarities and differences between version 2.x and older versions. First, you will see the basic Spark architecture and the details of Project Tungsten, which brought significant performance improvements to Spark 2. You will go over the new developer APIs using DataFrames and see how they interoperate with RDDs from Spark 1.x. Next, you will move on to big data processing, where you will load and clean datasets, remove invalid rows, execute transformations to extract insights, and perform grouping, sorting, and aggregations using the new DataFrame APIs. You will also study how and where to use broadcast variables and accumulators. Finally, you will work with Spark SQL, which allows you to use SQL commands for big data processing. The course also covers advanced SQL support in the form of windowing operations. By the end of this course, you should be comfortable working with Spark DataFrames and Spark SQL, and better equipped to make technical choices based on the performance trade-offs of older versions of Spark vs. Spark 2. Software required: Apache Spark 2.2, Python 2.7.

What's inside

Syllabus

Course Overview
Understanding Differences Between Spark 2.x and Spark 1.x
Exploring and Analyzing Data with DataFrames
Querying Data Using Spark SQL

Good to know

Develops a solid foundation for beginners with data processing using Spark 2.x
Teaches Spark 2.x, which is highly relevant to industry
Emphasizes both theoretical and practical aspects of Spark 2.x
Uses Spark DataFrames and Spark SQL, which are essential for data processing
Incorporates interactive materials and hands-on labs for practical skills development

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Getting Started with Spark 2 with these activities:
Review 'Learning Spark: Lightning-Fast Data Analytics'
Reinforce your understanding of Spark concepts and best practices by reviewing this comprehensive guide.
  • Read Chapter 2: Spark Architecture and Internals
  • Review Chapter 4: DataFrames and Datasets
  • Skim Chapter 6: SQL and DataFrames
Connect with Spark professionals at industry meetups
Expand your professional network and gain insights from experienced Spark practitioners.
  • Identify and attend relevant meetups or conferences
  • Introduce yourself and engage in conversations
  • Share your knowledge and ask questions
  • Follow up with connections and explore potential collaborations
Attend a workshop on Spark 2 optimization techniques
Gain insights and hands-on experience in optimizing Spark applications from industry experts.
  • Research and identify a relevant workshop
  • Register and attend the workshop
  • Actively participate and ask questions
  • Apply the learned techniques to your own Spark projects
Follow tutorials on advanced Spark SQL features
Explore advanced SQL features such as windowing operations to enhance data analysis capabilities.
  • Identify scenarios where windowing functions can enhance data analysis
  • Learn different windowing functions and their syntax
  • Practice using windowing functions in Spark SQL queries
Optimize Spark applications
Gain practical experience in optimizing Spark applications for better performance.
  • Identify performance bottlenecks in Spark applications
  • Apply techniques to improve performance such as caching, partitioning, and using optimized data structures
  • Measure and track performance improvements

Career center

Learners who complete Getting Started with Spark 2 will develop knowledge and skills that may be useful to these careers:
Big Data Architect
Big Data Architects design and build data architectures for businesses. As a Big Data Architect, you will use your knowledge of Spark to design and implement big data solutions. This course can help you work toward this role because it provides hands-on experience with Spark 2.x, a popular big data processing engine.
Software Engineer - Big Data
Software Engineers specializing in big data design, implement, and maintain software that processes large datasets. In this role, you will use your knowledge of Apache Spark to handle the technical aspects of big data management, including performance optimization and data analytics. This course can help you write Spark code effectively, as it provides hands-on practice with Spark 2.x.
Data Engineer
Data Engineers design and build data pipelines that move data from source systems to data warehouses and other destinations. As a Data Engineer, you will use your knowledge of Spark to develop and manage data pipelines. This course can help you work toward this role because it provides hands-on experience with Spark 2.x, a leading big data processing engine.
Data Analyst
Data Analysts manipulate data to discover trends and patterns and help businesses make informed decisions. As a Data Analyst, you can use Spark to perform complex analysis on large datasets. Familiarity with the Spark 2.x DataFrame and Spark SQL APIs covered in this course can help you keep pace with industry practice.
Database Administrator
Database Administrators manage and maintain databases, including the hardware and software used to store and process data. As a Database Administrator, you may use your knowledge of Spark when working with big data stores. This course can help because it provides hands-on experience with Spark 2.x, a popular big data processing engine.
Data Scientist
Data Scientists analyze data to help companies make better decisions, providing insights into areas such as marketing campaigns, sales targets, and financial planning. Experience with Apache Spark, a widely used data processing engine for big data, can be an advantage in your job search. This course can help you work toward this role because it focuses on Spark 2.x, which brought significant upgrades to Spark's features and functions.
Machine Learning Engineer
Machine Learning Engineers design and build machine learning models that learn from data and make predictions. As a Machine Learning Engineer, you will use Spark to develop and train models on large datasets. This course can help you work toward this role because it provides hands-on experience with Spark 2.x, a popular big data processing engine used in machine learning.
Data Architect
Data Architects design and manage data systems, including the hardware and software used to store and process data. As a Data Architect, you will use your knowledge of Spark to design and implement big data solutions. This course can help you work toward this role because it provides hands-on experience with Spark 2.x, a big data processing engine used in many industries.
Software Developer
Software Developers design, build, and maintain software applications. As a Software Developer, you will use your knowledge of Spark to develop and maintain big data applications. This course can help you work toward this role because it provides hands-on experience with Spark 2.x, a popular big data processing engine.
Data Science Manager
Data Science Managers lead teams of data scientists and other professionals who work with big data. As a Data Science Manager, you will use your knowledge of Spark to manage and coordinate big data projects. This course can help you work toward this role because it provides hands-on experience with Spark 2.x, a popular big data processing engine.
Software Engineer (Data Science)
Software Engineers specializing in data science develop and maintain software applications that use data science techniques. In this role, you will use your knowledge of Spark to develop and maintain big data applications. This course can help you work toward this role because it provides hands-on experience with Spark 2.x, a popular big data processing engine.
Statistician
Statisticians collect, analyze, and interpret data to provide insights into a variety of topics. As a Statistician, you can use Spark to analyze large datasets and identify trends and patterns. This course can help because it provides hands-on experience with Spark 2.x, a popular big data processing engine.
Business Analyst
Business Analysts analyze business processes and data to improve efficiency and effectiveness. As a Business Analyst, you can use your knowledge of Spark to analyze large datasets and identify trends and patterns. This course can help because it provides hands-on experience with Spark 2.x, a popular big data processing engine.
Quantitative Analyst
Quantitative Analysts use mathematical and statistical methods to analyze data, and are often employed in finance to help make investment decisions. As a Quantitative Analyst, you can use Spark to analyze large datasets, such as financial time series and market data. This course provides a foundation in Spark 2.x, which may make you a more competitive candidate for Quantitative Analyst roles.
Research Analyst
Research Analysts conduct research and provide insights on a variety of topics, including market trends, economic conditions, and industry analysis. As a Research Analyst, you can use Spark to analyze large datasets, such as consumer surveys and market research data. This course will familiarize you with Spark 2.x, which may make you a more competitive candidate for Research Analyst roles.

Reading list

We've selected eight books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Getting Started with Spark 2.
This is the official book for Apache Spark, written by its creators. It provides a comprehensive overview of Spark's architecture and APIs, and covers a wide range of topics, from basic data manipulation to advanced machine learning and streaming applications.
The official reference guide for Spark 2, providing a comprehensive overview of its features and APIs. Suitable as a reference for both beginners and experienced users.
A deep dive into the internals of Spark. It covers a wide range of topics, including memory management, scheduling, and performance tuning. It is a must-read for anyone who wants to get the most out of Spark.
Covers the fundamentals of machine learning using Python, including supervised and unsupervised learning algorithms. Useful for learners who wish to apply machine learning techniques using Python.
Provides an introduction to deep learning using Python, including convolutional neural networks and recurrent neural networks. Useful for learners who wish to explore deep learning techniques for big data.
Covers the basics of Python for data analysis, including data manipulation, visualization, and machine learning. Useful for learners who are new to Python or data analysis.
Collection of patterns for using Spark for advanced analytics. It covers a wide range of topics, including machine learning, graph processing, and streaming analytics.
Comprehensive guide to using Spark. It covers a wide range of topics, including Spark's architecture, APIs, and programming models. It also provides extensive coverage of Spark's ecosystem of libraries.

Similar courses

Here are nine courses similar to Getting Started with Spark 2.
Building Machine Learning Models in Spark 2
Introduction to Big Data with Spark and Hadoop
Developing Spark Applications Using Scala & Cloudera
Conceptualizing the Processing Model for Apache Spark...
Apache Spark for Data Engineering and Machine Learning
Spark and Python for Big Data with PySpark
Distributed Computing with Spark SQL
Big Data, Hadoop, and Spark Basics
Apache Spark with Scala - Hands On with Big Data!