We may earn an affiliate commission when you visit our partners.
Justin Pihony

Apache Spark is a leader in enabling quick and efficient data processing. This course will teach you how to use Spark's SQL, Streaming, and even the newer Structured Streaming APIs to create applications able to handle data as it arrives.

Analyzing data used to be something you did once a night. Now you need to process data on the fly so you can provide up-to-the-minute insights. But how do you accomplish in real time what used to take hours, without a complicated code base? In this course, Handling Fast Data with Apache Spark SQL and Streaming, you'll learn to use the Apache Spark SQL and Streaming libraries to handle this new world of real-time, fast data processing. First, you'll dive into Spark SQL. Next, you'll explore how to catch potential fraud by analyzing streams with Spark Streaming. Finally, you'll discover the newer Structured Streaming API. By the end of this course, you'll have a deeper understanding of these APIs, along with a number of the streaming concepts that have driven their design.

What's inside

Syllabus

Course Overview
Introduction
Querying Data with the DataFrames (Part 1)
Querying Data with the DataFrames (Part 2)
Improving Type Safety with Datasets
Processing Data with the Streaming API
Optimizing, Structured Streaming, and Spark 2.x

Good to know

Know what's good, what to watch for, and possible dealbreakers
Intended for learners with some experience in data querying and analytics, this course includes advanced topics like optimizing processing and structured streaming
Taught by Justin Pihony, a respected expert in big data analysis
Covers Spark SQL's DataFrame and Dataset APIs, the Spark Streaming API, and the newer Structured Streaming API
Examines both the theoretical underpinnings and practical applications of real-time data processing with Apache Spark
May be less accessible for learners without prior experience in data engineering or Apache Spark
Course materials are based on Spark 2.x, so some details may differ in later Spark releases

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Handling Fast Data with Apache Spark SQL and Streaming with these activities:
Review Apache Spark Concepts
Refresh your understanding of Apache Spark's fundamental concepts before the course starts.
  • Revisit documentation or tutorials on Apache Spark
  • Review key concepts such as RDDs and transformations
Gather Resources on Spark Streaming
Compile a collection of useful resources on Spark Streaming to support your learning.
  • Search for articles, tutorials, and documentation on Spark Streaming
  • Organize the resources into a structured collection
Practice Data Transformation in Spark SQL
Practice data transformation to reinforce your knowledge of Spark SQL's functionality.
  • Create a DataFrame from a CSV file
  • Select and rename columns
  • Filter data based on criteria
  • Join two DataFrames
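As a warm-up for the four steps above, here is a minimal sketch of the same load, select-and-rename, filter, and join sequence. It deliberately uses plain Python with the standard `csv` module rather than Spark, so it runs without a cluster; the sample data, the column names, and the helper functions (`read_csv`, `select_and_rename`, `filter_rows`, `inner_join`) are all hypothetical and are not part of the course or of any Spark API.

```python
import csv
import io

# Hypothetical sample data standing in for two CSV files.
TRANSACTIONS_CSV = """id,account,amt
1,A,20.0
2,B,950.0
3,A,1200.0
"""
ACCOUNTS_CSV = """account,owner
A,Ana
B,Ben
"""

def read_csv(text):
    """Step 1: load CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def select_and_rename(rows, mapping):
    """Step 2: keep only the mapped columns, under their new names."""
    return [{new: row[old] for old, new in mapping.items()} for row in rows]

def filter_rows(rows, predicate):
    """Step 3: keep the rows for which the predicate holds."""
    return [row for row in rows if predicate(row)]

def inner_join(left, right, key):
    """Step 4: inner join two row lists on a shared key column."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

transactions = read_csv(TRANSACTIONS_CSV)
accounts = read_csv(ACCOUNTS_CSV)
renamed = select_and_rename(transactions, {"account": "account", "amt": "amount"})
large = filter_rows(renamed, lambda row: float(row["amount"]) > 900)
result = inner_join(large, accounts, "account")
print(result)
```

Once the logic is clear in this form, the same sequence can be rewritten against a Spark DataFrame as the actual practice exercise.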
Follow Tutorials on the Structured Streaming API
Expand your knowledge of the Structured Streaming API through guided tutorials.
  • Find a tutorial on the Structured Streaming API
  • Work through the tutorial step by step
  • Implement a simple use case with the Structured Streaming API
Join a Study Group for Apache Spark
Engage with peers in a study group to discuss concepts, share knowledge, and reinforce your understanding of Apache Spark.
  • Find or start a study group for Apache Spark
  • Actively participate in discussions and knowledge sharing
Build a Spark Streaming Application to Analyze Data Streams
Develop a real-world application of Spark Streaming to improve your understanding of the technology.
  • Define the data source and streaming parameters
  • Process data streams with transformation and filtering
  • Set up a streaming visualization dashboard
Build a Real-Time Data Analytics Dashboard with Spark
Apply your knowledge to develop a practical project that combines Spark for data processing and a visualization tool for presenting insights.
  • Design the dashboard and data pipelines
  • Implement data processing and analytics using Spark
  • Create visualizations and integrate them into the dashboard
Contribute to Spark Community Projects
Engage with the Apache Spark community by contributing to open-source projects and expanding your hands-on experience.
  • Identify and choose a Spark community project to contribute to
  • Follow the project's guidelines and contribute code or documentation
  • Participate in discussions and get feedback from the community

Career center

Learners who complete Handling Fast Data with Apache Spark SQL and Streaming will develop knowledge and skills that may be useful to these careers:
Data Architect
Data Architects design and implement data architectures. This course can be useful as it provides hands-on experience with Apache Spark, a popular big data processing framework. Apache Spark is increasingly used by Data Architects to design and develop data-intensive systems, and proficiency in it can make you a more competitive candidate for Data Architect positions.
Big Data Architect
Big Data Architects design and implement big data systems. This course may be useful as it provides a foundation in Apache Spark, a popular big data processing framework. Apache Spark is increasingly used by Big Data Architects to design and develop data-intensive systems, and proficiency in it can make you a more competitive candidate for Big Data Architect positions.
Data Warehouse Architect
Data Warehouse Architects design and implement data warehouses. This course may be useful as it provides a foundation in Apache Spark, a popular big data processing framework. Apache Spark is increasingly used by Data Warehouse Architects to design and develop data-intensive systems, and proficiency in it can make you a more competitive candidate for Data Warehouse Architect positions.
Data Scientist
Data Scientists use statistical and machine learning techniques to extract insights from data. This course can be useful as it provides a foundation in Apache Spark, a popular big data processing framework. Apache Spark is increasingly used by Data Scientists to handle large volumes of data, and proficiency in it can make you a more competitive candidate for Data Scientist positions.
Machine Learning Engineer
Machine Learning Engineers design, build, and deploy machine learning models. This course can be useful as it provides hands-on experience with Apache Spark, a popular big data processing framework. Apache Spark is increasingly used by Machine Learning Engineers to handle large volumes of data, and proficiency in it can increase your chances of success in this field.
Enterprise Architect
Enterprise Architects design and implement enterprise-wide IT systems. This course can be useful as it provides hands-on experience with Apache Spark, a popular big data processing framework. Apache Spark is increasingly used by Enterprise Architects to design and develop data-intensive systems, and proficiency in it can make you a more competitive candidate for Enterprise Architect positions.
Data Engineer
Data Engineers are responsible for designing and building data pipelines, ensuring that data is clean, reliable, and accessible to analysts and other users. This course can be useful as it provides hands-on experience with Apache Spark, a popular big data processing framework. Apache Spark is widely used in data engineering roles, and proficiency in it can increase your chances of success in this field.
Data Analytics Manager
Data Analytics Managers oversee the collection, analysis, and interpretation of data. This course can be useful as it provides hands-on experience with Apache Spark, a popular big data processing framework. Apache Spark is increasingly used by Data Analytics Managers to handle large volumes of data, and proficiency in it can make you a more competitive candidate for Data Analytics Manager positions.
Business Analyst
Business Analysts use data to identify and solve business problems. This course can be useful as it provides hands-on experience with Apache Spark, a popular big data processing framework. Apache Spark is increasingly used by Business Analysts to handle large volumes of data, and proficiency in it can make you a more competitive candidate for Business Analyst positions.
Systems Analyst
Systems Analysts design and implement computer systems. This course can be useful as it provides hands-on experience with Apache Spark, a popular big data processing framework. Apache Spark is increasingly used by Systems Analysts to design and develop data-intensive systems, and proficiency in it can make you a more competitive candidate for Systems Analyst positions.
Database Administrator
Database Administrators are responsible for managing and maintaining databases. This course can be useful as it provides hands-on experience with Apache Spark, a popular big data processing framework. Apache Spark is increasingly used by Database Administrators to handle large volumes of data, and proficiency in it can make you a more competitive candidate for Database Administrator positions.
Software Architect
Software Architects design and develop software systems. This course can be useful as it provides hands-on experience with Apache Spark, a popular big data processing framework. Apache Spark is increasingly used by Software Architects to design and develop data-intensive systems, and proficiency in it can make you a more competitive candidate for Software Architect positions.
Software Development Manager
Software Development Managers oversee the development of software applications. This course can be useful as it provides hands-on experience with Apache Spark, a popular big data processing framework. Apache Spark is increasingly used by Software Development Managers to manage the development of data-intensive applications, and proficiency in it can make you a more competitive candidate for Software Development Manager positions.
Software Engineer
Software Engineers are responsible for analyzing user needs, designing, implementing, and testing software applications. This course can be useful as it provides a foundation in Apache Spark, a popular big data processing framework. Apache Spark is increasingly used by organizations to handle large volumes of data, and proficiency in it can make you a more competitive candidate for Software Engineer positions.
IT Manager
IT Managers are responsible for the planning, implementation, and management of an organization's IT systems. This course can be useful as it provides hands-on experience with Apache Spark, a popular big data processing framework. Apache Spark is increasingly used by IT Managers to manage data-intensive systems, and proficiency in it can make you a more competitive candidate for IT Manager positions.

Reading list

We've selected seven books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Handling Fast Data with Apache Spark SQL and Streaming.
Comprehensive guide to Apache Spark. It covers all aspects of Spark, from its core APIs to its latest features.
Fantastic introduction to Apache Spark, and it covers the basics of Spark SQL and streaming. It is also a great resource for learning about the Spark ecosystem, and it includes chapters on Spark MLlib, GraphX, and Structured Streaming.
Great resource for learning how to use Spark for big data analytics. It covers topics such as data ingestion, data processing, and data visualization.
Great resource for learning how to use Spark for data science. It covers topics such as data preparation, feature engineering, and model training.
Great resource for learning how to use Spark 3.0 with Java. It covers topics such as dataframes, datasets, SQL queries, and streaming.
Great resource for learning how to optimize and tune Spark applications. It covers topics such as data partitioning, caching, and garbage collection.
Great resource for beginners who are new to Apache Spark. It covers the basics of Spark SQL, streaming, and MLlib.

Similar courses

Here are nine courses similar to Handling Fast Data with Apache Spark SQL and Streaming.
Apache Spark Fundamentals
Apache Spark 3 Fundamentals
Big Data, Hadoop, and Spark Basics
Structured Streaming in Apache Spark 2
Applying the Lambda Architecture with Spark, Kafka, and...
Predictive Analytics Using Apache Spark MLlib on...
Data Engineering Essentials using SQL, Python, and PySpark
Conceptualizing the Processing Model for Apache Spark...
Processing Streaming Data Using Apache Spark Structured...

© 2016 - 2024 OpenCourser