Save for later

Conceptualizing the Processing Model for Apache Spark Structured Streaming

Structured Streaming in Spark 2 is a unified model that treats batch as a prefix of stream. This allows Spark to perform the same operations on streaming data as on batch data, and Spark takes care of the details involved in incrementalizing the batch operation to work on streams. In this course, Conceptualizing the Processing Model for Apache Spark Structured Streaming, you will use the DataFrame API as well as Spark SQL to run queries on streaming sources and write results out to data sinks. First, you will be introduced to streaming DataFrames in Spark 2 and understand how structured streaming in Spark 2 is different from Spark Streaming available in earlier versions of Spark. You will also get a high level understanding of how Spark’s architecture works, and the role of drivers, workers, executors, and tasks. Next, you will execute queries on streaming data from a socket source as well as a file system source. You will perform basic operations on streaming data using Data frames and register your data as a temporary view to run SQL queries on input streams. You will explore the append, complete, and update modes to write data out to sinks. You will then understand how scheduling and checkpointing works in Spark and explore the differences between the micro-batch mode of execution and the new experimental continuous processing mode that Spark offers. Finally, you will discuss the Tungsten engine optimizations which make Spark 2 so much faster than Spark 1, and discuss the stages of optimization in the Catalyst optimizer which works with SQL queries. At the end of this course, you will be able to build and execute streaming queries on input data, write these out to reliable storage using different output modes, and checkpoint your streaming applications for fault tolerance and recovery.

Get Details and Enroll Now

OpenCourser is an affiliate partner of Pluralsight and may earn a commission when you buy through our links.

Get a Reminder

Send to:
Rating Not enough ratings
Length 3.0 hours
Starts On Demand (Start anytime)
Cost $35/month (Access to entire library- free trial available)
From Pluralsight
Instructor Janani Ravi
Download Videos On Windows, MacOS, iOS, and Android Pluralsight app
Language English
Subjects IT & Networking
Tags Data Professional

Get a Reminder

Send to:

Similar Courses

Careers

An overview of related careers and their average salaries in the US. Bars indicate income percentile.

Streaming Data Developer/Architect $59k

Big Data Developer (Streaming Data) $77k

Structured Finance Coordinator, Structured Finance $79k

Structured Finance Analyst 2 $81k

Streaming Support Producer $84k

Structured Credit Analyst $87k

FVP Structured Finance $105k

Structured Product Analyst $115k

Project Manager, Streaming Media $119k

Streaming Content Producer $131k

Android Streaming Engineer $135k

Senior Streaming Content Producer $174k

Write a review

Your opinion matters. Tell us what you think.

Rating Not enough ratings
Length 3.0 hours
Starts On Demand (Start anytime)
Cost $35/month (Access to entire library- free trial available)
From Pluralsight
Instructor Janani Ravi
Download Videos On Windows, MacOS, iOS, and Android Pluralsight app
Language English
Subjects IT & Networking
Tags Data Professional

Similar Courses

Sorted by relevance

Like this course?

Here's what to do next:

  • Save this course for later
  • Get more details from the course provider
  • Enroll in this course
Enroll Now