As part of this course, you will be learning to build streaming pipelines by integrating Kafka and Spark Structured Streaming. Let us go through the details about what is covered in the course.
As part of this course, you will be learning to build streaming pipelines by integrating Kafka and Spark Structured Streaming. Let us go through the details about what is covered in the course.
First of all, we need to have the proper environment to build streaming pipelines using Kafka and Spark Structured Streaming on top of Hadoop or any other distributed file system. As part of the course, you will start with setting up a self-support lab with all the key components such as Hadoop, Hive, Spark, and Kafka on a single node Linux-based system.
Once the environment is set up you will go through the details related to getting started with Kafka. As part of that process, you will create a Kafka topic, produce messages into the topic as well as consume messages from the topic.
You will also learn how to use Kafka Connect to ingest data from web server logs into Kafka topic as well as ingest data from Kafka topic into HDFS as a sink.
Once you understand Kafka from the perspective of Data Ingestion, you will get an overview of some of the key concepts of related Spark Structured Streaming.
After learning Kafka and Spark Structured streaming separately, you will build a streaming pipeline to consume data from Kafka topic using Spark Structured Streaming, then process and write to different targets.
You will also learn how to take care of incremental data processing using Spark Structured Streaming.
Course Outline
Here is a brief outline of the course. You can choose either Cloud9 or GCP to provision a server to set up the environment.
Setting up Environment using AWS Cloud9 or GCP
Setup Single Node Hadoop Cluster
Setup Hive and Spark on top of Single Node Hadoop Cluster
Setup Single Node Kafka Cluster on top of Single Node Hadoop Cluster
Getting Started with Kafka
Data Ingestion using Kafka Connect - Web server log files as a source to Kafka Topic
Data Ingestion using Kafka Connect - Kafka Topic to HDFS a sink
Overview of Spark Structured Streaming
Kafka and Spark Structured Streaming Integration
Incremental Loads using Spark Structured Streaming
Udemy based support
In case you run into technical challenges while taking the course, feel free to raise your concerns using Udemy Messenger. We will make sure that issue is resolved in 48 hours.
OpenCourser helps millions of learners each year. People visit us to learn workspace skills, ace their exams, and nurture their curiosity.
Our extensive catalog contains over 50,000 courses and twice as many books. Browse by search, by topic, or even by career interests. We'll match you to the right resources quickly.
Find this site helpful? Tell a friend about us.
We're supported by our community of learners. When you purchase or subscribe to courses and programs or purchase books, we may earn a commission from our partners.
Your purchases help us maintain our catalog and keep our servers humming without ads.
Thank you for supporting OpenCourser.