Durga Viswanatha Raju Gadiraju, Naga Bhuwaneshwar, and Kavitha Penmetsa

As part of this course, you will be learning to build streaming pipelines by integrating Kafka and Spark Structured Streaming. Let us go through the details about what is covered in the course.

  • First of all, we need to have the proper environment to build streaming pipelines using Kafka and Spark Structured Streaming on top of Hadoop or any other distributed file system. As part of the course, you will start with setting up a self-support lab with all the key components such as Hadoop, Hive, Spark, and Kafka on a single node Linux-based system.

  • Once the environment is set up, you will get started with Kafka. As part of that process, you will create a Kafka topic, produce messages into the topic, and consume messages from the topic.

  • You will also learn how to use Kafka Connect to ingest data from web server logs into a Kafka topic, as well as to ingest data from a Kafka topic into HDFS as a sink.

  • Once you understand Kafka from the perspective of data ingestion, you will get an overview of some of the key concepts related to Spark Structured Streaming.

  • After learning Kafka and Spark Structured Streaming separately, you will build a streaming pipeline that consumes data from a Kafka topic using Spark Structured Streaming, then processes it and writes it to different targets.

  • You will also learn how to take care of incremental data processing using Spark Structured Streaming.
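The topic, produce, and consume steps mentioned above are commonly done with the command-line tools that ship with Kafka. The following is a minimal sketch, not the course's own material: it only assembles the commands so they can be inspected before running. The broker address `localhost:9092` and the topic name `retail_logs_demo` are assumptions for illustration, and older Kafka releases use `--zookeeper` or `--broker-list` instead of `--bootstrap-server` for some of these tools.

```python
import shlex

# Assumptions for illustration: a single broker on localhost:9092, a
# hypothetical topic name, and Kafka's bin directory on PATH.
BOOTSTRAP = "localhost:9092"
TOPIC = "retail_logs_demo"


def create_topic_cmd(topic, partitions=1, replication=1, bootstrap=BOOTSTRAP):
    """Command that creates a topic; replication 1 suits a single-node cluster."""
    return [
        "kafka-topics.sh", "--create",
        "--topic", topic,
        "--partitions", str(partitions),
        "--replication-factor", str(replication),
        "--bootstrap-server", bootstrap,
    ]


def producer_cmd(topic, bootstrap=BOOTSTRAP):
    """Command that reads lines from stdin and publishes them to the topic."""
    return ["kafka-console-producer.sh", "--topic", topic,
            "--bootstrap-server", bootstrap]


def consumer_cmd(topic, bootstrap=BOOTSTRAP, from_beginning=True):
    """Command that prints messages from the topic to stdout."""
    cmd = ["kafka-console-consumer.sh", "--topic", topic,
           "--bootstrap-server", bootstrap]
    if from_beginning:
        # Start at the earliest retained message instead of only new ones.
        cmd.append("--from-beginning")
    return cmd


if __name__ == "__main__":
    # Print the three commands in the order they would typically be run.
    for cmd in (create_topic_cmd(TOPIC), producer_cmd(TOPIC), consumer_cmd(TOPIC)):
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run` once the broker is up; printing first makes it easy to check the flags against the installed Kafka version.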

Course Outline

Here is a brief outline of the course. You can choose either Cloud9 or GCP to provision a server to set up the environment.

  • Setting up Environment using AWS Cloud9 or GCP

  • Setup Single Node Hadoop Cluster

  • Setup Hive and Spark on top of Single Node Hadoop Cluster

  • Setup Single Node Kafka Cluster on top of Single Node Hadoop Cluster

  • Getting Started with Kafka

  • Data Ingestion using Kafka Connect - Web server log files as a source to Kafka Topic

  • Data Ingestion using Kafka Connect - Kafka Topic to HDFS as a sink

  • Overview of Spark Structured Streaming

  • Kafka and Spark Structured Streaming Integration

  • Incremental Loads using Spark Structured Streaming
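The two Kafka Connect modules in the outline are driven by connector configuration files. As a rough sketch under stated assumptions, the snippet below renders two such configurations as `.properties` text: one for the file source connector that ships with Kafka, and one for Confluent's HDFS 3 sink connector. The connector names, file path, topic name, and HDFS URL are hypothetical placeholders, and a real deployment may need further settings listed in each connector's documentation.

```python
def to_properties(config):
    """Render a dict as Java-style .properties text, one key=value per line."""
    return "\n".join(f"{key}={value}" for key, value in config.items()) + "\n"


# File source: tails a log file and publishes each line to a Kafka topic.
# The file path and topic name are hypothetical placeholders.
file_source = {
    "name": "logs-file-source",
    "connector.class": "FileStreamSource",
    "tasks.max": 1,
    "file": "/path/to/access.log",
    "topic": "retail_logs_demo",
}

# HDFS sink: writes records consumed from the topic into HDFS.
# The HDFS URL is an assumption; use the value from your core-site.xml.
hdfs_sink = {
    "name": "logs-hdfs-sink",
    "connector.class": "io.confluent.connect.hdfs3.Hdfs3SinkConnector",
    "tasks.max": 1,
    "topics": "retail_logs_demo",
    "hdfs.url": "hdfs://localhost:9000",
    "flush.size": 1000,  # number of records written per output file
}

if __name__ == "__main__":
    print(to_properties(file_source))
    print(to_properties(hdfs_sink))
```

Note the asymmetry between the two: a source connector names a single `topic` to write to, while a sink connector takes a `topics` list to read from.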

Udemy-based support

If you run into technical challenges while taking the course, feel free to raise your concerns using Udemy Messenger. We will make sure that the issue is resolved within 48 hours.

What's inside

Learning objectives

  • Setting up a self-support lab with Hadoop (HDFS and YARN), Hive, Spark, and Kafka
  • Overview of Kafka to build streaming pipelines
  • Data ingestion into Kafka topics with Kafka Connect, using a file source
  • Data ingestion into HDFS with Kafka Connect, using the HDFS 3 connector plugin
  • Overview of Spark Structured Streaming to process data as part of streaming pipelines
  • Incremental data processing with Spark Structured Streaming, using a file source and a file target
  • Integration of Kafka and Spark Structured Streaming: reading data from Kafka topics
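To illustrate the incremental-processing objective with a file source and a file target, here is a minimal PySpark sketch. It is one common way to do this and is not confirmed to be the course's approach: the checkpoint directory records which input files were already processed, so each run picks up only new files. The function name and all directory arguments are hypothetical, and `spark` is assumed to be an existing `SparkSession`.

```python
def start_incremental_copy(spark, source_dir, target_dir, checkpoint_dir):
    """Process only files that arrived since the previous run, then stop."""
    lines = (
        spark.readStream
        .format("text")  # each input line becomes a row with a `value` column
        .load(source_dir)
    )
    return (
        lines.writeStream
        .format("parquet")
        .option("path", target_dir)
        # The checkpoint tracks which source files were already processed.
        .option("checkpointLocation", checkpoint_dir)
        # Process whatever is available now, then stop the query.
        .trigger(once=True)
        .start()
    )
```

Calling the function on a schedule, for example `start_incremental_copy(spark, "/data/in", "/data/out", "/data/chk").awaitTermination()`, gives batch-style incremental loads. Newer Spark releases also offer `trigger(availableNow=True)` for the same purpose.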

Syllabus

Introduction
Introduction to Data Engineering using Kafka and Spark Structured Streaming
Important Note for first time Data Engineering Customers

Traffic lights

Read about what's good, what should give you pause, and possible dealbreakers.
Provides comprehensive knowledge and hands-on experience for building streaming pipelines with Kafka and Spark Structured Streaming
Covers real-world use cases, such as data ingestion using Kafka Connect and incremental data processing
Instructors have industry experience and are recognized for their expertise in Kafka and Spark
Requires a setup lab with Hadoop, Hive, Spark, and Kafka, which may involve additional effort

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Data Engineering using Kafka and Spark Structured Streaming with these activities:
Organize your course materials
Helps you stay organized and focused by keeping your course materials in one place.
  • Create a folder for your course materials.
  • Download and save all of your course materials (e.g., slides, assignments, readings).
  • Organize your materials into subfolders (e.g., by week, by topic).
Read "Kafka: The Definitive Guide"
Provides a deep dive into the concepts and practices of Kafka.
  • Obtain a copy of the book.
  • Read the book.
  • Take notes and highlight important passages.
Follow a tutorial on Kafka and Spark Streaming
Provides step-by-step instructions on how to use Kafka and Spark Streaming.
  • Find a tutorial on Kafka and Spark Streaming.
  • Follow the steps in the tutorial.
  • Complete the tutorial.
Solve Kafka and Spark Streaming practice problems
Helps you solidify your understanding of Kafka and Spark Streaming concepts by solving practice problems.
  • Find practice problems on Kafka and Spark Streaming.
  • Solve the practice problems.
  • Check your solutions against the provided answer key.
Attend a workshop on Kafka and Spark Streaming
Provides an opportunity to learn from experts and network with other professionals in the field.
  • Find a workshop on Kafka and Spark Streaming.
  • Register for the workshop.
  • Attend the workshop.
Write a Spark Structured Streaming application
Reinforces the concepts of Spark Structured Streaming by having you apply them to a practical application.
  • Create a SparkSession (the entry point for Structured Streaming; StreamingContext belongs to the older DStream API).
  • Define a data source for your streaming application.
  • Transform the data using Spark operations.
  • Define a data sink for your streaming application.
  • Submit your streaming application for execution.
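The steps above can be sketched in PySpark as follows. This is a minimal illustration under stated assumptions, not the course's reference solution: the function name, broker address, and topic are hypothetical, `spark` is assumed to be an existing `SparkSession` (the Structured Streaming entry point), and the Kafka source requires the `spark-sql-kafka-0-10` connector package to be available to Spark.

```python
def start_kafka_to_console(spark, bootstrap_servers, topic):
    """Read a Kafka topic as a stream and print each message value."""
    # Source: subscribe to a Kafka topic.
    raw = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .load()
    )
    # Transform: Kafka delivers key and value as binary, so cast the value to text.
    messages = raw.selectExpr("CAST(value AS STRING) AS value")
    # Sink: print each micro-batch to the console and start the query.
    return (
        messages.writeStream
        .format("console")
        .outputMode("append")
        .start()
    )
```

In an application submitted with `spark-submit`, you would create the session with `SparkSession.builder.appName("kafka-demo").getOrCreate()`, call `start_kafka_to_console(spark, "localhost:9092", "retail_logs_demo")`, and then call `awaitTermination()` on the returned query to keep the stream running.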
Build a data pipeline using Kafka and Spark Streaming
Provides you with an opportunity to apply the skills you've learned in the course to a real-world scenario.
  • Design your data pipeline architecture.
  • Implement your data pipeline using Kafka and Spark Streaming.
  • Test and deploy your data pipeline.
  • Monitor and maintain your data pipeline.

Career center

Learners who complete Data Engineering using Kafka and Spark Structured Streaming will develop knowledge and skills that may be useful to these careers:
Data Engineer
As a Data Engineer, your primary focus is to design, build, and maintain complex data pipelines. Data Engineers are in high demand across a variety of industries due to the increasing amount of data that businesses collect today. Their responsibilities primarily involve building, testing, and deploying data management systems and data pipelines to move and transform data across a variety of data sources and targets. This course may be useful for someone who wishes to become or advance their career as a data engineer, as it will provide a foundation in using Kafka, Spark Structured Streaming, and Hadoop to perform data ingestion, processing, and analysis. These skills are in high demand in the field of data engineering.
Data Analyst
Data Analysts play a vital role in understanding and communicating data insights. Their day-to-day work typically includes working with large datasets, conducting statistical analysis, interpreting results, and communicating these insights to stakeholders in order to inform decision-making. This course may be useful for data analysts who wish to expand their skillset and transition into a role as a data engineer, which combines data analysis with data engineering. This course will help data analysts gain valuable experience in using Kafka, Spark Structured Streaming, and Hadoop to ingest and process data. Data analysts who are interested in working with big data may find this course particularly useful.
Software Engineer
Software Engineers apply engineering principles to the design, development, deployment, and maintenance of software systems. They work on a variety of projects throughout their career, and some focus on specific domains such as data engineering, computer science, or web development. This course may be useful for Software Engineers who are interested in learning more about data engineering. Kafka, Spark Structured Streaming, and Hadoop are popular tools in the field of data engineering, and gaining proficiency in these tools will make a Software Engineer more competitive in the job market.
Data Scientist
Data Scientists use scientific methods, processes, algorithms, and systems to extract knowledge and insights from data. They may work on a variety of projects throughout their career, and some focus on specific domains such as NLP, computer vision, or speech recognition. This course may be useful for Data Scientists who want to improve their skills in data engineering. The skills learned in this course will help Data Scientists build data pipelines and perform data processing tasks more efficiently.
Business Intelligence Analyst
Business Intelligence Analysts focus on using data to make better business decisions. They collect, analyze, interpret, and present data in order to help businesses understand their performance and make better decisions. This course may be useful for Business Intelligence Analysts who are interested in learning more about data engineering. Kafka, Spark Structured Streaming, and Hadoop are popular tools in the field of data engineering, and gaining proficiency in these tools will make a Business Intelligence Analyst more competitive in the job market.
Database Administrator
Database Administrators are responsible for the installation, configuration, maintenance, and performance of database systems. Their duties include designing, implementing, and managing databases, as well as ensuring that data is secure and accessible. This course may be helpful for someone who wishes to become or advance their career as a database administrator, as it will provide a foundation in using Kafka, Spark Structured Streaming, and Hadoop to work with data at scale. These skills are in high demand in the field of database administration.
Cloud Engineer
Cloud Engineers focus on designing, building, and maintaining cloud-based systems. They work on a variety of projects throughout their career, and some focus on specific domains such as data engineering, cloud computing, or networking. This course may be useful for Cloud Engineers who are interested in learning more about data engineering. Kafka, Spark Structured Streaming, and Hadoop are popular tools in the field of data engineering, and gaining proficiency in these tools will make a Cloud Engineer more competitive in the job market.
DevOps Engineer
DevOps Engineers focus on bridging the gap between development and operations teams. They work on a variety of projects throughout their career, and some focus on specific domains such as data engineering, software development, or cloud computing. This course may be useful for DevOps Engineers who are interested in learning more about data engineering. Kafka, Spark Structured Streaming, and Hadoop are popular tools in the field of data engineering, and gaining proficiency in these tools will make a DevOps Engineer more competitive in the job market.
Machine Learning Engineer
Machine Learning Engineers focus on designing, building, and maintaining machine learning models. They work on a variety of projects throughout their career, and some focus on specific domains such as data engineering, machine learning, or artificial intelligence. This course may be useful for Machine Learning Engineers who are interested in learning more about data engineering. Kafka, Spark Structured Streaming, and Hadoop are popular tools in the field of data engineering, and gaining proficiency in these tools will make a Machine Learning Engineer more competitive in the job market.
Data Architect
Data Architects focus on designing and managing data systems. Their responsibilities include designing data models, developing data management strategies, and ensuring that data is accessible and secure. This course may be useful for Data Architects who are interested in learning more about data engineering. Kafka, Spark Structured Streaming, and Hadoop are popular tools in the field of data engineering, and gaining proficiency in these tools will make a Data Architect more competitive in the job market.
Data Governance Analyst
Data Governance Analysts focus on developing and implementing data governance policies and procedures. Their responsibilities include ensuring that data is used in a consistent and ethical manner, and that data is protected from unauthorized access. This course may be useful for Data Governance Analysts who are interested in learning more about data engineering. Kafka, Spark Structured Streaming, and Hadoop are popular tools in the field of data engineering, and gaining proficiency in these tools will make a Data Governance Analyst more competitive in the job market.
Information Security Analyst
Information Security Analysts focus on protecting data and information systems from unauthorized access, use, disclosure, disruption, modification, or destruction. Their responsibilities include developing and implementing security policies and procedures, and monitoring for and responding to security breaches. This course may be useful for Information Security Analysts who are interested in learning more about data engineering. Kafka, Spark Structured Streaming, and Hadoop are popular tools in the field of data engineering, and gaining proficiency in these tools will make an Information Security Analyst more competitive in the job market.
Quality Assurance Analyst
Quality Assurance Analysts focus on testing and validating software applications to ensure that they meet quality standards. Their responsibilities include developing and executing test plans, and reporting on the results of testing. This course may be useful for Quality Assurance Analysts who are interested in learning more about data engineering. Kafka, Spark Structured Streaming, and Hadoop are popular tools in the field of data engineering, and gaining proficiency in these tools will make a Quality Assurance Analyst more competitive in the job market.
Technical Writer
Technical Writers focus on creating user manuals, technical documentation, and other written materials. Their responsibilities include gathering and organizing technical information, and writing clear and concise documentation. This course may be useful for Technical Writers who are interested in learning more about data engineering. Kafka, Spark Structured Streaming, and Hadoop are popular tools in the field of data engineering, and gaining proficiency in these tools will make a Technical Writer more competitive in the job market.
Product Manager
Product Managers focus on defining the vision, roadmap, and features of a product. Their responsibilities include working with engineers, designers, and other stakeholders to bring a product to market. This course may be useful for Product Managers who are interested in learning more about data engineering. Kafka, Spark Structured Streaming, and Hadoop are popular tools in the field of data engineering, and gaining proficiency in these tools will make a Product Manager more competitive in the job market.

Reading list

We've selected seven books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Data Engineering using Kafka and Spark Structured Streaming.
Is essential reading for anyone who wants to learn about how Kafka works, and how to use it to build and operate real-world streaming applications. It covers everything from the basics of Kafka's architecture and APIs to advanced topics such as performance tuning and security.
Comprehensive guide to using Spark, a popular library for large-scale data processing. It covers everything from the basics of Spark to advanced topics such as machine learning and graph processing.
Comprehensive guide to using Spark, a popular library for large-scale data processing. It covers everything from the basics of Spark to advanced topics such as machine learning and graph processing.
Practical guide to building and operating streaming data pipelines using Apache Spark. It covers everything from the basics of Spark to advanced topics such as performance tuning and security.
Provides a comprehensive overview of big data analytics, including its concepts, tools, and techniques. It is a good choice for anyone who wants to learn more about big data analytics and how to use it for data-driven decision-making.
Provides a practical introduction to machine learning with Apache Spark, including its algorithms, techniques, and use cases. It is a good choice for anyone who wants to learn how to use Spark for machine learning.
Provides a comprehensive overview of Apache Hadoop, including its architecture, components, and use cases. It is a good choice for anyone who wants to learn more about Hadoop and how to use it for data storage and processing.

© 2016 - 2025 OpenCourser