Data Engineering using Kafka and Spark Structured Streaming

Durga Viswanatha Raju Gadiraju, Naga Bhuwaneshwar, and Kavitha Penmetsa

As part of this course, you will learn to build streaming pipelines by integrating Kafka and Spark Structured Streaming. Let us go through the details of what is covered in the course.

  • First of all, you need a proper environment to build streaming pipelines using Kafka and Spark Structured Streaming on top of Hadoop or another distributed file system. As part of the course, you will start by setting up a self-support lab with all the key components, such as Hadoop, Hive, Spark, and Kafka, on a single-node Linux-based system.

  • Once the environment is set up, you will go through the details of getting started with Kafka. As part of that process, you will create a Kafka topic, produce messages into the topic, and consume messages from it (see the producer and consumer sketch after this list).

  • You will also learn how to use Kafka Connect to ingest data from web server logs into a Kafka topic, as well as from a Kafka topic into HDFS as a sink (a connector sketch appears after the course outline below).

  • Once you understand Kafka from the perspective of data ingestion, you will get an overview of some of the key concepts of Spark Structured Streaming.

  • After learning Kafka and Spark Structured Streaming separately, you will build a streaming pipeline that consumes data from a Kafka topic using Spark Structured Streaming, then processes it and writes it to different targets.

  • You will also learn how to take care of incremental data processing using Spark Structured Streaming.
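
To make the Kafka portion concrete, here is a minimal sketch of producing and consuming messages from Python. It assumes a single-node broker listening on localhost:9092 and uses the third-party kafka-python package; the topic name retail_logs is a hypothetical placeholder, and the course itself drives these steps through the Kafka CLI tools.

```python
# Minimal produce/consume sketch using kafka-python (assumed installed via
# `pip install kafka-python`). Broker address and topic name are placeholders.
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("retail_logs", value=b"sample web server log line")
producer.flush()  # ensure the message actually reaches the broker

consumer = KafkaConsumer(
    "retail_logs",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",  # start from the beginning of the topic
    consumer_timeout_ms=5000,      # stop iterating once no new messages arrive
)
for message in consumer:
    print(message.value.decode("utf-8"))
```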

Course Outline

Here is a brief outline of the course. You can choose either Cloud9 or GCP to provision a server to set up the environment.

  • Setting up Environment using AWS Cloud9 or GCP

  • Setup Single Node Hadoop Cluster

  • Setup Hive and Spark on top of Single Node Hadoop Cluster

  • Setup Single Node Kafka Cluster on top of Single Node Hadoop Cluster

  • Getting Started with Kafka

  • Data Ingestion using Kafka Connect - Web server log files as a source to Kafka Topic

  • Data Ingestion using Kafka Connect - Kafka Topic to HDFS as a sink

  • Overview of Spark Structured Streaming

  • Kafka and Spark Structured Streaming Integration

  • Incremental Loads using Spark Structured Streaming
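
Two of the outline items above revolve around Kafka Connect. As a rough illustration of how a source connector is wired up, the sketch below registers a file source connector through the Kafka Connect REST API (assumed to be running on its default port, 8083). The connector class shown ships with Kafka; the log file path and topic name are hypothetical placeholders, and the course may instead use standalone properties files or a different connector.

```python
# Hypothetical sketch: register a FileStreamSource connector via the
# Kafka Connect REST API. File path and topic name are placeholders.
import requests

connector = {
    "name": "web-server-logs-source",
    "config": {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
        "tasks.max": "1",
        "file": "/opt/gen_logs/logs/access.log",  # placeholder log file path
        "topic": "retail_logs",                   # placeholder Kafka topic
    },
}

response = requests.post("http://localhost:8083/connectors", json=connector)
print(response.status_code, response.json())
```

A sink connector, such as the HDFS 3 sink covered later in the course, is registered the same way, just with a sink connector class and its own connector-specific properties.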

Udemy-based support

In case you run into technical challenges while taking the course, feel free to raise your concerns using Udemy Messenger. We will make sure that the issue is resolved within 48 hours.

What's inside

Learning objectives

  • Setting up a self-support lab with Hadoop (HDFS and YARN), Hive, Spark, and Kafka
  • Overview of Kafka to build streaming pipelines
  • Data ingestion to Kafka topics using Kafka Connect with a file source
  • Data ingestion to HDFS using Kafka Connect with the HDFS 3 Sink Connector plugin
  • Overview of Spark Structured Streaming to process data as part of streaming pipelines
  • Incremental data processing using Spark Structured Streaming with a file source and file target
  • Integration of Kafka and Spark Structured Streaming - reading data from Kafka topics (sketched below)
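
The last objective, reading data from Kafka topics with Spark Structured Streaming, can be pictured as follows. This is a minimal sketch, assuming Spark was launched with the spark-sql-kafka connector package on the classpath and a broker on localhost:9092; the topic name is a hypothetical placeholder, and the console sink is used only for quick inspection.

```python
# Minimal sketch: read a Kafka topic with Spark Structured Streaming and
# print records to the console. Assumes the spark-sql-kafka package is on
# the classpath; broker address and topic name are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("KafkaStructuredStreamingDemo").getOrCreate()

# Records arrive with binary key/value columns plus topic/partition/offset metadata.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "retail_logs")
    .load()
)

# Cast the raw value to a string before writing it out.
lines = raw.select(col("value").cast("string").alias("log_line"))

query = (
    lines.writeStream
    .format("console")
    .outputMode("append")
    .trigger(processingTime="30 seconds")
    .start()
)
query.awaitTermination()
```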

Syllabus

Introduction
Introduction to Data Engineering using Kafka and Spark Structured Streaming
Important Note for first time Data Engineering Customers
Important Note for Data Engineering Essentials (Python and Spark) Customers
How to get 30 days of complimentary lab access?
How to access material used for this course?
Getting Started with Kafka
Overview of Kafka
Managing Topics using Kafka CLI
Produce and Consume Messages using CLI
Validate Generation of Web Server Logs
Create Web Server using nc
Produce retail logs to Kafka Topic
Consume retail logs from Kafka Topic
Clean up Kafka CLI Sessions to produce and consume messages
Define Kafka Connect to produce
Validate Kafka Connect to produce
Data Ingestion using Kafka Connect
Overview of Kafka Connect
Define Kafka Connect to Produce Messages
Validate Kafka Connect to produce messages
Cleanup Kafka Connect to produce messages
Write Data to HDFS using Kafka Connect
Setup HDFS 3 Sink Connector Plugin
Overview of Kafka Consumer Groups
Configure HDFS 3 Sink Properties
Run and Validate HDFS 3 Sink
Cleanup Kafka Connect to consume messages
Overview of Spark Structured Streaming
Understanding Streaming Context
Validate Log Data for Streaming
Push log messages to Netcat Webserver
Overview of built-in Input Sources
Reading Web Server logs using Spark Structured Streaming
Overview of Output Modes
Using append as Output Mode
Using complete as Output Mode
Using update as Output Mode
Overview of Triggers in Spark Structured Streaming
Overview of built-in Output Sinks
Previewing the Streaming Data
Kafka and Spark Structured Streaming Integration
Create Kafka Topic
Read Data from Kafka Topic
Preview data using console
Preview data using memory
Transform Data using Spark APIs
Write Data to HDFS using Spark
Validate Data in HDFS using Spark
Write Data to HDFS using Spark using Header
Cleanup Kafka Connect and Files in HDFS
Incremental Loads using Spark Structured Streaming
Overview of Spark Structured Streaming Triggers
Steps for Incremental Data Processing
Create Working Directory in HDFS
Logic to Upload GHArchive Files
Upload GHArchive Files to HDFS
Add new GHActivity JSON Files
Read JSON Data using Spark Structured streaming
Write in Parquet File Format
Analyze GHArchive Data in Parquet files using Spark
Add New GHActivity JSON files
Load Data Incrementally to Target Table
Validate Incremental Load
Using maxFilesPerTrigger and latestFirst
Incremental Load using Archival Process
Setup development environment to learn SQL using Postgres, Python as well as Spark
Getting Started with Cloud9
Creating Cloud9 Environment
Warming up with Cloud9 IDE
Overview of EC2 related to Cloud9
Opening ports for Cloud9 Instance
Associating Elastic IPs to Cloud9 Instance
Increase EBS Volume Size of Cloud9 Instance
Setup Jupyter Lab on Cloud9
[Commands] Setup Jupyter Lab on Cloud9
Setting up Environment - Overview of GCP and Provision Ubuntu VM
Signing up for GCP
Overview of GCP Web Console
Overview of GCP Pricing
Provision Ubuntu VM from GCP
Setup Docker
Validating Python
Setup Jupyter Lab
Setup Jupyter Lab locally on Mac
Setup Single Node Hadoop Cluster
Introduction to Single Node Hadoop Cluster
Material related to setting up the environment
Setup Prerequisites
Setup Password less login
Download and Install Hadoop
Configure Hadoop HDFS
Start and Validate HDFS
Configure Hadoop YARN
Start and Validate YARN
Managing Single Node Hadoop
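
The "Incremental Loads using Spark Structured Streaming" lectures above hinge on the file source options maxFilesPerTrigger and latestFirst. The sketch below shows the general shape of such a job; the HDFS paths are hypothetical placeholders and the schema is inferred from a one-off batch read for brevity, so treat it as an illustration rather than the course's actual notebook code.

```python
# Rough sketch of incremental JSON-to-Parquet loading with Spark Structured
# Streaming. Paths are placeholders; a hand-written schema would be more robust
# than inferring one from a sample batch read.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("IncrementalLoadsDemo").getOrCreate()

landing = "/user/hadoop/ghactivity/landing"        # placeholder landing directory
target = "/user/hadoop/ghactivity/target"          # placeholder Parquet target
checkpoint = "/user/hadoop/ghactivity/checkpoint"  # placeholder checkpoint directory

schema = spark.read.json(landing).schema  # the file source requires an explicit schema

events = (
    spark.readStream
    .schema(schema)
    .option("maxFilesPerTrigger", 2)   # process at most two new files per micro-batch
    .option("latestFirst", "false")    # pick up the oldest unprocessed files first
    .json(landing)
)

query = (
    events.writeStream
    .format("parquet")
    .option("path", target)
    .option("checkpointLocation", checkpoint)  # tracks which files were already processed
    .outputMode("append")
    .trigger(processingTime="60 seconds")
    .start()
)
query.awaitTermination()
```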

Good to know

Know what's good, what to watch for, and possible dealbreakers:

  • Provides comprehensive knowledge and hands-on experience for building streaming pipelines with Kafka and Spark Structured Streaming
  • Covers real-world use cases, such as data ingestion using Kafka Connect and incremental data processing
  • Instructors have industry experience and are recognized for their expertise in Kafka and Spark
  • Requires setting up a lab with Hadoop, Hive, Spark, and Kafka, which may involve additional effort

Activities

Coming soon: We're preparing activities for Data Engineering using Kafka and Spark Structured Streaming. These are activities you can do before, during, or after a course.

Career center

Learners who complete Data Engineering using Kafka and Spark Structured Streaming will develop knowledge and skills that may be useful to these careers:
Data Engineer
As a Data Engineer, your primary focus is to design, build, and maintain complex data pipelines. Data Engineers are in high demand across a variety of industries due to the increasing amount of data that businesses collect today. Their responsibilities primarily involve building, testing, and deploying data management systems and data pipelines to move and transform data across a variety of data sources and targets. This course may be useful for someone who wishes to become or advance their career as a data engineer, as it will provide a foundation in using Kafka, Spark Structured Streaming, and Hadoop to perform data ingestion, processing, and analysis. These skills are in high demand in the field of data engineering.
Data Analyst
Data Analysts play a vital role in understanding and communicating data insights. Their day-to-day work typically includes working with large datasets, conducting statistical analysis, interpreting results, and communicating these insights to stakeholders in order to inform decision-making. This course may be useful for data analysts who wish to expand their skillset and transition into a role as a data engineer, which combines data analysis with data engineering. This course will help data analysts gain valuable experience in using Kafka, Spark Structured Streaming, and Hadoop to ingest and process data. Data analysts who are interested in working with big data may find this course particularly useful.
Software Engineer
Software Engineers apply engineering principles to the design, development, deployment, and maintenance of software systems. They work on a variety of projects throughout their career, and some focus on specific domains such as data engineering, computer science, or web development. This course may be useful for Software Engineers who are interested in learning more about data engineering. Kafka, Spark Structured Streaming, and Hadoop are popular tools in the field of data engineering, and gaining proficiency in these tools will make a Software Engineer more competitive in the job market.
Data Scientist
Data Scientists use scientific methods, processes, algorithms, and systems to extract knowledge and insights from data. They may work on a variety of projects throughout their career, and some focus on specific domains such as NLP, computer vision, or speech recognition. This course may be useful for Data Scientists who want to improve their skills in data engineering. The skills learned in this course will help Data Scientists build data pipelines and perform data processing tasks more efficiently.
Business Intelligence Analyst
Business Intelligence Analysts focus on using data to make better business decisions. They collect, analyze, interpret, and present data in order to help businesses understand their performance and make better decisions. This course may be useful for Business Intelligence Analysts who are interested in learning more about data engineering. Kafka, Spark Structured Streaming, and Hadoop are popular tools in the field of data engineering, and gaining proficiency in these tools will make a Business Intelligence Analyst more competitive in the job market.
Database Administrator
Database Administrators are responsible for the installation, configuration, maintenance, and performance of database systems. Their duties include designing, implementing, and managing databases, as well as ensuring that data is secure and accessible. This course may be helpful for someone who wishes to become or advance their career as a database administrator, as it will provide a foundation in using Kafka, Spark Structured Streaming, and Hadoop to work with data at scale. These skills are in high demand in the field of database administration.
DevOps Engineer
DevOps Engineers focus on bridging the gap between development and operations teams. They work on a variety of projects throughout their career, and some focus on specific domains such as data engineering, software development, or cloud computing. This course may be useful for DevOps Engineers who are interested in learning more about data engineering. Kafka, Spark Structured Streaming, and Hadoop are popular tools in the field of data engineering, and gaining proficiency in these tools will make a DevOps Engineer more competitive in the job market.
Cloud Engineer
Cloud Engineers focus on designing, building, and maintaining cloud-based systems. They work on a variety of projects throughout their career, and some focus on specific domains such as data engineering, cloud computing, or networking. This course may be useful for Cloud Engineers who are interested in learning more about data engineering. Kafka, Spark Structured Streaming, and Hadoop are popular tools in the field of data engineering, and gaining proficiency in these tools will make a Cloud Engineer more competitive in the job market.
Data Governance Analyst
Data Governance Analysts focus on developing and implementing data governance policies and procedures. Their responsibilities include ensuring that data is used in a consistent and ethical manner, and that data is protected from unauthorized access. This course may be useful for Data Governance Analysts who are interested in learning more about data engineering. Kafka, Spark Structured Streaming, and Hadoop are popular tools in the field of data engineering, and gaining proficiency in these tools will make a Data Governance Analyst more competitive in the job market.
Machine Learning Engineer
Machine Learning Engineers focus on designing, building, and maintaining machine learning models. They work on a variety of projects throughout their career, and some focus on specific domains such as data engineering, machine learning, or artificial intelligence. This course may be useful for Machine Learning Engineers who are interested in learning more about data engineering. Kafka, Spark Structured Streaming, and Hadoop are popular tools in the field of data engineering, and gaining proficiency in these tools will make a Machine Learning Engineer more competitive in the job market.
Data Architect
Data Architects focus on designing and managing data systems. Their responsibilities include designing data models, developing data management strategies, and ensuring that data is accessible and secure. This course may be useful for Data Architects who are interested in learning more about data engineering. Kafka, Spark Structured Streaming, and Hadoop are popular tools in the field of data engineering, and gaining proficiency in these tools will make a Data Architect more competitive in the job market.
Information Security Analyst
Information Security Analysts focus on protecting data and information systems from unauthorized access, use, disclosure, disruption, modification, or destruction. Their responsibilities include developing and implementing security policies and procedures, and monitoring for and responding to security breaches. This course may be useful for Information Security Analysts who are interested in learning more about data engineering. Kafka, Spark Structured Streaming, and Hadoop are popular tools in the field of data engineering, and gaining proficiency in these tools will make an Information Security Analyst more competitive in the job market.
Product Manager
Product Managers focus on defining the vision, roadmap, and features of a product. Their responsibilities include working with engineers, designers, and other stakeholders to bring a product to market. This course may be useful for Product Managers who are interested in learning more about data engineering. Kafka, Spark Structured Streaming, and Hadoop are popular tools in the field of data engineering, and gaining proficiency in these tools will make a Product Manager more competitive in the job market.
Technical Writer
Technical Writers focus on creating user manuals, technical documentation, and other written materials. Their responsibilities include gathering and organizing technical information, and writing clear and concise documentation. This course may be useful for Technical Writers who are interested in learning more about data engineering. Kafka, Spark Structured Streaming, and Hadoop are popular tools in the field of data engineering, and gaining proficiency in these tools will make a Technical Writer more competitive in the job market.
Quality Assurance Analyst
Quality Assurance Analysts focus on testing and validating software applications to ensure that they meet quality standards. Their responsibilities include developing and executing test plans, and reporting on the results of testing. This course may be useful for Quality Assurance Analysts who are interested in learning more about data engineering. Kafka, Spark Structured Streaming, and Hadoop are popular tools in the field of data engineering, and gaining proficiency in these tools will make a Quality Assurance Analyst more competitive in the job market.

Reading list

We've selected seven books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Data Engineering using Kafka and Spark Structured Streaming.
Essential reading for anyone who wants to learn how Kafka works and how to use it to build and operate real-world streaming applications. It covers everything from the basics of Kafka's architecture and APIs to advanced topics such as performance tuning and security.
Comprehensive guide to using Spark, a popular engine for large-scale data processing. It covers everything from the basics of Spark to advanced topics such as machine learning and graph processing.
Comprehensive guide to using Spark, a popular engine for large-scale data processing. It covers everything from the basics of Spark to advanced topics such as machine learning and graph processing.
Practical guide to building and operating streaming data pipelines using Apache Spark. It covers everything from the basics of Spark to advanced topics such as performance tuning and security.
Provides a comprehensive overview of big data analytics, including its concepts, tools, and techniques. It is a good choice for anyone who wants to learn more about big data analytics and how to use it for data-driven decision-making.
Provides a practical introduction to machine learning with Apache Spark, including its algorithms, techniques, and use cases. It is a good choice for anyone who wants to learn how to use Spark for machine learning.
Provides a comprehensive overview of Apache Hadoop, including its architecture, components, and use cases. It is a good choice for anyone who wants to learn more about Hadoop and how to use it for data storage and processing.

Similar courses

Here are nine courses similar to Data Engineering using Kafka and Spark Structured Streaming.
  • Processing Streaming Data Using Apache Spark Structured...
  • Streaming API Development and Documentation
  • Structured Streaming in Apache Spark 2
  • Windowing and Join Operations on Streaming Data with...
  • Big Data, Hadoop, and Spark Basics
  • Cloud Computing Applications, Part 2: Big Data and...
  • Apache Spark for Data Engineering and Machine Learning
  • Conceptualizing the Processing Model for Apache Spark...
  • Machine Learning with Apache Spark