Exploring the Apache Beam SDK for Modeling Streaming Data for Processing from Pluralsight

Apache Beam is an open-source unified model for processing batch and streaming data in parallel. Originally built to support Google’s Cloud Dataflow backend, Beam now lets you execute pipelines on any supported distributed processing backend.

Apache Beam SDKs can represent and process both finite and infinite datasets using the same programming model. All data processing tasks are defined using a Beam pipeline and are represented as directed acyclic graphs. These pipelines can then be executed on multiple execution backends such as Google Cloud Dataflow, Apache Flink, and Apache Spark.

In this course, Exploring the Apache Beam SDK for Modeling Streaming Data for Processing, you will explore Beam APIs for defining pipelines, executing transforms, and performing windowing and join operations.

First, you will understand and work with the basic components of a Beam pipeline, PCollections, and PTransforms. You will work with PCollections holding different kinds of elements and see how you can specify the schema for PCollection elements. You will then configure these pipelines using custom options and execute them on backends such as Apache Flink and Apache Spark.

Next, you will explore the different kinds of core transforms that you can apply to streaming data for processing. These include ParDo with DoFns, GroupByKey, CoGroupByKey for join operations, and the Flatten and Partition transforms.
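To make the grouping and reshaping transforms named above more concrete, here is a minimal plain-Python sketch of what they do to in-memory lists. It is not Beam code and does not use the Beam API: the function names (`group_by_key`, `co_group_by_key`, `flatten`, `partition`) and the sample data are hypothetical, chosen only to mirror the transform names. Real Beam transforms run in parallel on PCollections and make no ordering guarantees within a group.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Collect all values that share a key, in the spirit of GroupByKey."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def co_group_by_key(**tagged):
    """Group several keyed collections by key, in the spirit of CoGroupByKey.

    Each key maps to one list of values per tagged input, so a key that is
    missing from an input simply gets an empty list for that tag.
    """
    keys = {key for pairs in tagged.values() for key, _ in pairs}
    result = {key: {tag: [] for tag in tagged} for key in keys}
    for tag, pairs in tagged.items():
        for key, value in pairs:
            result[key][tag].append(value)
    return result


def flatten(*collections):
    """Merge several collections of the same element type into one."""
    return [element for coll in collections for element in coll]


def partition(elements, num_partitions, fn):
    """Split one collection into a fixed number of collections.

    fn(element, num_partitions) returns the index of the target partition.
    """
    parts = [[] for _ in range(num_partitions)]
    for element in elements:
        parts[fn(element, num_partitions)].append(element)
    return parts


# Hypothetical sample data for illustration only.
emails = [("amy", "amy@example.com"), ("bob", "bob@example.com")]
phones = [("amy", "555-0100"), ("cat", "555-0199")]
joined = co_group_by_key(emails=emails, phones=phones)
```

The grouped-by-tag result is what makes joins possible: an inner join keeps only keys whose lists are all non-empty (here, `amy`), while an outer join also keeps keys such as `bob` and `cat` with a missing side.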

You will then see how you can perform windowing operations on input streams and apply fixed windows, sliding windows, session windows, and global windows to your streaming data. You will use the join extension library to perform inner and outer joins on datasets.
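The window types listed above differ mainly in how an element's timestamp is mapped to one or more time intervals. The following plain-Python sketch illustrates that mapping under simplifying assumptions: timestamps and durations are plain integers (seconds), windows are half-open `(start, end)` tuples, and the function names are hypothetical rather than part of the Beam API. A global window needs no function, since every element falls into the same single window.

```python
def fixed_window(ts, size):
    """Return the single fixed window of length `size` that contains ts."""
    start = ts - ts % size
    return (start, start + size)


def sliding_windows(ts, size, period):
    """Return every window of length `size`, starting at a multiple of
    `period`, that contains ts. Windows overlap when period < size."""
    start = ts - ts % period  # latest window start that is <= ts
    windows = []
    while start > ts - size:  # window [start, start + size) still covers ts
        windows.append((start, start + size))
        start -= period
    return sorted(windows)


def session_windows(timestamps, gap):
    """Merge timestamps into sessions.

    Each event opens a window [ts, ts + gap); overlapping windows are merged,
    so a session ends once no event arrives within `gap` of the previous one.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts < sessions[-1][1]:
            # Event falls inside the current session: extend its end.
            sessions[-1] = (sessions[-1][0], ts + gap)
        else:
            sessions.append((ts, ts + gap))
    return sessions
```

For example, with 60-second windows an event at second 125 lands in the fixed window `(120, 180)`, but with a 30-second sliding period it belongs to both `(90, 150)` and `(120, 180)`.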

Finally, you will configure the metrics you want tracked during pipeline execution, including counter, distribution, and gauge metrics, and then round off the course by executing SQL queries on input data.
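The three metric kinds mentioned above track different things: a counter accumulates a running total, a distribution summarizes the values it has seen, and a gauge keeps only the latest value. The sketch below models that behavior in plain Python as an illustration. These classes are hypothetical stand-ins, not Beam's metrics classes, and the sample word list is made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Counter:
    """Running total that can only be incremented by a given amount."""
    value: int = 0

    def inc(self, n: int = 1) -> None:
        self.value += n


@dataclass
class Distribution:
    """Summary statistics (count, sum, min, max) over reported values."""
    count: int = 0
    total: int = 0
    minimum: Optional[int] = None
    maximum: Optional[int] = None

    def update(self, value: int) -> None:
        self.count += 1
        self.total += value
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)


@dataclass
class Gauge:
    """Holds only the most recently reported value."""
    value: Optional[int] = None

    def set(self, value: int) -> None:
        self.value = value


# Hypothetical usage: track metrics while processing a few words.
words_seen = Counter()
word_lengths = Distribution()
last_length = Gauge()

for word in ["beam", "pipeline", "sql"]:
    words_seen.inc()
    word_lengths.update(len(word))
    last_length.set(len(word))
```

In a real pipeline the runner aggregates such metrics across parallel workers, which this single-process sketch does not attempt to show.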

When you are finished with this course, you will have the skills and knowledge to perform a wide range of data processing tasks using core Beam transforms, and you will be able to track metrics and run SQL queries on input streams.

What's inside

Syllabus

Course Overview

Understanding Pipelines, PCollections, and PTransforms

Executing Pipelines to Process Streaming Data

Applying Transformations to Streaming Data

Working with Windowing and Join Operations

Perform SQL Queries on Streaming Data

Good to know

Know what's good, what to watch for, and possible dealbreakers

Teaches Apache Beam SDKs, which are in demand with employers

Taught by Janani Ravi, a well-known expert in Apache Beam

Develops basic to advanced skills in Apache Beam SDKs

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Exploring the Apache Beam SDK for Modeling Streaming Data for Processing with these activities:

Review Streaming Data Concepts

Steps 1 and 2 will help you prepare for new concepts introduced in this course.

Revisit the basics of data streams, stream processing, and event-driven architectures.
Review finite and infinite datasets and how they relate to batch and streaming processing.

Career center

Learners who complete Exploring the Apache Beam SDK for Modeling Streaming Data for Processing will develop knowledge and skills that may be useful to these careers:

Data Engineer

The Apache Beam SDK is an open-source, unified model for processing both batch and streaming data in a parallel manner. Data Engineers use the Beam SDK to define pipelines, execute transforms, and perform windowing and join operations on streaming data. This course covers the basics of the Beam SDK, including how to create and execute pipelines, apply transformations to streaming data, and perform windowing and join operations. By taking this course, Data Engineers can gain the skills and knowledge needed to use the Apache Beam SDK to process streaming data in a variety of applications.

See salaries and explore the career path for Data Engineer

Software Engineer

Software Engineers use the Apache Beam SDK to develop data processing pipelines for a variety of applications. This course covers the basics of the Beam SDK, including how to create and execute pipelines, apply transformations to streaming data, and perform windowing and join operations. By taking this course, Software Engineers can gain the skills and knowledge needed to use the Apache Beam SDK to build data processing pipelines that can handle streaming data.

See salaries and explore the career path for Software Engineer

Data Scientist

Data Scientists use the Apache Beam SDK to develop data processing pipelines for a variety of applications. This course covers the basics of the Beam SDK, including how to create and execute pipelines, apply transformations to streaming data, and perform windowing and join operations. By taking this course, Data Scientists can gain the skills and knowledge needed to use the Apache Beam SDK to build data processing pipelines that can handle streaming data.

See salaries and explore the career path for Data Scientist

Data Analyst

Data Analysts use the Apache Beam SDK to develop data processing pipelines for a variety of applications. This course covers the basics of the Beam SDK, including how to create and execute pipelines, apply transformations to streaming data, and perform windowing and join operations. By taking this course, Data Analysts can gain the skills and knowledge needed to use the Apache Beam SDK to build data processing pipelines that can handle streaming data.

See salaries and explore the career path for Data Analyst

Big Data Engineer

Big Data Engineers use the Apache Beam SDK to develop data processing pipelines for a variety of applications. This course covers the basics of the Beam SDK, including how to create and execute pipelines, apply transformations to streaming data, and perform windowing and join operations. By taking this course, Big Data Engineers can gain the skills and knowledge needed to use the Apache Beam SDK to build data processing pipelines that can handle streaming data.

See salaries and explore the career path for Big Data Engineer

Cloud Architect

Cloud Architects use the Apache Beam SDK to develop data processing pipelines for a variety of applications. This course covers the basics of the Beam SDK, including how to create and execute pipelines, apply transformations to streaming data, and perform windowing and join operations. By taking this course, Cloud Architects can gain the skills and knowledge needed to use the Apache Beam SDK to build data processing pipelines that can handle streaming data.

See salaries and explore the career path for Cloud Architect

Machine Learning Engineer

Machine Learning Engineers use the Apache Beam SDK to develop data processing pipelines for a variety of applications. This course covers the basics of the Beam SDK, including how to create and execute pipelines, apply transformations to streaming data, and perform windowing and join operations. By taking this course, Machine Learning Engineers can gain the skills and knowledge needed to use the Apache Beam SDK to build data processing pipelines that can handle streaming data.

See salaries and explore the career path for Machine Learning Engineer

Data Pipeline Engineer

Data Pipeline Engineers use the Apache Beam SDK to develop data processing pipelines for a variety of applications. This course covers the basics of the Beam SDK, including how to create and execute pipelines, apply transformations to streaming data, and perform windowing and join operations. By taking this course, Data Pipeline Engineers can gain the skills and knowledge needed to use the Apache Beam SDK to build data processing pipelines that can handle streaming data.

See salaries and explore the career path for Data Pipeline Engineer

Database Administrator

Database Administrators use the Apache Beam SDK to develop data processing pipelines for a variety of applications. This course covers the basics of the Beam SDK, including how to create and execute pipelines, apply transformations to streaming data, and perform windowing and join operations. By taking this course, Database Administrators can gain the skills and knowledge needed to use the Apache Beam SDK to build data processing pipelines that can handle streaming data.

See salaries and explore the career path for Database Administrator

Business Analyst

Business Analysts use the Apache Beam SDK to develop data processing pipelines for a variety of applications. This course covers the basics of the Beam SDK, including how to create and execute pipelines, apply transformations to streaming data, and perform windowing and join operations. By taking this course, Business Analysts can gain the skills and knowledge needed to use the Apache Beam SDK to build data processing pipelines that can handle streaming data.

See salaries and explore the career path for Business Analyst

Project Manager

Project Managers may use the Apache Beam SDK to develop data processing pipelines for a variety of applications. This course covers the basics of the Beam SDK, including how to create and execute pipelines, apply transformations to streaming data, and perform windowing and join operations. By taking this course, Project Managers can gain the skills and knowledge needed to use the Apache Beam SDK to build data processing pipelines that can handle streaming data.

See salaries and explore the career path for Project Manager

Software Architect

Software Architects may use the Apache Beam SDK to develop data processing pipelines for a variety of applications. This course covers the basics of the Beam SDK, including how to create and execute pipelines, apply transformations to streaming data, and perform windowing and join operations. By taking this course, Software Architects can gain the skills and knowledge needed to use the Apache Beam SDK to build data processing pipelines that can handle streaming data.

See salaries and explore the career path for Software Architect

Operations Research Analyst

Operations Research Analysts may use the Apache Beam SDK to develop data processing pipelines for a variety of applications. This course covers the basics of the Beam SDK, including how to create and execute pipelines, apply transformations to streaming data, and perform windowing and join operations. By taking this course, Operations Research Analysts can gain the skills and knowledge needed to use the Apache Beam SDK to build data processing pipelines that can handle streaming data.

See salaries and explore the career path for Operations Research Analyst

Data Warehouse Architect

Data Warehouse Architects may use the Apache Beam SDK to develop data processing pipelines for a variety of applications. This course covers the basics of the Beam SDK, including how to create and execute pipelines, apply transformations to streaming data, and perform windowing and join operations. By taking this course, Data Warehouse Architects can gain the skills and knowledge needed to use the Apache Beam SDK to build data processing pipelines that can handle streaming data.

See salaries and explore the career path for Data Warehouse Architect

Data Governance Analyst

Data Governance Analysts may use the Apache Beam SDK to develop data processing pipelines for a variety of applications. This course covers the basics of the Beam SDK, including how to create and execute pipelines, apply transformations to streaming data, and perform windowing and join operations. By taking this course, Data Governance Analysts can gain the skills and knowledge needed to use the Apache Beam SDK to build data processing pipelines that can handle streaming data.

See salaries and explore the career path for Data Governance Analyst