We may earn an affiliate commission when you visit our partners.
Google Cloud Training

In this second installment of the Dataflow course series, we dive deeper into developing pipelines using the Beam SDK. We start with a review of Apache Beam concepts. Next, we discuss processing streaming data using windows, watermarks, and triggers. We then cover options for sources and sinks in your pipelines, schemas to express your structured data, and how to do stateful transformations using the State and Timer APIs. We move on to reviewing best practices that help maximize your pipeline performance. Toward the end of the course, we introduce SQL and DataFrames to represent your business logic in Beam, and show how to iteratively develop pipelines using Beam notebooks.

Enroll now

What's inside

Syllabus

Introduction
This module covers the course outline.
Beam Concepts Review
Review the main concepts of Apache Beam and how to apply them to write your own data processing pipelines.
Windows, Watermarks, Triggers
In this module, you will learn how to process streaming data with Dataflow. Three main concepts underpin this: how to group data into windows, how the watermark signals that a window is ready to produce results, and how triggers control when and how many times a window emits output.
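To make these concepts concrete, here is a minimal sketch in the Beam Python SDK (assuming apache-beam is installed) that groups keyed elements into fixed 60-second windows and attaches a trigger that fires early panes before the watermark closes the window:

    import apache_beam as beam
    from apache_beam.transforms import trigger, window

    with beam.Pipeline() as p:
        (p
         | beam.Create([("user1", 1), ("user1", 2), ("user2", 5)])
         # Group elements into fixed 60-second event-time windows.
         | beam.WindowInto(
             window.FixedWindows(60),
             # Fire a speculative pane every 10 elements, then a final pane
             # once the watermark passes the end of the window.
             trigger=trigger.AfterWatermark(early=trigger.AfterCount(10)),
             accumulation_mode=trigger.AccumulationMode.ACCUMULATING)
         | beam.CombinePerKey(sum)
         | beam.Map(print))

ACCUMULATING means each firing includes everything seen so far in the window; DISCARDING would emit only what arrived since the previous firing.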
Sources & Sinks
In this module, you will learn about what makes up sources and sinks in Google Cloud Dataflow. The module walks through examples of TextIO, FileIO, BigQueryIO, PubSubIO, KafkaIO, BigtableIO, AvroIO, and Splittable DoFn, and points out useful features associated with each IO.
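As one hedged sketch of the pattern these connectors share, the Python snippet below reads from a Pub/Sub topic (an unbounded source) and streams rows into BigQuery (a sink); the project, topic, and table names are placeholders, and apache-beam[gcp] is assumed to be installed:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    opts = PipelineOptions(streaming=True)
    with beam.Pipeline(options=opts) as p:
        (p
         # Unbounded source: read raw messages from a Pub/Sub topic.
         | beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
         | beam.Map(lambda msg: {"payload": msg.decode("utf-8")})
         # Sink: append rows to a BigQuery table with the given schema.
         | beam.io.WriteToBigQuery(
             "my-project:my_dataset.events",
             schema="payload:STRING",
             write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))

Most built-in connectors in the Python SDK follow this same ReadFrom.../WriteTo... naming.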
Schemas
This module will introduce schemas, which give developers a way to express structured data in their Beam pipelines.
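For instance, in the Python SDK a NamedTuple registered with a RowCoder gives a PCollection a schema, so transforms can reference fields by name. A minimal sketch (the Purchase type and its fields are illustrative):

    import typing
    import apache_beam as beam
    from apache_beam import coders

    class Purchase(typing.NamedTuple):
        user_id: str
        amount: float

    # Registering a RowCoder tells Beam to treat Purchase as a schema'd type.
    coders.registry.register_coder(Purchase, coders.RowCoder)

    with beam.Pipeline() as p:
        (p
         | beam.Create([Purchase("alice", 9.99), Purchase("bob", 3.50)])
         # Schema-aware transforms can address fields by name.
         | beam.GroupBy("user_id").aggregate_field("amount", sum, "total")
         | beam.Map(print))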
State and Timers
This module covers State and Timers, two powerful features that you can use in your DoFn to implement stateful transformations.
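As a minimal sketch of the State API (assuming a keyed input PCollection), the DoFn below keeps a running count per key and window:

    import apache_beam as beam
    from apache_beam.coders import VarIntCoder
    from apache_beam.transforms.userstate import CombiningValueStateSpec

    class CountPerKey(beam.DoFn):
        # One counter cell is maintained per key and window.
        COUNT = CombiningValueStateSpec("count", VarIntCoder(), sum)

        def process(self, element, count=beam.DoFn.StateParam(COUNT)):
            count.add(1)
            yield element[0], count.read()

    with beam.Pipeline() as p:
        (p
         | beam.Create([("a", 1), ("a", 2), ("b", 3)])
         | beam.ParDo(CountPerKey())
         | beam.Map(print))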
Best Practices
This module will discuss best practices and review common patterns that maximize performance for your Dataflow pipelines.
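One widely used pattern is the dead-letter output; a hedged sketch in Python routes records that fail to parse into a side output instead of failing the pipeline:

    import json
    import apache_beam as beam

    class ParseJson(beam.DoFn):
        def process(self, element):
            try:
                yield json.loads(element)
            except ValueError:
                # Tag unparseable records so they can be inspected later.
                yield beam.pvalue.TaggedOutput("dead_letter", element)

    with beam.Pipeline() as p:
        results = (p
                   | beam.Create(['{"ok": 1}', "not json"])
                   | beam.ParDo(ParseJson()).with_outputs(
                       "dead_letter", main="parsed"))
        results.parsed | "PrintGood" >> beam.Map(print)
        results.dead_letter | "PrintBad" >> beam.Map(
            lambda e: print("dead letter:", e))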
Dataflow SQL & DataFrames
This module introduces two new APIs to represent your business logic in Beam: SQL and DataFrames.
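As a hedged sketch of the DataFrames side (the CSV path and column names are placeholders; pandas-style operations on the deferred frame become pipeline steps), see below; the SQL counterpart is apache_beam.transforms.sql.SqlTransform:

    import apache_beam as beam
    from apache_beam.dataframe.convert import to_pcollection
    from apache_beam.dataframe.io import read_csv

    with beam.Pipeline() as p:
        # read_csv yields a deferred DataFrame; operations add pipeline steps.
        orders = p | read_csv("gs://my-bucket/orders-*.csv")
        totals = orders.groupby("item")["quantity"].sum()
        # Convert back to an ordinary PCollection for downstream transforms.
        to_pcollection(totals) | beam.Map(print)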
Beam Notebooks
This module will cover Beam notebooks, an interface for Python developers to onboard onto the Beam SDK and develop their pipelines iteratively in a Jupyter notebook environment.
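A minimal sketch of that workflow, runnable in any Jupyter environment with apache-beam[interactive] installed:

    import apache_beam as beam
    from apache_beam.runners.interactive.interactive_runner import InteractiveRunner
    import apache_beam.runners.interactive.interactive_beam as ib

    p = beam.Pipeline(InteractiveRunner())
    words = p | beam.Create(["beam", "dataflow", "beam"])
    counts = words | beam.combiners.Count.PerElement()

    # ib.show materializes and displays a PCollection interactively;
    # ib.collect returns it as a pandas DataFrame for inspection.
    ib.show(counts)
    df = ib.collect(counts)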
Summary
This module provides a recap of the course.

Good to know

Know what's good, what to watch for, and possible dealbreakers
Suitable for learners with knowledge of Apache Beam
Designed for intermediate learners who want to advance their Dataflow skills
Should be taken after the introductory Dataflow course
Could be part of a comprehensive curriculum on Apache Beam

Save this course

Save Serverless Data Processing with Dataflow: Develop Pipelines to your list so you can find it easily later:

Reviews summary

Dataflow pipeline development

Learners say Serverless Data Processing with Dataflow: Develop Pipelines is a good place to learn about Google Dataflow, Apache Beam, and data pipelines. Students particularly like the course's hands-on labs and engaging assignments. Some reviewers note that the course is not trivial, and many commend its coverage of both batch and streaming pipelines. However, some learners found the course difficult to follow at times because of the instructors' speech patterns.
Students recommend this course to others.
"Good for an advance engineers"
"Good place to learn Dataflow and Apache Beam."
"Excellent course focus on Batch and Streaming Pipelines using Google Dataflow"
Students like the course's engaging, hands-on labs.
"Liked the hands-on labs."
"I​t is a good course but it is not trivial"
"This course gives a great overview of the basic building blocks of Apache Beam as well as offers an opportunity to get your hands dirty and use these building blocks to build real data pipelines."
Students sometimes found the instructors difficult to understand.
"people from India (not native english speakers) often speak illegibly and it is difficult (sometimes impossible) to understand them."
"subtitles don't help"
Many students found the material to be challenging.
"Too hard, insufficient signposting"
"it is difficult (sometimes impossible) to understand them"
"some code examples are in java only"

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Serverless Data Processing with Dataflow: Develop Pipelines with these activities:
Follow tutorials on Apache Beam's official website
Enhance your understanding of Beam concepts by following tutorials on the official website, covering topics such as pipelines, transforms, and I/O.
  • Visit the Apache Beam website and explore the tutorials section
  • Select a tutorial relevant to your learning objectives
  • Follow the tutorial step-by-step and complete the exercises
Explore Beam examples on GitHub
Gain practical insights by exploring real-world examples of Beam pipelines on GitHub, showcasing various use cases and implementation techniques.
  • Visit the Apache Beam GitHub repository
  • Browse the examples directory and select a relevant example
  • Review the example code and understand its functionality
Explore different sources and sinks
Explore different sources and sinks to familiarize yourself with the options Beam offers for reading and writing data; a starter sketch follows the steps below.
  • Create a pipeline with a TextIO source and a FileIO sink
  • Try other sources and sinks such as BigQueryIO, PubSubIO, or KafkaIO
  • Experiment with different options and parameters for each source and sink
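A starter sketch for the first step, using local placeholder paths (note that in the Python SDK the file-based text sink is WriteToText; FileIO is the Java API):

    import apache_beam as beam

    with beam.Pipeline() as p:
        (p
         | "ReadLines" >> beam.io.ReadFromText("input/*.txt")
         | "Uppercase" >> beam.Map(str.upper)
         | "WriteLines" >> beam.io.WriteToText(
             "output/result", file_name_suffix=".txt"))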
Create and solve sample pipelines
Practice creating and solving sample pipelines to reinforce your understanding of Beam concepts like windows, watermarks, and triggers; see the sketch after these steps.
  • Create a pipeline skeleton
  • Add a source and sink
  • Configure windows, watermarks, and triggers
  • Run the pipeline and verify the results
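One way to make the results verifiable is to drive the pipeline with TestStream, which lets you advance the watermark deterministically; a hedged sketch:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.testing.test_stream import TestStream
    from apache_beam.transforms import window

    events = (TestStream()
              .advance_watermark_to(0)
              .add_elements([("user", 1), ("user", 2)])
              .advance_watermark_to(61)  # first 60-second window closes here
              .add_elements([("user", 5)])
              .advance_watermark_to_infinity())

    with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
        (p
         | events
         | beam.WindowInto(window.FixedWindows(60))
         | beam.CombinePerKey(sum)
         | beam.Map(print))  # expect ('user', 3), then ('user', 5)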
Implement stateful transformations using State and Timer APIs
Implement stateful transformations using the State and Timer APIs to enhance your pipelines with capabilities like aggregating, filtering, and joining data; a timer sketch follows the steps below.
  • Create a pipeline that uses the State API to maintain state between elements
  • Use the Timer API to schedule events and perform actions at specific times or intervals
  • Explore advanced features of the State and Timer APIs
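A hedged sketch of the timer half, using the buffer-and-flush pattern (input must be keyed string values, and the ten-second delay is arbitrary):

    import apache_beam as beam
    from apache_beam.coders import StrUtf8Coder
    from apache_beam.transforms.timeutil import TimeDomain
    from apache_beam.transforms.userstate import (
        BagStateSpec, TimerSpec, on_timer)

    class BufferThenFlush(beam.DoFn):
        BUFFER = BagStateSpec("buffer", StrUtf8Coder())
        FLUSH = TimerSpec("flush", TimeDomain.WATERMARK)

        def process(self,
                    element,  # expects (key, str) pairs
                    ts=beam.DoFn.TimestampParam,
                    buffer=beam.DoFn.StateParam(BUFFER),
                    flush=beam.DoFn.TimerParam(FLUSH)):
            buffer.add(element[1])
            flush.set(ts + 10)  # fire when the watermark passes ts + 10s

        @on_timer(FLUSH)
        def flush_buffer(self, buffer=beam.DoFn.StateParam(BUFFER)):
            # Emit everything buffered for this key, then reset the state.
            yield list(buffer.read())
            buffer.clear()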
Optimize your pipelines for performance
Test, profile, and optimize your pipelines to improve their efficiency and performance in production environments; one common tuning pattern is sketched after the steps below.
  • Identify performance bottlenecks in your pipeline
  • Implement best practices for data processing and resource management
  • Monitor and fine-tune your pipeline to ensure optimal performance
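One common tuning pattern, sketched here under the assumption that a small PCollection fans out into expensive work, is breaking fusion with Reshuffle:

    import apache_beam as beam

    with beam.Pipeline() as p:
        (p
         | beam.Create(["gs://bucket/a", "gs://bucket/b"])  # few elements
         # Without a fusion break, the expensive step is fused with Create
         # and runs with very limited parallelism; Reshuffle redistributes
         # the elements across workers first.
         | beam.Reshuffle()
         | beam.Map(lambda name: name.upper()))  # stand-in for expensive work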
Participate in Beam Katas
Challenge yourself and improve your Beam skills by participating in Beam Katas, a series of coding exercises designed to test your understanding of Beam concepts.
  • Visit the Beam Katas website and register for an account
  • Select a Kata and attempt to solve it
  • Review your solution and learn from the feedback provided

Career center

Learners who complete Serverless Data Processing with Dataflow: Develop Pipelines will develop knowledge and skills that may be useful to these careers:
Data Engineer
A Data Engineer builds and maintains data pipelines that process structured and unstructured data. As a Data Engineer, you would use a course like this one to gain a deeper understanding of Apache Beam concepts and how to apply them to write your own data processing pipelines. You would also learn about best practices for maximizing the performance of your pipelines.
Data Analyst
A Data Analyst cleans, analyzes, and interprets data to identify trends and patterns. As a Data Analyst, you might use a course like this one to gain a deeper understanding of how to process streaming data using windows, watermarks, and triggers. You would also learn about options for sources and sinks in your pipelines, schemas to express your structured data, and how to do stateful transformations using State and Timer APIs.
Data Scientist
A Data Scientist builds and applies mathematical and statistical models to data to extract insights and make predictions. As a Data Scientist, you might use a course like this one to gain a deeper understanding of how to process streaming data using windows, watermarks, and triggers. You would also learn about options for sources and sinks in your pipelines, schemas to express your structured data, and how to do stateful transformations using State and Timer APIs.
Software Engineer
A Software Engineer designs, develops, and maintains software applications. As a Software Engineer, you might use a course like this one to gain a deeper understanding of Apache Beam concepts and how to apply them to write your own data processing pipelines. You would also learn about best practices for maximizing the performance of your pipelines.
DevOps Engineer
A DevOps Engineer automates and manages the software development and deployment process. As a DevOps Engineer, you might use a course like this one to gain a deeper understanding of how to build and maintain data pipelines. You would also learn about best practices for maximizing the performance of your pipelines.
Cloud Architect
A Cloud Architect designs and manages cloud computing solutions. As a Cloud Architect, you might use a course like this one to gain a deeper understanding of how to build and maintain data pipelines in the cloud. You would also learn about best practices for maximizing the performance of your pipelines.
Data Integration Engineer
A Data Integration Engineer designs and builds data pipelines that integrate data from multiple sources. As a Data Integration Engineer, you would use a course like this one to gain a deeper understanding of Apache Beam concepts and how to apply them to write your own data processing pipelines. You would also learn about best practices for maximizing the performance of your pipelines.
Big Data Engineer
A Big Data Engineer designs and builds data pipelines that process large volumes of data. As a Big Data Engineer, you would use a course like this one to gain a deeper understanding of Apache Beam concepts and how to apply them to write your own data processing pipelines. You would also learn about best practices for maximizing the performance of your pipelines.
Machine Learning Engineer
A Machine Learning Engineer designs, develops, and deploys machine learning models. As a Machine Learning Engineer, you might use a course like this one to gain a deeper understanding of how to process streaming data using windows, watermarks, and triggers. You would also learn about options for sources and sinks in your pipelines, schemas to express your structured data, and how to do stateful transformations using State and Timer APIs.
Business Intelligence Analyst
A Business Intelligence Analyst analyzes data to identify trends and patterns that can help businesses make better decisions. As a Business Intelligence Analyst, you might use a course like this one to gain a deeper understanding of how to process streaming data using windows, watermarks, and triggers. You would also learn about options for sources and sinks in your pipelines, schemas to express your structured data, and how to do stateful transformations using State and Timer APIs.
Data Visualization Engineer
A Data Visualization Engineer designs and develops data visualizations that help people understand data. As a Data Visualization Engineer, you might use a course like this one to gain a deeper understanding of how to process streaming data using windows, watermarks, and triggers. You would also learn about options for sources and sinks in your pipelines, schemas to express your structured data, and how to do stateful transformations using State and Timer APIs.
Database Administrator
A Database Administrator manages and maintains databases. As a Database Administrator, you might use a course like this one to gain a deeper understanding of how to build and maintain data pipelines. You would also learn about best practices for maximizing the performance of your pipelines.
IT Manager
An IT Manager plans, organizes, and directs the implementation and maintenance of computer systems and networks. As an IT Manager, you might use a course like this one to gain a deeper understanding of how to build and maintain data pipelines. You would also learn about best practices for maximizing the performance of your pipelines.
Systems Administrator
A Systems Administrator manages and maintains computer systems and networks. As a Systems Administrator, you might use a course like this one to gain a deeper understanding of how to build and maintain data pipelines. You would also learn about best practices for maximizing the performance of your pipelines.
Network Administrator
A Network Administrator manages and maintains computer networks. As a Network Administrator, you might use a course like this one to gain a deeper understanding of how to build and maintain data pipelines. You would also learn about best practices for maximizing the performance of your pipelines.

Reading list

We've selected books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Serverless Data Processing with Dataflow: Develop Pipelines.
Provides a comprehensive guide to Apache Flink, a popular stream processing framework that can be used with Apache Beam.
Provides a comprehensive overview of data-intensive applications, covering topics such as data modeling, data storage, and data processing.
Provides a good introduction to Java 8 Lambdas and Streams, which you will use in Java code to create a pipeline.
While not directly related to Dataflow, it provides valuable insights into stream processing, a key aspect covered in this course.
Provides a foundational understanding of Python libraries and tools, including Apache Beam, for data science.
Although it focuses on Spark and Hadoop, this book provides insights into the broader big data processing landscape, including Apache Beam.
Provides foundational knowledge in designing and architecting data-intensive applications, complementing the course's focus on data processing techniques.


Similar courses

Here are nine courses similar to Serverless Data Processing with Dataflow: Develop Pipelines.
Serverless Data Processing with Dataflow: Develop...
Serverless Data Processing with Dataflow: Develop...
Exploring the Apache Beam SDK for Modeling Streaming Data...
Conceptualizing the Processing Model for the GCP Dataflow...
Serverless Data Processing with Dataflow: Foundations
Architecting Serverless Big Data Solutions Using Google...
Hands-On with Dataflow
Serverless Data Processing with Dataflow: Foundations
Conceptualizing the Processing Model for Azure Databricks...
Our mission

OpenCourser helps millions of learners each year. People visit us to learn workplace skills, ace their exams, and nurture their curiosity.

Our extensive catalog contains over 50,000 courses and twice as many books. Browse by search, by topic, or even by career interests. We'll match you to the right resources quickly.

Find this site helpful? Tell a friend about us.

Affiliate disclosure

We're supported by our community of learners. When you purchase or subscribe to courses and programs or purchase books, we may earn a commission from our partners.

Your purchases help us maintain our catalog and keep our servers humming without ads.

Thank you for supporting OpenCourser.

© 2016 - 2024 OpenCourser