Serverless Data Processing with Dataflow: Pipelines

What's inside

Syllabus

Introduction

This module introduces the course and gives an overview of its contents.

Beam Concepts Review

Review the main concepts of Apache Beam and how to apply them to build your own data processing pipelines.

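Windows, Watermarks, and Triggers

In this module, you will learn how to process streaming data with Dataflow. To do so, you need to understand three key concepts: first, how to group data into windows; second, why watermarks matter, since they signal when a window is ready to produce results; and third, how to control when and how many times a window emits output (triggers).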

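Sources and Sinks

In this module, you will learn about the systems that act as sources and sinks in Google Cloud Dataflow. We walk through examples of Text IO, File IO, BigQuery IO, PubSub IO, Kafka IO, Bigtable IO, Avro IO, and Splittable DoFn, and also cover useful features associated with each IO.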

Schemas

This module introduces schemas, which give developers a way to represent structured data in Beam pipelines.
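In the Beam Python SDK, a schema can be declared with a `typing.NamedTuple` type, from which Beam infers field names and types. The sketch below stays within the standard library to show why that helps: code can refer to fields by name (`row.user_id`) rather than by tuple position. `Purchase` and `total_by_user` are hypothetical names, and the aggregation is plain Python, not a Beam transform.

```python
from collections import defaultdict
from typing import NamedTuple


class Purchase(NamedTuple):
    """Hypothetical record type: every field has a name and a type."""
    user_id: str
    item: str
    amount: float


def total_by_user(rows):
    """Aggregate by field name (row.user_id) instead of tuple position."""
    totals = defaultdict(float)
    for row in rows:
        totals[row.user_id] += row.amount
    return dict(totals)


rows = [
    Purchase("u1", "book", 12.5),
    Purchase("u2", "pen", 2.0),
    Purchase("u1", "mug", 7.5),
]
print(Purchase._fields)     # ('user_id', 'item', 'amount')
print(total_by_user(rows))  # {'u1': 20.0, 'u2': 2.0}
```

Because the field names travel with the type, schema-aware tooling (such as the SQL and DataFrame APIs covered later in the course) can select and group by fields without hand-written parsing code.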

State and Timers

This module covers State and Timers, two powerful features you can use in a DoFn to implement stateful transforms.
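The sketch below mimics, in plain Python, a common use of state and a timer in a stateful DoFn: it buffers elements per key (the role a bag state plays in Beam), flushes a batch when it reaches a size limit, and uses an event-time timer so that a partially filled buffer is still flushed once the watermark passes a deadline. `BufferingFn`, `batch_size`, and `timeout` are hypothetical; in Beam, state and timers are also scoped per window, which this sketch ignores.

```python
from collections import defaultdict


class BufferingFn:
    """Hypothetical plain-Python stand-in for a stateful DoFn.

    State: a per-key buffer of values.
    Timer: a per-key event-time deadline; when the watermark reaches it,
    whatever is still buffered for that key is flushed.
    """

    def __init__(self, batch_size=3, timeout=30):
        self.batch_size = batch_size
        self.timeout = timeout
        self.buffers = defaultdict(list)  # key -> buffered values (state)
        self.timers = {}                  # key -> event-time deadline (timer)

    def process(self, key, value, ts):
        """Buffer one element; return any batches flushed as a result."""
        buf = self.buffers[key]
        if not buf:
            # First element of a new batch: arm the timer for this key.
            self.timers[key] = ts + self.timeout
        buf.append(value)
        if len(buf) >= self.batch_size:
            return [self._flush(key)]
        return []

    def advance_watermark(self, watermark):
        """Fire every timer whose deadline the watermark has reached."""
        fired = [k for k, deadline in self.timers.items() if deadline <= watermark]
        return [self._flush(k) for k in sorted(fired)]

    def _flush(self, key):
        """Emit the buffered batch and clear both state and timer for key."""
        batch = (key, self.buffers.pop(key))
        self.timers.pop(key, None)
        return batch
```

Calling `process` three times for key "a" flushes ("a", [1, 2, 3]) immediately, while a lone element for key "b" stays buffered until `advance_watermark` moves past its deadline. Pairing a size limit with a timer bounds both batch size and latency, which is the usual reason to combine state with timers.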

Best Practices

This module discusses best practices and reviews common patterns that maximize the performance of Dataflow pipelines.

Dataflow SQL and DataFrames

This module introduces two new APIs for expressing business logic in Beam: SQL and DataFrames.

Beam Notebooks

This module covers Beam notebooks, an interface that helps Python developers get started with the Beam SDK and develop pipelines iteratively in a Jupyter notebook environment.

Summary

This module reviews what was covered in the course.

Good to know

Know what's good, what to watch for, and possible dealbreakers

Covers technologies highly relevant to the industry, such as Apache Beam

Taught by Google Cloud Training, who are recognized for their work in cloud computing training

Examines windowing, watermarks, and triggers, which are core concepts for streaming data processing

Develops skills in using both SQL and DataFrames to express business logic in Beam pipelines

Introduces Beam Notebooks, which simplify pipeline development for Python developers within Jupyter notebooks

Advises course participants to complete the first course in the Dataflow series before starting this one

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Serverless Data Processing with Dataflow: Pipelines - 日本語版 with these activities:

Organize and Review Course Materials

Stay organized and enhance your learning by compiling and reviewing course materials regularly

Create a system for organizing notes, assignments, and other materials
Regularly review and summarize key concepts covered in the course
Annotate materials to highlight important information and connections

Review Python for Apache Beam

Ensure a solid footing in Python programming before the course begins to make the most of the content

Read through documentation on Python basics
Complete coding exercises to practice syntax and core concepts
Review online resources for best practices in Python programming
Build a small Python project to apply your understanding

Learn about Streaming Data Processing with Apache Beam

Explore guided tutorials to familiarize yourself with the concepts of streaming data processing and Apache Beam

Follow online tutorials on streaming data processing with Apache Beam
Work through examples provided in the official documentation
Connect with online communities and forums for discussions and support

Build a Simple Data Transformation Pipeline

Apply your understanding by building a simple data transformation pipeline using Apache Beam

Design a data transformation pipeline that meets a specific need
Implement the pipeline using Apache Beam SDK
Test and validate the pipeline's functionality
Deploy the pipeline and monitor its performance

Develop a Data Processing Dashboard

Demonstrate your skills by creating a data processing dashboard backed by an Apache Beam pipeline

Gather requirements and design the dashboard
Build the data pipeline with Apache Beam and the dashboard itself with a suitable visualization tool
Integrate the dashboard with data sources and pipelines
Test and validate the dashboard's functionality
Present the dashboard to stakeholders

Career center

Learners who complete Serverless Data Processing with Dataflow: Pipelines - 日本語版 will develop knowledge and skills that may be useful to these careers:

Data Engineer

A Data Engineer designs, builds, and maintains the infrastructure and tools used to store, process, and analyze data. This course teaches you to build serverless data processing pipelines, a core part of the role, and covers best practices for optimizing pipeline performance.

Data Analyst

A Data Analyst collects, analyzes, and interprets data to identify trends and patterns. Skills from this course, such as developing data processing pipelines with the Beam SDK, can help a Data Analyst collect streaming data and express business logic in SQL.

Software Engineer

A Software Engineer designs, develops, and maintains software applications, which centers on writing code. The pipeline programming concepts taught in this course can help build a foundation for data-intensive work.

Data Scientist

A Data Scientist is responsible for developing and implementing machine learning and statistical models to solve business problems. This course covers using Dataflow SQL to express business logic, which will be helpful for you as a Data Scientist.

Business Analyst

A Business Analyst identifies and solves business problems through data analysis. In this role, you would need to understand how to develop data processing pipelines, which is covered in this course.

Machine Learning Engineer

A Machine Learning Engineer develops and deploys machine learning models to solve business problems. In this role, you would develop data processing pipelines to prepare data for machine learning models.

Cloud Architect

A Cloud Architect designs and manages cloud computing solutions. In this role, you would need to understand how to build serverless data processing pipelines, which is covered in this course.

Data Architect

A Data Architect designs and manages data architectures. This course may be helpful in understanding how to design data processing pipelines.

Database Administrator

A Database Administrator manages and maintains databases. This course may be helpful in understanding how to build data processing pipelines that interact with databases.

Systems Engineer

A Systems Engineer designs, implements, and maintains computer systems. This course may be helpful in understanding how to build data processing pipelines.

Network Engineer

A Network Engineer designs, implements, and maintains computer networks. This course may be helpful in understanding how to build data processing pipelines that use cloud networking services.

Security Engineer

A Security Engineer designs, implements, and maintains security systems. This course may be helpful in understanding how to build data processing pipelines that are secure.

Cloud Security Engineer

A Cloud Security Engineer designs, implements, and maintains security systems for cloud computing environments. This course may be helpful in understanding how to build data processing pipelines that are secure in the cloud.

DevOps Engineer

A DevOps Engineer works with developers to build and maintain software systems. This course may be helpful in understanding how to build data processing pipelines that are integrated with continuous integration and continuous delivery (CI/CD) systems.

Quality Assurance Engineer

A Quality Assurance Engineer tests and verifies the quality of software systems. This course may be helpful in understanding how to build data processing pipelines that are reliable and error-free.