Google Cloud Training

In this second course of the Dataflow series, we go in depth on developing pipelines using the Beam SDK. We start with a review of Apache Beam concepts. Next, we discuss processing streaming data using windows, watermarks, and triggers. We then cover options for pipeline sources and sinks, schemas for expressing structured data, and how to perform stateful transformations using the State and Timer APIs. We go on to review best practices for maximizing pipeline performance. In the final part of the course, we introduce SQL and DataFrames for expressing business logic in Beam, and how to develop pipelines iteratively using Beam Notebooks.

What's inside

Syllabus

Introduction
This module introduces the course and gives an overview of what it covers.
Beam Concepts Review
Reviews the main Apache Beam concepts and how to apply them to write your own data processing pipelines.
Windows, Watermarks, and Triggers
In this module, you will learn how to process streaming data with Dataflow. To do so, there are three main concepts you need to know: first, how to group data into windows; second, the importance of watermarks, which signal when a window is ready to emit results; and third, how to control when and how many times a window emits output.
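To make these three concepts concrete, here is a minimal sketch (not part of the course materials) using the Beam Python SDK: one-minute fixed windows, an early firing every 30 seconds of processing time, and a final firing when the watermark passes the end of the window. The bounded input data here is a placeholder; a real streaming job would read from an unbounded source with event-time timestamps.

```python
import apache_beam as beam
from apache_beam.transforms.window import FixedWindows
from apache_beam.transforms.trigger import (
    AccumulationMode, AfterProcessingTime, AfterWatermark)

with beam.Pipeline() as p:
    counts = (
        p
        # Placeholder bounded input; a streaming job would read from an
        # unbounded source such as Pub/Sub instead.
        | "Create" >> beam.Create([("user1", 1), ("user2", 1), ("user1", 1)])
        | "Window" >> beam.WindowInto(
            FixedWindows(60),  # 1-minute event-time windows
            # Early (speculative) results every 30s of processing time,
            # final result once the watermark passes the window end.
            trigger=AfterWatermark(early=AfterProcessingTime(30)),
            accumulation_mode=AccumulationMode.DISCARDING)
        | "CountPerKey" >> beam.combiners.Count.PerKey()
        | "Print" >> beam.Map(print))
```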
Sources and Sinks
In this module, you will learn about the systems that act as sources and sinks in Google Cloud Dataflow. We walk through examples of Text IO, File IO, BigQuery IO, PubSub IO, Kafka IO, BigTable IO, Avro IO, and Splittable DoFn, and also discuss useful features associated with each IO.
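As a rough illustration (not taken from the course), the sketch below uses one of the connectors mentioned above, Text IO, as both the source and the sink of a small word-count pipeline in the Beam Python SDK. The bucket paths are hypothetical; connectors such as PubSub IO or BigQuery IO plug into the same pipe syntax.

```python
import apache_beam as beam

with beam.Pipeline() as p:
    (p
     # Source: Text IO (the path is a placeholder).
     | "Read" >> beam.io.ReadFromText("gs://my-bucket/input.txt")
     | "Split" >> beam.FlatMap(lambda line: line.split())
     | "PairWithOne" >> beam.Map(lambda word: (word, 1))
     | "Count" >> beam.CombinePerKey(sum)
     | "Format" >> beam.MapTuple(lambda word, n: f"{word}: {n}")
     # Sink: Text IO again (the path is a placeholder).
     | "Write" >> beam.io.WriteToText("gs://my-bucket/output"))
```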
Schemas
This module introduces schemas, which give developers a way to express structured data in their Beam pipelines.
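To give a flavor of what that looks like in the Beam Python SDK (a sketch, not course material): elements built with beam.Row carry a schema, so schema-aware transforms such as GroupBy can refer to fields by name. The field names and values here are invented.

```python
import apache_beam as beam

with beam.Pipeline() as p:
    purchases = p | beam.Create([
        # beam.Row gives each element named, typed fields (a schema).
        beam.Row(user_id="alice", amount=12.5),
        beam.Row(user_id="bob", amount=3.0),
        beam.Row(user_id="alice", amount=4.5),
    ])
    # Schema-aware aggregation: group by one field, aggregate another.
    totals = purchases | beam.GroupBy("user_id").aggregate_field(
        "amount", sum, "total_amount")
    totals | beam.Map(print)
```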
State and Timers
This module covers State and Timers, two powerful features you can use in your DoFn to implement stateful transformations.
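To hint at the APIs involved (a hedged sketch, not from the course), the DoFn below keeps a per-key running count in a CombiningValueStateSpec and uses an event-time TimerSpec to emit and clear it later. The element format, the 60-second delay, and the class name are assumptions for illustration.

```python
import apache_beam as beam
from apache_beam.transforms.timeutil import TimeDomain
from apache_beam.transforms.userstate import (
    CombiningValueStateSpec, TimerSpec, on_timer)

class CountThenFlush(beam.DoFn):
    """Hypothetical stateful DoFn: count events per key, flush on a timer."""
    COUNT = CombiningValueStateSpec("count", sum)
    FLUSH = TimerSpec("flush", TimeDomain.WATERMARK)

    def process(self, element,
                ts=beam.DoFn.TimestampParam,
                count=beam.DoFn.StateParam(COUNT),
                flush=beam.DoFn.TimerParam(FLUSH)):
        _key, _value = element    # stateful DoFns require keyed input
        count.add(1)              # update per-key state
        flush.set(ts + 60)        # fire 60s after this element (assumed delay)

    @on_timer(FLUSH)
    def flush_count(self,
                    key=beam.DoFn.KeyParam,
                    count=beam.DoFn.StateParam(COUNT)):
        yield key, count.read()   # emit the accumulated count
        count.clear()             # reset state for this key
```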
Best Practices
This module discusses best practices and reviews common patterns that maximize the performance of your Dataflow pipelines.
Dataflow SQL and DataFrames
This module introduces two new APIs for expressing business logic in Beam: SQL and DataFrames.
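As a taste of both APIs (a sketch under the assumption of the Beam Python SDK; SqlTransform runs Beam SQL through a Java expansion service, so a JVM is needed to execute it), the snippet below computes the same per-user total with SQL and with the DataFrame API. The field names and data are invented.

```python
import apache_beam as beam
from apache_beam.dataframe.convert import to_dataframe, to_pcollection
from apache_beam.transforms.sql import SqlTransform

with beam.Pipeline() as p:
    rows = p | beam.Create([
        beam.Row(user_id="alice", amount=12.5),
        beam.Row(user_id="bob", amount=3.0),
        beam.Row(user_id="alice", amount=4.5),
    ])

    # SQL: the input PCollection is addressable as PCOLLECTION.
    sql_totals = rows | SqlTransform(
        "SELECT user_id, SUM(amount) AS total FROM PCOLLECTION GROUP BY user_id")

    # DataFrames: the same logic with a deferred, pandas-like API.
    df = to_dataframe(rows)
    df_totals = to_pcollection(df.groupby("user_id").amount.sum())

    sql_totals | "PrintSql" >> beam.Map(print)
    df_totals | "PrintDf" >> beam.Map(print)
```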
Beam Notebooks
This module covers Beam Notebooks, an interface for Python developers to onboard onto the Beam SDK and develop their pipelines iteratively in a Jupyter notebook environment.
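The notebooks are built on Beam's interactive runner; here is a minimal sketch of that workflow (assuming the Beam Python SDK, intended to be run in a Jupyter cell, with made-up data):

```python
import apache_beam as beam
import apache_beam.runners.interactive.interactive_beam as ib
from apache_beam.runners.interactive.interactive_runner import InteractiveRunner

# Build the pipeline against the interactive runner used by Beam Notebooks.
p = beam.Pipeline(InteractiveRunner())
words = p | beam.Create(["dataflow", "beam", "dataflow"])
counts = words | beam.combiners.Count.PerElement()

ib.show(counts)              # materialize and display the PCollection in the notebook
result = ib.collect(counts)  # or pull the results into a pandas DataFrame
```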
Summary
This module reviews the topics covered in the course.

Good to know

Know what's good, what to watch for, and possible dealbreakers
Covers technologies highly relevant to the industry, such as Apache Beam
Taught by Google Cloud Training, who are recognized for their work in cloud computing training
Examines windowing, watermarks, and triggers, which are core concepts for streaming data processing
Develops skills in using both SQL and DataFrames to express business logic in Beam pipelines
Introduces Beam Notebooks, which simplify pipeline development for Python developers within Jupyter Notebooks
Advises course participants to complete the first course in the Dataflow series before starting this one

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Serverless Data Processing with Dataflow: Pipelines - 日本語版 with these activities:
Organize and Review Course Materials
Stay organized and enhance your learning by compiling and reviewing course materials regularly
  • Create a system for organizing notes, assignments, and other materials
  • Regularly review and summarize key concepts covered in the course
  • Annotate materials to highlight important information and connections
Review Python for Apache Beam
Ensure a solid footing in Python programming before the course begins to make the most of the content
  • Read through documentation on Python basics
  • Complete coding exercises to practice syntax and core concepts
  • Review online resources for best practices in Python programming
  • Build a small Python project to apply your understanding
Learn about Streaming Data Processing with Apache Beam
Explore guided tutorials to familiarize yourself with the concepts of streaming data processing and Apache Beam
  • Follow online tutorials on streaming data processing with Apache Beam
  • Work through examples provided in the official documentation
  • Connect with online communities and forums for discussions and support
Build a Simple Data Transformation Pipeline
Apply your understanding by building a simple data transformation pipeline using Apache Beam; a minimal sketch follows the steps below
  • Design a data transformation pipeline that meets a specific need
  • Implement the pipeline using Apache Beam SDK
  • Test and validate the pipeline's functionality
  • Deploy the pipeline and monitor its performance
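One possible starting point for this activity (a sketch only, assuming the Beam Python SDK; the CSV-style records and the aggregation are placeholders, and a real pipeline would read from and write to external storage):

```python
import apache_beam as beam

with beam.Pipeline() as p:
    (p
     | "Create" >> beam.Create(["alice,12.5", "bob,3.0", "alice,4.5"])  # placeholder records
     | "Parse" >> beam.Map(lambda line: line.split(","))
     | "ToKV" >> beam.Map(lambda fields: (fields[0], float(fields[1])))
     | "SumPerUser" >> beam.CombinePerKey(sum)
     | "Print" >> beam.Map(print))
```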
Develop a Data Processing Dashboard
Demonstrate your skills by creating a data processing dashboard using Apache Beam
  • Gather requirements and design the dashboard
  • Develop the dashboard using Apache Beam and relevant tools
  • Integrate the dashboard with data sources and pipelines
  • Test and validate the dashboard's functionality
  • Present the dashboard to stakeholders

Career center

Learners who complete Serverless Data Processing with Dataflow: Pipelines - 日本語版 will develop knowledge and skills that may be useful to these careers:
Data Engineer
A Data Engineer designs, builds, and maintains the infrastructure and tools used to store, process, and analyze data. This course teaches you to build serverless data processing pipelines, which can help you make a smooth transition into this role. It also covers best practices for maximizing pipeline performance, a core part of a Data Engineer's work.
Data Analyst
A Data Analyst collects, analyzes, and interprets data to identify trends and patterns. Once you are a Data Analyst, you can use the learning from this course, such as data processing pipeline development using Beam SDK, to collect streaming data and use SQL to express your business logic.
Software Engineer
A Software Engineer designs, develops, and maintains software applications. As a Software Engineer, you would be responsible for writing code, and the pipeline development concepts you will learn from this course can help you build a solid foundation.
Data Scientist
A Data Scientist is responsible for developing and implementing machine learning and statistical models to solve business problems. This course covers using Dataflow SQL to express business logic, which will be helpful for you as a Data Scientist.
Business Analyst
A Business Analyst identifies and solves business problems through data analysis. In this role, you would need to understand how to develop data processing pipelines, which is covered in this course.
Machine Learning Engineer
A Machine Learning Engineer develops and deploys machine learning models to solve business problems. In this role, you would develop data processing pipelines to prepare data for machine learning models.
Cloud Architect
A Cloud Architect designs and manages cloud computing solutions. In this role, you would need to understand how to build serverless data processing pipelines, which is covered in this course.
Data Architect
A Data Architect designs and manages data architectures. This course may be helpful in understanding how to design data processing pipelines.
Database Administrator
A Database Administrator manages and maintains databases. This course may be helpful in understanding how to build data processing pipelines that interact with databases.
Systems Engineer
A Systems Engineer designs, implements, and maintains computer systems. This course may be helpful in understanding how to build data processing pipelines.
Network Engineer
A Network Engineer designs, implements, and maintains computer networks. This course may be helpful in understanding how to build data processing pipelines that use cloud networking services.
Security Engineer
A Security Engineer designs, implements, and maintains security systems. This course may be helpful in understanding how to build data processing pipelines that are secure.
Cloud Security Engineer
A Cloud Security Engineer designs, implements, and maintains security systems for cloud computing environments. This course may be helpful in understanding how to build data processing pipelines that are secure in the cloud.
DevOps Engineer
A DevOps Engineer works with developers to build and maintain software systems. This course may be helpful in understanding how to build data processing pipelines that are integrated with continuous integration and continuous delivery (CI/CD) systems.
Quality Assurance Engineer
A Quality Assurance Engineer tests and verifies the quality of software systems. This course may be helpful in understanding how to build data processing pipelines that are reliable and error-free.

Reading list

We've selected seven books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Serverless Data Processing with Dataflow: Pipelines - 日本語版.
  • Provides a comprehensive overview of data warehousing and how to design and build a data warehouse.
  • Provides a comprehensive overview of NoSQL databases and how to choose the right NoSQL database for your needs.
  • Provides a comprehensive overview of the challenges and patterns involved in designing data-intensive applications. It covers topics such as data modeling, data storage, data processing, and data analytics.


Similar courses

Here are nine courses similar to Serverless Data Processing with Dataflow: Pipelines - 日本語版.
ML Pipelines on Google Cloud - 日本語版
Building Resilient Streaming Analytics Systems on GCP 日本語版
Serverless Data Processing with Dataflow: Foundations -...
Building Batch Data Pipelines on GCP 日本語版
Smart Analytics, Machine Learning, and AI on GCP 日本語版
Google Meet - 日本語版
Machine Learning in the Enterprise - 日本語版
Google Cloud Big Data and Machine Learning Fundamentals...
Google Docs 日本語版