We may earn an affiliate commission when you visit our partners.

Building Batch Data Pipelines on Google Cloud

Data pipelines typically fall under one of three paradigms: Extract and Load (EL); Extract, Load, and Transform (ELT); or Extract, Transform, and Load (ETL). This course describes which paradigm to use, and when, for batch data. It also covers several Google Cloud technologies for data transformation, including BigQuery, Spark on Dataproc, pipeline graphs in Cloud Data Fusion, and serverless data processing with Dataflow. Learners get hands-on experience building data pipeline components on Google Cloud using Qwiklabs.

What's inside

Syllabus

Introduction

In this module, we introduce the course and agenda.

Introduction to Building Batch Data Pipelines

This module reviews different methods of data loading (EL, ELT, and ETL) and when to use each.
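As a rough illustration of how the ETL and ELT paradigms differ, the sketch below uses Python's built-in `sqlite3` module as a stand-in for a data warehouse. It is not taken from the course: the sample rows, table names, and function names are all hypothetical. ETL cleans the records in application code before loading them, while ELT loads the raw records first and transforms them with SQL inside the database. Plain EL would stop after loading the raw table.

```python
import sqlite3

# Hypothetical raw records: (order_id, amount as text, lowercase country code).
RAW_ROWS = [
    (1, "19.99", "us"),
    (2, "5.00", "de"),
    (3, "12.50", "us"),
]


def run_etl(conn):
    """ETL: transform in application code first, then load the cleaned rows."""
    cleaned = [(oid, float(amount), country.upper()) for oid, amount, country in RAW_ROWS]
    conn.execute("CREATE TABLE orders_etl (order_id INTEGER, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)


def run_elt(conn):
    """ELT: load the raw rows unchanged, then transform inside the database."""
    conn.execute("CREATE TABLE orders_raw (order_id INTEGER, amount TEXT, country TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", RAW_ROWS)
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT order_id,
               CAST(amount AS REAL) AS amount,
               UPPER(country) AS country
        FROM orders_raw
        """
    )


def totals_by_country(conn, table):
    """Return (country, total amount) pairs; `table` is a trusted, hard-coded name."""
    query = (
        f"SELECT country, ROUND(SUM(amount), 2) FROM {table} "
        "GROUP BY country ORDER BY country"
    )
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
etl_totals = totals_by_country(conn, "orders_etl")
elt_totals = totals_by_country(conn, "orders_elt")
print(etl_totals)
print(elt_totals)
```

Both routes end with the same cleaned table. The practical difference is where the transformation runs, which is the kind of trade-off this module discusses.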

Executing Spark on Dataproc

This module shows how to run Hadoop on Dataproc, how to leverage Cloud Storage, and how to optimize your Dataproc jobs.

Serverless Data Processing with Dataflow

This module covers using Dataflow to build your data processing pipelines.

Manage Data Pipelines with Cloud Data Fusion and Cloud Composer

This module shows how to manage data pipelines with Cloud Data Fusion and Cloud Composer.

Course Summary

Good to know

Know what's good, what to watch for, and possible dealbreakers

Examines methods of data loading, including EL, ELT and ETL, and when to use each one

Teaches how to leverage Cloud Storage and optimize Dataproc jobs

Develops serverless data processing skills with Dataflow

Explores data pipeline management with Cloud Data Fusion and Cloud Composer

Provides hands-on experience building data pipeline components using Qwiklabs

Reviews summary

Batch pipeline development on Google Cloud

Learner reviews of this certification-oriented course are largely positive, with high marks for concept explanations and practical hands-on experience. Reviewers highlight the engaging assignments and the relevance of the lessons to a data engineering role in industry. However, learners also raise concerns about some difficult exams, a lack of beginner-friendliness, and occasional technical issues with labs.

Learners felt that the cost of the course was reasonable given the value and skills they gained.

"The cost of the course was worth it."

"I think the course was a good value for the money."

While some learners appreciate the hands-on approach, others felt the labs should have more in-depth tasks to strengthen the learning experience.

"The hands-on labs were a great way to learn the material."

"I would have liked to see more hands-on labs."

"The labs were too easy and didn't provide enough of a challenge."

Learners see the course as well-suited for exam preparation, providing valuable content and insights.

"This course is a great way to prepare for the certification exam."

"The practice exams were particularly helpful."

The course offered a comprehensive overview of Google Cloud's features and services for managing data engineering pipelines.

"It was great to apply the code assignments to the Google Cloud tools."

"The labs have been helpful in providing hands-on experience with the Google Cloud Platform."

"The exercises and projects are very good for practicing the use of GCP services."

The course provides substantial hands-on coding practice to get you up and running with Google Cloud tools, which is useful for practical applications on the job.

"I am enjoying the hands-on experience, especially with the Python coding assignments."

"I was able to develop my coding skills and apply them immediately to my portfolio, which was great for getting a job afterwards."

Students find the assignments in this course to be engaging, featuring plenty of hands-on experience and chances to test your understanding through code and exercises.

"Amazing course, very engaging assignments."

"I like the engagement, continuous feedback and tests."

"The hands-on assignments are graded and are relevant to data engineering tasks"

The assumption of prior knowledge and fast pacing left some learners feeling behind, especially those new to data engineering.

"The course is not beginner-friendly."

"The pace of the course was too fast for me. I'm not a programmer, and some of the topics went over my head."

Some learners experienced technical difficulties, especially with labs, impacting their learning experience.

"The labs don't work most of the time."

"I found the course a bit disappointing. There were several technical issues I encountered that made it really difficult to engage with the material properly."

"I wish the labs ran smoothly. I spent most of my time reaching out for help support instead of actually learning."

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Building Batch Data Pipelines on Google Cloud with these activities:

Review the fundamentals of data pipelining

Ensure that you have a strong understanding of the basic concepts and terminologies used in data pipelining.

Read through the course materials on data pipelining.
Review articles and blog posts on the different data pipelining paradigms (EL, ELT, ETL).
Complete practice exercises or quizzes on data pipelining concepts.

Explore the official Qwiklabs for Google Cloud Platform

Review some of the key tools and technologies that you will be using in this course by following their official tutorials from Qwiklabs.

Visit the Qwiklabs website and browse the available courses on Google Cloud Platform.
Select a course that covers a tool or technology that you'll be using in this course, such as Cloud Storage or Dataproc.
Complete the course and follow the instructions provided in the lab.

Compile a list of resources on data pipeline best practices

Expand your knowledge by gathering and organizing resources on best practices for designing and implementing data pipelines.

Conduct an online search for articles, blog posts, and whitepapers on data pipeline best practices.
Create a document or spreadsheet to organize your findings.
Categorize the resources based on topics, such as data quality, performance optimization, or security.

Connect with data pipeline professionals

Expand your network and learn from others in the field by attending meetups or virtual events.

Find local or online meetups or events focused on data pipeline technologies.
Attend the event and introduce yourself to other attendees.
Engage in conversations and ask questions about data pipeline practices and experiences.

Attend a workshop on data pipeline technologies

Engage with experts and practitioners in the field by attending workshops on specific data pipeline technologies, such as BigQuery or Dataflow.

Search for upcoming workshops or conferences on data pipeline technologies.
Register for a workshop that aligns with your interests and learning goals.
Attend the workshop and actively participate in discussions and hands-on exercises.

Create a data pipeline diagram

Develop a visual representation of a data pipeline to enhance your understanding of the data flow and its components.

Choose a data pipeline scenario or use case.
Identify the different stages and components of the data pipeline.
Create a diagram using a tool like draw.io or Visio to illustrate the data flow and interactions between components.
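Before opening a drawing tool, it can help to write the stages and their dependencies down as data. The sketch below is one possible way to do that with Python's standard `graphlib` module (Python 3.9+). The stage names are hypothetical examples, not part of the course. It lists the edges you would draw as arrows and derives an order in which the stages could run.

```python
from graphlib import TopologicalSorter

# Hypothetical stages; each key maps to the set of stages it depends on.
PIPELINE = {
    "extract_orders": set(),
    "extract_customers": set(),
    "join_datasets": {"extract_orders", "extract_customers"},
    "aggregate_daily": {"join_datasets"},
    "load_warehouse": {"aggregate_daily"},
}


def to_edges(graph):
    """Return (upstream, downstream) pairs, i.e. the arrows of the diagram."""
    return sorted((up, down) for down, ups in graph.items() for up in ups)


def execution_order(graph):
    """Return one valid run order; raises graphlib.CycleError if stages form a cycle."""
    return list(TopologicalSorter(graph).static_order())


edges = to_edges(PIPELINE)
order = execution_order(PIPELINE)

for upstream, downstream in edges:
    print(f"{upstream} -> {downstream}")
print(order)
```

Each printed edge corresponds to one arrow in the diagram, and the cycle check catches a design in which two stages wait on each other.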

Solve data pipeline design problems

Test your understanding of data pipeline design by solving real-world problems and scenarios.

Find online resources or platforms that provide data pipeline design challenges.
Select a challenge and analyze the requirements.
Design a data pipeline solution and document your approach.
Submit your solution for review and feedback.

Build a sample data pipeline using Google Cloud technologies

Gain hands-on experience by building a small-scale data pipeline that incorporates concepts covered in the course.

Choose a simple data pipeline scenario or use case.
Select the appropriate Google Cloud technologies for your pipeline, such as BigQuery or Dataflow.
Develop and implement your data pipeline.
Test and validate your pipeline.
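The develop, test, and validate steps above can be rehearsed locally before touching any cloud service. The following is a minimal sketch using only the Python standard library. The CSV sample, field names, and function names are hypothetical, and an in-memory dictionary stands in for a destination such as a BigQuery table.

```python
import csv
import io

# Hypothetical input; in a real pipeline this might be a file in Cloud Storage.
SAMPLE_CSV = """user_id,event,duration_ms
u1,click,120
u2,view,
u1,view,300
u3,click,abc
"""


def extract(text):
    """Parse CSV text into a list of dictionaries, one per record."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Split rows into valid ones (integer duration) and rejected ones."""
    valid, rejected = [], []
    for row in rows:
        try:
            duration = int(row["duration_ms"])
        except (TypeError, ValueError):
            rejected.append(row)
            continue
        valid.append({"user_id": row["user_id"], "event": row["event"], "duration_ms": duration})
    return valid, rejected


def load(rows):
    """Aggregate total duration per user into an in-memory 'table'."""
    totals = {}
    for row in rows:
        totals[row["user_id"]] = totals.get(row["user_id"], 0) + row["duration_ms"]
    return totals


def validate(raw, valid, rejected, totals):
    """Check that no record was silently dropped and that the totals add up."""
    assert len(valid) + len(rejected) == len(raw)
    assert sum(totals.values()) == sum(r["duration_ms"] for r in valid)


raw_rows = extract(SAMPLE_CSV)
valid_rows, rejected_rows = transform(raw_rows)
user_totals = load(valid_rows)
validate(raw_rows, valid_rows, rejected_rows, user_totals)
print(user_totals)
print(f"rejected {len(rejected_rows)} of {len(raw_rows)} rows")
```

Keeping rejected rows in a separate list, rather than discarding them, makes the validation step straightforward and leaves a record of what failed.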

Career center

Learners who complete Building Batch Data Pipelines on Google Cloud will develop knowledge and skills that may be useful to these careers:

Data Engineer

A Data Engineer designs, builds, and maintains data pipelines for an organization. They collaborate with Data Scientists and other stakeholders to gather requirements, design and build data pipelines, and ensure that the data is accurate, complete, and timely. This course is a valuable resource for anyone looking to become a Data Engineer as it provides a comprehensive overview of the different methods of data loading, how to execute Spark on Dataproc, how to use Dataflow for serverless data processing, and how to manage data pipelines with Cloud Data Fusion and Cloud Composer.

Data Architect

A Data Architect designs and manages an organization's data architecture. They work with stakeholders to understand their data needs, design and build data pipelines, and ensure that the data is accurate, complete, and timely. This course is a valuable resource for anyone looking to become a Data Architect as it provides a comprehensive overview of the different methods of data loading, how to execute Spark on Dataproc, how to use Dataflow for serverless data processing, and how to manage data pipelines with Cloud Data Fusion and Cloud Composer.

Data Scientist

A Data Scientist uses data to solve business problems. They collaborate with stakeholders to understand their business needs, gather data, analyze data, and build models to make predictions. This course is a valuable resource for anyone looking to become a Data Scientist as it provides a comprehensive overview of the different methods of data loading, how to execute Spark on Dataproc, how to use Dataflow for serverless data processing, and how to manage data pipelines with Cloud Data Fusion and Cloud Composer.

Business Analyst

A Business Analyst gathers and analyzes data to help businesses make informed decisions. They work with stakeholders to understand their business needs, gather data, analyze data, and make recommendations. This course is a valuable resource for anyone looking to become a Business Analyst as it provides a comprehensive overview of the different methods of data loading, how to execute Spark on Dataproc, how to use Dataflow for serverless data processing, and how to manage data pipelines with Cloud Data Fusion and Cloud Composer.

Cloud Architect

A Cloud Architect designs and manages an organization's cloud infrastructure. They work with stakeholders to understand their business needs, design and build cloud infrastructure, and ensure that the infrastructure is scalable, reliable, and secure. This course is a valuable resource for anyone looking to become a Cloud Architect as it provides a comprehensive overview of the different methods of data loading, how to execute Spark on Dataproc, how to use Dataflow for serverless data processing, and how to manage data pipelines with Cloud Data Fusion and Cloud Composer.

Database Administrator

A Database Administrator manages an organization's databases. They work with stakeholders to understand their data needs, design and build databases, and ensure that the databases are performant, reliable, and secure. This course is a valuable resource for anyone looking to become a Database Administrator as it provides a comprehensive overview of the different methods of data loading, how to execute Spark on Dataproc, how to use Dataflow for serverless data processing, and how to manage data pipelines with Cloud Data Fusion and Cloud Composer.

Software Engineer

A Software Engineer designs, builds, and maintains software systems. They work with stakeholders to understand their business needs, design and build software systems, and ensure that the software systems are scalable, reliable, and secure. This course may be useful for anyone looking to become a Software Engineer as it provides a comprehensive overview of the different methods of data loading, how to execute Spark on Dataproc, how to use Dataflow for serverless data processing, and how to manage data pipelines with Cloud Data Fusion and Cloud Composer.

Systems Administrator

A Systems Administrator manages an organization's computer systems. They work with stakeholders to understand their business needs, design and build computer systems, and ensure that the computer systems are performant, reliable, and secure. This course may be useful for anyone looking to become a Systems Administrator as it provides a comprehensive overview of the different methods of data loading, how to execute Spark on Dataproc, how to use Dataflow for serverless data processing, and how to manage data pipelines with Cloud Data Fusion and Cloud Composer.

Data Analyst

A Data Analyst gathers and analyzes data to help businesses make informed decisions. They work with stakeholders to understand their business needs, gather data, analyze data, and make recommendations. This course may be useful for anyone looking to become a Data Analyst as it provides a comprehensive overview of the different methods of data loading, how to execute Spark on Dataproc, how to use Dataflow for serverless data processing, and how to manage data pipelines with Cloud Data Fusion and Cloud Composer.

Information Security Analyst

An Information Security Analyst protects an organization's information systems from unauthorized access, use, disclosure, disruption, modification, or destruction. They work with stakeholders to understand their security needs, design and implement security controls, and monitor and respond to security incidents. This course may be useful for anyone looking to become an Information Security Analyst as it provides a comprehensive overview of the different methods of data loading, how to execute Spark on Dataproc, how to use Dataflow for serverless data processing, and how to manage data pipelines with Cloud Data Fusion and Cloud Composer.

Statistician

A Statistician uses statistical methods to collect, analyze, interpret, and present data. They work with stakeholders to understand their research needs, design and conduct statistical studies, and analyze and interpret data. This course may be useful for anyone looking to become a Statistician as it provides a comprehensive overview of the different methods of data loading, how to execute Spark on Dataproc, how to use Dataflow for serverless data processing, and how to manage data pipelines with Cloud Data Fusion and Cloud Composer.

Operations Research Analyst

An Operations Research Analyst uses mathematical and analytical methods to solve business problems. They work with stakeholders to understand their business needs, develop and implement mathematical models, and analyze and interpret results. This course may be useful for anyone looking to become an Operations Research Analyst as it provides a comprehensive overview of the different methods of data loading, how to execute Spark on Dataproc, how to use Dataflow for serverless data processing, and how to manage data pipelines with Cloud Data Fusion and Cloud Composer.

Financial Analyst

A Financial Analyst uses financial data to evaluate and make investment decisions. They work with stakeholders to understand their investment goals, analyze financial data, and make recommendations. This course may be useful for anyone looking to become a Financial Analyst as it provides a comprehensive overview of the different methods of data loading, how to execute Spark on Dataproc, how to use Dataflow for serverless data processing, and how to manage data pipelines with Cloud Data Fusion and Cloud Composer.

Market Research Analyst

A Market Research Analyst gathers and analyzes data to understand consumer behavior. They work with stakeholders to understand their marketing needs, design and conduct market research studies, and analyze and interpret data. This course may be useful for anyone looking to become a Market Research Analyst as it provides a comprehensive overview of the different methods of data loading, how to execute Spark on Dataproc, how to use Dataflow for serverless data processing, and how to manage data pipelines with Cloud Data Fusion and Cloud Composer.

Product Manager

A Product Manager manages the development and launch of a product. They work with stakeholders to understand their product needs, design and develop the product, and launch and market the product. This course may be useful for anyone looking to become a Product Manager as it provides a comprehensive overview of the different methods of data loading, how to execute Spark on Dataproc, how to use Dataflow for serverless data processing, and how to manage data pipelines with Cloud Data Fusion and Cloud Composer.

Reading list

We've selected nine books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Building Batch Data Pipelines on Google Cloud.

97 Things Every Data Engineer Should Know

Collects short essays from data engineering practitioners. It is not specific to Google Cloud, but it offers general guidance on topics such as pipeline design and data quality that applies to the services covered in this course.

97 Things Every Data Engineer Should Know:...

Kindle Edition

Agile Data Science 2.0

Provides a practical guide to building full-stack data applications using an agile approach. It offers insights into the challenges and opportunities of iterating on data pipelines and the applications built on them.

Agile Data Science 2.0: Building Full-Stack Data...

Paperback

Agile Data Science 2.0: Building Full-Stack Data...

Kindle Edition

Spark: The Definitive Guide

This definitive guide to Apache Spark provides a comprehensive overview of the Spark ecosystem. It covers topics such as data storage, data processing, and data analysis, and offers insights into the challenges and opportunities of working with Spark.

Spark: The Definitive Guide: Big Data Processing...

Paperback

Spark: The Definitive Guide: Big Data Processing...

Kindle Edition

Data Pipelines with Apache Airflow

Focuses on using Apache Airflow for data pipeline orchestration. It provides detailed guidance on designing, building, and maintaining data pipelines with Airflow, and offers practical examples of real-world use cases.

Data Pipelines with Apache Airflow

Paperback

Deep Learning with Python, Second Edition

Provides a comprehensive introduction to deep learning with Python, covering topics such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and generative adversarial networks (GANs). It offers practical guidance on building and training deep learning models using popular Python libraries such as TensorFlow and Keras.

Deep Learning with Python, Second Edition

Paperback

Deep Learning with Python, Second Edition

Kindle Edition

Stream Processing with Apache Flink

Provides a comprehensive overview of stream processing with Apache Flink. It covers topics such as stream ingestion, transformation, and analysis, and offers practical examples of building stream processing applications with Flink.

Stream Processing with Apache Flink: Fundamentals,...

Paperback

Stream Processing with Apache Flink: Fundamentals,...

Kindle Edition

Hadoop: The Definitive Guide

This classic book on Hadoop provides a comprehensive overview of the Hadoop ecosystem. It covers topics such as data storage, data processing, and data analysis, and offers insights into the challenges and opportunities of working with Hadoop.

Hadoop: The Definitive Guide: Storage and Analysis...

Paperback

Hadoop: The Definitive Guide

Kindle Edition

Hadoop: The Definitive Guide

Paperback

Data Science for Business

Provides a practical introduction to data science for business professionals. It covers topics such as data analysis, data mining, and machine learning, and offers insights into the challenges and opportunities of data-driven decision-making.

Data Science for Business: What You Need to Know...

Paperback

Data Science for Business: What You Need to Know...

Kindle Edition

Architecting the Cloud

Provides a broad overview of design decisions for cloud computing. It covers topics such as choosing among cloud service models, and offers insights into the challenges and opportunities of building systems in the cloud.

Architecting the Cloud: Design Decisions for Cloud...

Hardcover

Architecting the Cloud: Design Decisions for Cloud...

Kindle Edition

Level: Intermediate

Via: Coursera

Institution: Google Cloud

Instructor: Google Cloud Training

Language: English
