We may earn an affiliate commission when you visit our partners.

Build an End-to-End Data Capture Pipeline using Document AI

Google Cloud Training

This is a self-paced lab that takes place in the Google Cloud console. In this lab you use Cloud Functions and Pub/Sub to create an end-to-end document processing pipeline using Document AI. The Document AI API is a document understanding solution that takes unstructured data, such as documents and emails, and makes the data easier to understand, analyze, and consume.

In this lab, you create a document processing pipeline that automatically processes documents uploaded to Cloud Storage. The pipeline consists of a primary Cloud Function that processes new files uploaded to Cloud Storage using a Document AI form processor and then saves the form data detected in those files to BigQuery. If the form data includes any address fields, the address data is written to a Pub/Sub topic, which in turn triggers a second Cloud Function that uses the Geocoding API to provide geographic coordinates for the address. These coordinates are also written to BigQuery.
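The routing step described above can be sketched in plain Python. This is a minimal illustration rather than the lab's code: the `InMemoryPipeline` class, its field names, and the rule "a field is an address if its label contains the word address" are all assumptions, and two in-memory lists stand in for the BigQuery table and the Pub/Sub topic so the logic runs without any cloud services.

```python
import json
from dataclasses import dataclass, field


@dataclass
class InMemoryPipeline:
    """Hypothetical stand-in for the BigQuery table and the Pub/Sub topic."""
    form_rows: list = field(default_factory=list)         # plays the role of the BigQuery form-data table
    address_messages: list = field(default_factory=list)  # plays the role of the Pub/Sub topic

    def process_form(self, file_name, fields):
        """Save every detected field and queue address fields for geocoding.

        Returns the number of address messages queued by this call.
        """
        queued = 0
        for name, value in fields.items():
            self.form_rows.append({"file": file_name, "field": name, "value": value})
            # Assumption: an address field is recognised by its label.
            if "address" in name.lower():
                message = json.dumps({"file": file_name, "address": value})
                self.address_messages.append(message.encode("utf-8"))
                queued += 1
        return queued


# Example with made-up form data.
pipeline = InMemoryPipeline()
queued = pipeline.process_form(
    "form.pdf",
    {"Name": "Ada", "Mailing Address": "1600 Amphitheatre Pkwy"},
)
```

In a real deployment, the two lists would be replaced by calls to the BigQuery and Pub/Sub client libraries, and the `fields` dictionary would come from the Document AI form processor's response.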

This is a simple pipeline that uses a general form processor to detect basic form data, such as a labelled field containing address information. Document AI processors that use one of the specialized parsers, which are beyond the scope of this lab, provide enhanced entity information for specific document types even when those documents do not include labelled fields. For example, a Document AI Invoice parser can provide detailed address and supplier information from an unlabelled invoice document because it understands the layout of invoices.

What's inside

Syllabus

Build an End-to-End Data Capture Pipeline using Document AI

Good to know

Know what's good, what to watch for, and possible dealbreakers
Designed to help learners build a data capture pipeline, this course is ideal for engineers and data scientists looking to streamline their document processing workflow
This course uses the Google Cloud Platform, which is suitable for learners who want to use this suite of tools in their workflow
Note that this course assumes some prior knowledge of the Google Cloud Platform (e.g., Cloud Functions, Pub/Sub, and BigQuery)
The focus on a practical, hands-on approach makes this course useful for learners who want to apply their knowledge in real-world scenarios

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Build an End-to-End Data Capture Pipeline using Document AI with these activities:
Review Cloud Storage knowledge
Reviewing Google Cloud Storage will give you a foundation to succeed in this course.
  • Describe a Cloud Storage bucket
  • Describe a Cloud Storage blob
  • Upload a file to Cloud Storage using the gcloud command-line tool
  • List files in a Cloud Storage bucket using the gcloud command-line tool
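As a rough illustration of the last two steps, the sketch below builds the `gcloud storage cp` and `gcloud storage ls` commands as argument lists and prints them. The helper function names, the bucket name `my-example-bucket`, and the file name `form.pdf` are placeholders, not values from the course. Nothing is executed here; running the commands would require an installed and authenticated Google Cloud CLI.

```python
import shlex


def upload_command(local_path, bucket):
    """Build the argv for copying a local file into a bucket."""
    return ["gcloud", "storage", "cp", local_path, f"gs://{bucket}/"]


def list_command(bucket):
    """Build the argv for listing the objects in a bucket."""
    return ["gcloud", "storage", "ls", f"gs://{bucket}"]


# Placeholder names; substitute your own file and bucket.
print(shlex.join(upload_command("form.pdf", "my-example-bucket")))
print(shlex.join(list_command("my-example-bucket")))
```

To actually run a command, the same list could be passed to `subprocess.run(..., check=True)`.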
Complete a tutorial on Cloud Storage and Cloud Functions
Following a guided tutorial will strengthen your foundational knowledge and comfort with these essential concepts.
  • Identify a tutorial on Cloud Storage and Cloud Functions
  • Complete the tutorial
  • Reflect on lessons learned
Practice writing Cloud Functions
Writing Cloud Functions will directly apply your understanding of the concepts taught in this course.
  • Brainstorm a simple Cloud Function to write
  • Write the Cloud Function code
  • Deploy the Cloud Function
  • Test the Cloud Function
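For practice, a simple function of this kind might look like the sketch below. It assumes the `(event, context)` signature used by first-generation Python background functions, where a Pub/Sub message body arrives base64-encoded under `event["data"]`. The function name and the `address` field in the message are hypothetical. The function can be tested locally by building a fake event by hand, with no deployment needed.

```python
import base64
import json


def handle_address_message(event, context):
    """Decode a Pub/Sub-style event and return the address it carries."""
    payload = base64.b64decode(event["data"]).decode("utf-8")
    message = json.loads(payload)
    address = message["address"]  # hypothetical field name
    print(f"Received address: {address}")
    return address


# Local test: build the same envelope shape by hand and call the function directly.
body = json.dumps({"address": "1600 Amphitheatre Pkwy"}).encode("utf-8")
fake_event = {"data": base64.b64encode(body).decode("ascii")}
result = handle_address_message(fake_event, context=None)
```

Deploying the function is a separate step that depends on your project, runtime, and trigger configuration.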
Create a blog post or article on a topic related to Cloud Functions or Pub/Sub
Creating a blog post or article will help solidify your understanding of the concepts and help others learn.
  • Identify a topic related to Cloud Functions or Pub/Sub
  • Research and gather information on the topic
  • Write the blog post or article
  • Publish or share the blog post or article
Contribute to an open-source project related to Cloud Functions or Pub/Sub
Contributing to an open-source project is a great way to gain real-world experience and make a difference.
  • Identify an open-source project related to Cloud Functions or Pub/Sub
  • Review the project's documentation and codebase
  • Identify an area where you can contribute
  • Make a contribution to the project
  • Submit a pull request

Career center

Learners who complete Build an End-to-End Data Capture Pipeline using Document AI will develop knowledge and skills that may be useful to these careers:
Data Engineer
Document AI can be used to extract information from a variety of documents, such as invoices, contracts, and emails. This information can then be used to automate tasks, such as data entry and analysis. Data Engineers are responsible for designing and building data pipelines that can process large amounts of data. This course can help Data Engineers build pipelines that can extract information from documents and use it to improve their data analysis processes.
Data Analyst
Document AI can be used to extract information from a variety of documents, such as invoices, contracts, and emails. This information can then be used to analyze data and make informed decisions. Data Analysts are responsible for analyzing data to identify trends and patterns. This course can help Data Analysts build skills in using Document AI to extract information from documents and use it to improve their data analysis processes.
Software Engineer
Document AI can be used to extract information from a variety of documents, such as invoices, contracts, and emails. This information can then be used to automate tasks, such as data entry and analysis. Software Engineers are responsible for designing and building software applications. This course can help Software Engineers build applications that extract information from documents and pass it on to other services.
Machine Learning Engineer
Document AI is a service that uses machine learning models to extract information from documents. Machine Learning Engineers are responsible for building and deploying machine learning models. This course can help Machine Learning Engineers build skills in using Document AI to extract information from documents and integrate it into their own pipelines.
Data Scientist
Document AI can be used to extract information from a variety of documents, such as invoices, contracts, and emails. This information can then be used to analyze data and make informed decisions. Data Scientists are responsible for analyzing data to identify trends and patterns. This course can help Data Scientists build skills in using Document AI to extract information from documents and use it to improve their data analysis processes.
Information Architect
Document AI can be used to extract information from a variety of documents, such as invoices, contracts, and emails. This information can then be used to design and build information systems. Information Architects are responsible for designing and building information systems that can store and manage data. This course can help Information Architects build skills in using Document AI to extract information from documents and use it to improve their information systems.
Business Analyst
Document AI can be used to extract information from a variety of documents, such as invoices, contracts, and emails. This information can then be used to analyze data and make informed decisions. Business Analysts are responsible for analyzing data to identify trends and patterns. This course can help Business Analysts build skills in using Document AI to extract information from documents and use it to improve their data analysis processes.
Product Manager
Document AI can be used to extract information from a variety of documents, such as invoices, contracts, and emails. This information can then be used to design and build products. Product Managers are responsible for designing and building products that meet the needs of users. This course can help Product Managers build skills in using Document AI to extract information from documents and use it to improve their product design processes.
Technical Writer
Document AI can be used to extract information from a variety of documents, such as invoices, contracts, and emails. This information can then be used to create technical documentation. Technical Writers are responsible for creating technical documentation that is accurate and easy to understand. This course can help Technical Writers build skills in using Document AI to extract information from documents and use it to improve their technical documentation.
Information Security Analyst
Document AI can be used to extract information from a variety of documents, such as invoices, contracts, and emails. This information can then be used to identify and mitigate security risks. Information Security Analysts are responsible for identifying and mitigating security risks. This course can help Information Security Analysts build skills in using Document AI to extract information from documents and use it to improve their security risk analysis processes.
Enterprise Architect
Document AI can be used to extract information from a variety of documents, such as invoices, contracts, and emails. This information can then be used to design and build enterprise architectures. Enterprise Architects are responsible for designing and building enterprise architectures that meet the needs of organizations. This course can help Enterprise Architects build skills in using Document AI to extract information from documents and use it to improve their enterprise architecture design processes.
Database Administrator
Document AI can be used to extract information from a variety of documents, such as invoices, contracts, and emails. This information can then be used to populate databases. Database Administrators are responsible for managing and maintaining databases. This course can help Database Administrators build skills in using Document AI to extract information from documents and use it to improve their database management processes.
Archivist
Document AI can be used to extract information from a variety of documents, such as invoices, contracts, and emails. This information can then be used to organize and manage archives. Archivists are responsible for organizing and managing archives. This course can help Archivists build skills in using Document AI to extract information from documents and use it to improve their archive management processes.
Knowledge Manager
Document AI can be used to extract information from a variety of documents, such as invoices, contracts, and emails. This information can then be used to create and manage knowledge bases. Knowledge Managers are responsible for creating and managing knowledge bases. This course can help Knowledge Managers build skills in using Document AI to extract information from documents and use it to improve their knowledge base management processes.
Records Manager
Document AI can be used to extract information from a variety of documents, such as invoices, contracts, and emails. This information can then be used to manage records. Records Managers are responsible for managing records. This course can help Records Managers build skills in using Document AI to extract information from documents and use it to improve their records management processes.

Reading list

We've selected nine books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Build an End-to-End Data Capture Pipeline using Document AI.
Provides a thorough introduction to natural language processing (NLP) with a focus on the latest transformer models such as BERT, GPT, and T5. It is helpful for understanding the theoretical foundations of NLP and how to apply these models in practice.
Offers a practical guide to machine learning using popular libraries like Scikit-Learn, Keras, and TensorFlow. While it doesn't cover Document AI specifically, it provides a solid foundation for understanding the concepts and techniques used in document processing.
Provides a comprehensive introduction to deep learning concepts and techniques using Python. It is useful for gaining a deeper understanding of the underlying principles of Document AI and other document processing technologies.
This classic textbook provides a comprehensive overview of speech and language processing, including topics such as text classification, natural language understanding, and speech recognition. It offers a solid foundation for understanding the theoretical underpinnings of Document AI.
Provides a comprehensive overview of information retrieval (IR) concepts and techniques. It is useful for understanding how documents are indexed, ranked, and retrieved in search engines, which is relevant to document processing pipelines.
This comprehensive textbook provides a deep understanding of database systems, including topics such as data modeling, query processing, and transaction management. It is useful for understanding the storage and management of data in document processing pipelines.
Provides a comprehensive overview of computer vision concepts and techniques, including topics such as image processing, object detection, and image recognition. It is useful for understanding the underlying principles of image analysis used in document processing pipelines.
This handbook provides a comprehensive overview of NLP concepts and techniques. It is useful as a general reference for understanding the field and exploring topics beyond the scope of the course.
Provides a comprehensive overview of deep learning techniques for NLP. It is useful for understanding how deep learning models are used in document processing pipelines.

Similar courses

Here are nine courses similar to Build an End-to-End Data Capture Pipeline using Document AI.
Process Documents with Python Using the Document AI API
Most relevant
Create and Test a Document AI Processor
Most relevant
Building Agentic RAG with LlamaIndex
Gemini for end-to-end SDLC
Awwvision: Cloud Vision API from a Kubernetes Cluster
Building Batch Pipelines in Cloud Data Fusion
Real Time Machine Learning with Cloud Dataflow and Vertex...
Getting Started with Vector Search and Embeddings
Preprocessing Unstructured Data for LLM Applications