May 1, 2024
Updated June 3, 2025
26 minute read
Navigating the World of Data Pipelines: A Comprehensive Guide
Data pipelines are the essential, often unseen, infrastructure that powers the modern data-driven world. At a high level, a data pipeline is a series of interconnected data processing steps. It's a system designed to move raw data from various sources to a destination where it can be stored, analyzed, and turned into valuable insights. Think of it as an automated assembly line for data, where raw materials (data) are collected, refined (transformed), and then delivered as finished products (usable information).
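The assembly-line picture above can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference design: the function names (`extract`, `transform`, `load`, `run_pipeline`), the `name,amount` input format, and the in-memory list standing in for a destination are all assumptions made for the example.

```python
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, object]


def extract(raw_lines: Iterable[str]) -> Iterator[Record]:
    """Collect raw data: split hypothetical 'name,amount' lines into records."""
    for line in raw_lines:
        name, amount = line.split(",", 1)
        yield {"name": name, "amount": amount}


def transform(records: Iterable[Record]) -> Iterator[Record]:
    """Refine the data: normalize names, parse amounts, skip unparseable rows."""
    for record in records:
        try:
            amount = float(str(record["amount"]))
        except ValueError:
            continue  # drop rows whose amount is not a number
        yield {"name": str(record["name"]).strip().title(), "amount": amount}


def load(records: Iterable[Record], destination: List[Record]) -> int:
    """Deliver the finished records to a destination (here, a plain list)."""
    count = 0
    for record in records:
        destination.append(record)
        count += 1
    return count


def run_pipeline(raw_lines: Iterable[str], destination: List[Record]) -> int:
    """Chain the steps so each record flows source -> transform -> destination."""
    return load(transform(extract(raw_lines)), destination)


if __name__ == "__main__":
    warehouse: List[Record] = []
    loaded = run_pipeline([" alice ,10.5", "bob,not-a-number", "carol,3"], warehouse)
    print(loaded, warehouse)
```

Each step is a generator, so records move through one at a time rather than being held in memory all at once. A production pipeline would typically replace the list with a database or warehouse and add scheduling and monitoring around these steps.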
Find a path to learning Data Pipelines. Learn more at:
OpenCourser.com/topic/6yfsps/data
Reading list
We've selected four books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Data Pipelines.
A concise guide to all things data pipelines. Starting with the basics, it covers a wide range of topics, including data connectors, data integration, data quality, orchestration, and monitoring.
A practical guide to building data pipelines with Kafka, a distributed streaming platform. It covers everything from basic concepts to advanced topics like stream processing and data integration.
Teaches you how to use Flink, a popular open-source platform for building data pipelines. It covers everything from basic concepts to advanced topics like streaming and machine learning.
Teaches you how to use MongoDB, a popular NoSQL database, to build data pipelines. It covers everything from basic concepts to advanced topics like data aggregation and indexing.
For more information about how these books relate to this course, visit:
OpenCourser.com/topic/6yfsps/data