We may earn an affiliate commission when you visit our partners.

Streaming Data

May 1, 2024 Updated May 11, 2025 26 minute read

Streaming data, at its core, refers to data that is generated continuously by thousands of data sources, which typically send their data records simultaneously. This continuous flow of information contrasts with traditional batch data processing, where data is collected over a period and then processed in large chunks. Think of streaming data as a river, constantly flowing and changing, as opposed to a lake, which is a large, static body of water. This real-time or near real-time nature is the defining characteristic of streaming data, and it is what enables immediate analysis and action.
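To make the contrast with batch processing concrete, here is a minimal Python sketch. It is only an illustration under simple assumptions: the function names and the `readings` values are hypothetical, and a real system would read from a message broker or socket rather than an in-memory list.

```python
from typing import Iterable, Iterator, List


def batch_average(records: List[float]) -> float:
    """Batch style: wait until every record is collected, then process once."""
    return sum(records) / len(records)


def streaming_average(records: Iterable[float]) -> Iterator[float]:
    """Streaming style: update and emit the result as each record arrives."""
    count = 0
    total = 0.0
    for value in records:
        count += 1
        total += value
        yield total / count  # an up-to-date result is available immediately


# Hypothetical sensor readings, used only for illustration.
readings = [10.0, 20.0, 30.0, 40.0]

print(batch_average(readings))            # 25.0, one answer after all data is in
print(list(streaming_average(readings)))  # [10.0, 15.0, 20.0, 25.0], one per record
```

The streaming version never needs the full dataset in memory and works on any iterable, including one that never ends, which is roughly what the river analogy describes.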



Reading list

We've selected 25 books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Streaming Data.
Comprehensive guide specifically focused on streaming data processing. It delves into the core principles, concepts like watermarks and exactly-once processing, and compares streaming and batch processing patterns. It is considered a reference for understanding large-scale streaming data systems, though it has a focus on Apache Beam.
Given the prevalence of Kafka in streaming data architectures, this book is essential for understanding its fundamentals, design principles, and practical applications. It covers producers, consumers, and building data pipelines with Kafka. This is a must-read for anyone working with or planning to use Kafka for streaming data.
While not exclusively about streaming data, this book provides foundational knowledge on data systems, including concepts crucial for understanding streaming data architectures. It covers distributed systems, data models, and trade-offs in system design, making it highly valuable as background reading. It is widely regarded as a must-read for anyone working with data systems.
Apache Flink is a major player in stream processing. This book provides a deep dive into Flink's architecture, APIs, and concepts like state management and event time processing. It is highly relevant for those focusing on building scalable streaming applications with Flink.
Provides a theoretical foundation for streaming data analysis. It covers topics such as data models, algorithms, and applications. It is a good resource for researchers and practitioners who want to learn about the theory behind streaming data analytics.
Provides a comprehensive overview of Apache Kafka, a popular streaming data platform. It covers topics such as architecture, deployment, and operations. It is a good resource for developers and engineers who want to learn how to use Kafka to build and deploy streaming data applications.
Provides a comprehensive overview of Apache Flink, a popular streaming data platform. It covers topics such as architecture, deployment, and operations. It is a good resource for developers and engineers who want to learn how to use Flink to build and deploy streaming data applications.
For those specifically working with Kafka Streams, this book provides practical examples and guidance on building streaming applications directly on Kafka. It's valuable for deepening understanding of stream processing patterns within the Kafka ecosystem.
Specifically addresses building real-time analytics systems using Kafka and Pinot. It's highly relevant for understanding how to derive insights from streaming data in real-time applications.
Kafka Connect is a crucial component for integrating Kafka with other systems when building data pipelines. This book provides practical guidance on using Kafka Connect, which is highly relevant for implementing streaming data ingestion and delivery.
Offers a good introduction to the concepts and requirements of streaming and real-time data systems. It provides an overview of building streaming pipelines and the roles of various technologies. It serves as a solid starting point for gaining a broad understanding.
Based on influential blog posts, this book provides foundational ideas behind using logs as a central abstraction for data systems, including streaming. It's a classic perspective on the underlying principles.
While covering both batch and streaming, this book provides a comprehensive guide to Apache Spark, including Spark Streaming and Structured Streaming. It's a key resource for understanding how to process large-scale data, including streaming data, using Spark.
Focuses on using Apache Spark for building streaming applications, providing a hands-on approach. It's valuable for those looking to apply Spark specifically to streaming data engineering tasks.
Aims to provide a clear and intuitive understanding of real-time event processing and streaming systems. It likely covers core concepts in an accessible way, making it suitable for those new to the topic.
Offers a practical approach to getting started with Apache Kafka for building real-time data streaming pipelines. It's suitable for those who want a hands-on introduction to Kafka for data streaming.
Focuses on the analytical aspects of streaming data, covering techniques for analyzing and visualizing data in motion. It provides practical examples and highlights the components of streaming data systems relevant to analytics.
Provides a broad overview of data engineering principles, which are highly relevant to building streaming data systems. It covers various aspects of data infrastructure and would be useful for a holistic understanding of the field.
Covers advanced analytics topics such as streaming, machine learning, and graph processing. It is a good resource for data scientists and engineers who want to learn how to use Spark to build and deploy advanced analytics applications.
Focusing on the intersection of streaming data and the data mesh concept, this book explores how to optimize real-time data services within a data mesh architecture. It's a contemporary topic relevant for advanced practitioners.
This concise book provides a primer on performing analytics on event streams, covering algorithms and techniques for streaming analysis. It's useful for gaining a focused understanding of the analytical aspects of streaming data.
For those working with Scala in the context of data engineering and Spark, this book provides guidance on building both streaming and batch pipelines. It's relevant for understanding language-specific implementations in streaming.
Data mesh is a contemporary topic in data architecture. While not solely focused on streaming, this book discusses how streaming data fits into a decentralized data mesh paradigm. It's relevant for understanding modern data architectural trends.
Introduces the Lambda Architecture, a pattern for building scalable and fault-tolerant data processing systems that combines batch and real-time processing. While newer architectures exist, understanding Lambda Architecture provides historical context for streaming systems.
© 2016 - 2025 OpenCourser