J Garg - Real Time Learning

Note (2024): All code has been updated for the latest Flink version.

Apache Flink is the successor to Hadoop and Spark: a next-generation big data engine for stream processing. If Hadoop is 2G and Spark is 3G, then Apache Flink is the 4G of big data stream processing frameworks. Spark was not built as a true stream processing framework; its streaming support was a makeshift layered on micro-batches. Apache Flink, by contrast, is a true streaming engine that can also perform batch, graph, and table processing and run machine learning algorithms.

Apache Flink is one of the newest big data technologies and is rapidly gaining momentum in the market. Just as Apache Spark displaced Hadoop, Flink is expected to displace Spark in the near future.

Demand for Flink is already growing. Many large companies across multiple industry domains already use Apache Flink to process their real-time big data, and thousands of others are starting to adopt it.

What's included in the course?

  • Complete Apache Flink concepts explained from scratch to real-time implementation.

  • Each and every Apache Flink concept is explained with hands-on Flink code.

  • Includes even those concepts whose explanation is not very clear in the official Flink documentation.

  • To help non-Java developers, all Flink Java code is explained line by line, so that even a non-technical person can follow it.

  • The Flink code and datasets used in the lectures are attached to the course for your convenience.

  • All code has been updated for the latest Flink version.

  • Implement 3 Real-time Case Studies using Flink.

What's inside

Syllabus

This is the pilot section of the course. You will learn what batch and stream processing are, the difference between them, and how Flink is better than Spark.

This is the pilot lecture to get you familiar with Flink. The video explains what Apache Flink is and what functionality it provides.

This lecture explains the difference between stream processing and batch processing.

A lecture on the differences between Hadoop and the streaming technologies, i.e. Spark and Flink. It also explains the similarities between Spark and Flink.

What is the difference between Spark and Flink? How is Flink better than Spark?

This video explains the architecture of Apache Flink and the different APIs Flink provides for batch, stream, graph, and table processing. It covers the full Apache Flink ecosystem.

Learn Apache Flink's programming model. You will see how a Flink program fits into its architecture.

Install Flink on your local system.

This lecture gives a line-by-line explanation of the program "Word Count of Names Starting with N", covering the map, flatMap, and filter operations, various data source functions, groupBy(), sum(), etc.
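
The pipeline described above can be sketched in plain Java. This is not the lecture's Flink program and it uses no Flink API; it only mirrors the flatMap, filter, group-by, and sum steps with java.util.stream. The class name, method name, and sample names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java analogue of the word-count pipeline (assumption: not Flink API).
final class NameCountSketch {

    // Counts how often each name starting with "N" occurs in the input lines.
    static Map<String, Integer> countNamesStartingWithN(List<String> lines) {
        return lines.stream()
                // flatMap: split every line into individual names
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                // filter: keep only names that start with "N"
                .filter(name -> name.startsWith("N"))
                // map + groupBy + sum: pair each name with 1 and add the 1s up per name
                .collect(Collectors.toMap(name -> name, name -> 1, Integer::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Noman Nipun", "Alice Noman", "Bob");
        System.out.println(countNamesStartingWithN(lines)); // {Nipun=1, Noman=2}
    }
}
```

In a Flink job the same steps would be expressed as operators on a DataSet or DataStream, and the grouping would be distributed across parallel tasks rather than done in one in-memory map.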

This video shows how to perform an inner join using Flink, which provides a join operation for this purpose.

In this video you will see how to perform a left outer join, a right outer join, and a full outer join using Flink.
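
As a rough illustration of how inner and outer joins differ, here is a minimal plain-Java sketch. It is not Flink's join API; the record layout (person id, name) and the id-to-city lookup are hypothetical, and the sketch assumes at most one city per person id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of join semantics (assumption: not Flink's join API).
final class JoinSketch {

    // Joins (personId, name) records with a personId -> city lookup.
    // keepUnmatchedLeft = false behaves like an inner join;
    // keepUnmatchedLeft = true behaves like a left outer join and fills in "NULL".
    static List<String> join(List<String[]> persons, Map<Integer, String> cities,
                             boolean keepUnmatchedLeft) {
        List<String> out = new ArrayList<>();
        for (String[] person : persons) {
            int id = Integer.parseInt(person[0]);
            String city = cities.get(id);
            if (city != null) {
                out.add(id + "," + person[1] + "," + city);       // key found on both sides
            } else if (keepUnmatchedLeft) {
                out.add(id + "," + person[1] + ",NULL");          // left row without a match
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> persons = Arrays.asList(
                new String[]{"1", "John"}, new String[]{"2", "Albert"}, new String[]{"3", "Lui"});
        Map<Integer, String> cities = new HashMap<>();
        cities.put(1, "DC");
        cities.put(2, "NY");
        System.out.println(join(persons, cities, false)); // inner join: 2 rows
        System.out.println(join(persons, cities, true));  // left outer join: 3 rows
    }
}
```

A right outer join is the mirror image (keep unmatched rows of the right input), and a full outer join keeps unmatched rows from both inputs.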

Join hints are a distinctive feature of Flink. By passing enumeration constants, we can tell Flink which join strategy to use. Flink provides 6 join hints.

There are various types of data sources and data sinks in the DataStream API. In this lecture we look at the source and sink methods and learn what type of data they read and in what manner.

This is the pilot lecture for Apache Flink's DataStream API programs. The first program is a basic one, i.e. a word count of names starting with N. The code will show you the similarities and differences between DataSet and DataStream API Flink programs.

The reduce method is applied on keyed streams. It aggregates the elements of each key by combining every new element with the last reduced value for that key.
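
The idea of a rolling reduce per key can be sketched in plain Java. This is an illustration only, with no Flink API involved; the helper names kv and rollingReduce are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Plain-Java sketch of a rolling reduce on a keyed stream (assumption: not Flink API).
final class KeyedReduceSketch {

    // Small helper to build a (key, value) pair.
    static Map.Entry<String, Integer> kv(String key, int value) {
        return new AbstractMap.SimpleEntry<String, Integer>(key, value);
    }

    // For every incoming pair, combines the value with the last reduced value
    // of the same key and emits the new running result as "key=value".
    static List<String> rollingReduce(List<Map.Entry<String, Integer>> stream,
                                      BinaryOperator<Integer> reducer) {
        Map<String, Integer> lastPerKey = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> element : stream) {
            Integer reduced = lastPerKey.merge(element.getKey(), element.getValue(), reducer);
            out.add(element.getKey() + "=" + reduced);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> stream =
                Arrays.asList(kv("a", 1), kv("b", 5), kv("a", 2), kv("a", 3));
        System.out.println(rollingReduce(stream, Integer::sum)); // [a=1, b=5, a=3, a=6]
    }
}
```

Note that the input and output types of the reducer are the same here, which is the restriction the fold operation lifts.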

The fold operation of Apache Flink is similar to the reduce operation. The difference is that, unlike the ReduceFunction interface, the FoldFunction interface can take different input and output types.
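
A minimal plain-Java sketch of the "different input and output types" point follows. It is not Flink's FoldFunction; the generic helper fold and its type parameter names are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;

// Plain-Java sketch of a fold whose accumulator type differs from the element type
// (assumption: not Flink API).
final class FoldSketch {

    // Starts from an initial accumulator of type OUT and folds in elements of type IN.
    static <IN, OUT> OUT fold(List<IN> elements, OUT initial, BiFunction<OUT, IN, OUT> folder) {
        OUT accumulator = initial;
        for (IN element : elements) {
            accumulator = folder.apply(accumulator, element);
        }
        return accumulator;
    }

    public static void main(String[] args) {
        // Integer elements are folded into a String result.
        String result = fold(Arrays.asList(1, 2, 3), "start", (acc, x) -> acc + "-" + x);
        System.out.println(result); // start-1-2-3
    }
}
```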

Apache Flink provides general aggregation operations such as min(), minBy(), max(), maxBy(), and sum().

The split operator of Apache Flink's DataStream API is used to split the incoming stream of data into two or more streams. It uses a select method to select data from a SplitStream.

The iterate operator feeds the data stream back through the same operations again and again until it reaches a desired output.

This is the introductory lecture of the windows section. Windowing is a crucial concept in Apache Flink. Throughout the section you will learn the various types of built-in windows Flink provides and how to code them in a program.

There are 2 methods for assigning windows to a stream in Apache Flink:

  • window()

  • windowAll()

windowAll() is for non-keyed streams and window() is for keyed streams.

There are various notions of time for windows in Apache Flink: processing time, event time, and ingestion time.

A tumbling window is a time-based window. It can be created using either the processing-time or the event-time notion. This video shows how to implement tumbling windows in a Flink program.
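
To make the idea concrete, here is a small plain-Java sketch of how timestamps fall into fixed-size, non-overlapping windows. It does not use Flink's window assigners; the class and method names are hypothetical, and the sketch assumes windows aligned to timestamp 0 with no offset.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of tumbling-window assignment (assumption: not Flink API).
final class TumblingWindowSketch {

    // Start of the window [start, start + size) that contains the timestamp.
    static long windowStart(long timestampMillis, long sizeMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, sizeMillis);
    }

    // Counts how many timestamps fall into each window, keyed by window start.
    static Map<Long, Integer> countPerWindow(long[] timestampsMillis, long sizeMillis) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : timestampsMillis) {
            counts.merge(windowStart(ts, sizeMillis), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] timestamps = {1_000L, 4_999L, 5_000L, 12_000L};
        // 5-second windows: every timestamp belongs to exactly one window.
        System.out.println(countPerWindow(timestamps, 5_000L)); // {0=2, 5000=1, 10000=1}
    }
}
```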

A sliding window is a time-based window. It can be created using either the processing-time or the event-time notion. This video shows how to implement sliding windows in a Flink program.
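
Unlike tumbling windows, sliding windows overlap, so one element can belong to several windows. The plain-Java sketch below lists the start times of all windows that contain a given timestamp. It is an illustration under the assumption of windows aligned to timestamp 0, not a call into Flink's API; the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of sliding-window assignment (assumption: not Flink API).
final class SlidingWindowSketch {

    // Returns, in ascending order, the start of every window [start, start + size)
    // that contains the timestamp, where a new window begins every slideMillis.
    static List<Long> windowStarts(long timestampMillis, long sizeMillis, long slideMillis) {
        List<Long> starts = new ArrayList<>();
        // Latest window start that is not after the timestamp.
        long lastStart = timestampMillis - Math.floorMod(timestampMillis, slideMillis);
        // Walk backwards while the window still covers the timestamp.
        for (long start = lastStart; start > timestampMillis - sizeMillis; start -= slideMillis) {
            starts.add(0, start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // 10-second windows sliding every 5 seconds: each element is in 2 windows.
        System.out.println(windowStarts(12_000L, 10_000L, 5_000L)); // [5000, 10000]
    }
}
```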

This video explains how to implement session windows in an Apache Flink program.
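
Session windows group events that arrive close together and close when there is a period of inactivity. The plain-Java sketch below shows that grouping for a list of timestamps. It does not use Flink's session window assigners, the names are hypothetical, and the choice to start a new session only when the pause is strictly longer than the gap is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of session-window grouping (assumption: not Flink API).
final class SessionWindowSketch {

    // Groups timestamps into sessions; a pause longer than gapMillis starts a new session.
    // Each result entry is {firstEventTimestamp, lastEventTimestamp} of one session.
    static List<long[]> sessions(long[] timestampsMillis, long gapMillis) {
        long[] sorted = timestampsMillis.clone();
        Arrays.sort(sorted);
        List<long[]> result = new ArrayList<>();
        if (sorted.length == 0) {
            return result;
        }
        long start = sorted[0];
        long last = sorted[0];
        for (int i = 1; i < sorted.length; i++) {
            if (sorted[i] - last > gapMillis) {
                result.add(new long[]{start, last}); // inactivity gap: close the session
                start = sorted[i];
            }
            last = sorted[i];
        }
        result.add(new long[]{start, last});
        return result;
    }

    public static void main(String[] args) {
        long[] events = {1_000L, 2_000L, 2_500L, 9_000L, 9_500L};
        for (long[] session : sessions(events, 3_000L)) {
            System.out.println(Arrays.toString(session)); // [1000, 2500] then [9000, 9500]
        }
    }
}
```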

This video explains how to implement global windows in an Apache Flink program.

Every window has a trigger attached, which tells the window when to start processing. Apache Flink provides a few built-in triggers, but we can also create our own by overriding a few methods of the Trigger class.

Evictors are components that allow us to keep only selected elements in a window.

What is a watermark in Apache Flink?

This lecture explains how to actually create watermarks for a window in Flink, using the assignTimestampsAndWatermarks method.
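
One common way to generate watermarks is to let them trail the highest timestamp seen so far by a fixed allowance for out-of-order events. The plain-Java sketch below models only that bookkeeping. It is not Flink's watermark API and does not call assignTimestampsAndWatermarks; the class name, the method names, and the exact formula are assumptions made for illustration.

```java
// Plain-Java sketch of a bounded-out-of-orderness watermark (assumption: not Flink API).
final class WatermarkSketch {

    private final long maxOutOfOrdernessMillis;
    private long maxTimestamp;

    WatermarkSketch(long maxOutOfOrdernessMillis) {
        this.maxOutOfOrdernessMillis = maxOutOfOrdernessMillis;
        // Chosen so that the initial watermark is Long.MIN_VALUE without overflow.
        this.maxTimestamp = Long.MIN_VALUE + maxOutOfOrdernessMillis + 1;
    }

    // Records the event's timestamp; the watermark never moves backwards.
    void onEvent(long eventTimestampMillis) {
        maxTimestamp = Math.max(maxTimestamp, eventTimestampMillis);
    }

    // "No event with a timestamp at or below this value is expected any more."
    long currentWatermark() {
        return maxTimestamp - maxOutOfOrdernessMillis - 1;
    }

    // An event is considered late if it is not newer than the current watermark.
    boolean isLate(long eventTimestampMillis) {
        return eventTimestampMillis <= currentWatermark();
    }

    public static void main(String[] args) {
        WatermarkSketch watermarks = new WatermarkSketch(2_000L);
        watermarks.onEvent(10_000L);
        System.out.println(watermarks.currentWatermark()); // 7999
        System.out.println(watermarks.isLate(7_000L));     // true
        System.out.println(watermarks.isLate(8_500L));     // false
    }
}
```

An event-time window can be evaluated once the watermark passes the end of the window, which is why the allowance trades latency against tolerance for late data.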

Flink provides fault tolerance for its applications: after a node failure, the application can be restored from its most recent checkpoint rather than restarted from scratch.

Flink provides fault tolerance using state and checkpointing. This is the first lecture on the topic; it explains what state is in Flink.

Flink's checkpointing is based on the Asynchronous Barrier Snapshotting algorithm. Rather than stopping the job to take a snapshot, Flink injects checkpoint barriers into the data streams, and each operator snapshots its state as the barriers pass through it.

Incremental checkpointing is a newer feature of Apache Flink, introduced in Flink 1.3. It gives better performance than conventional full checkpointing.

State can be categorized into 2 types:

  • Operator State - Managed operator state and Raw operator state

  • Keyed State - Managed keyed state and Raw keyed state

What is Value State in Flink and how to implement it in a Flink program.

What is List State in Flink and how to implement it in a Flink program.

What is Reducing State in Flink and how to implement it in a Flink program.

Managed operator state in Flink and how to code it in a Flink program.

This lecture teaches you how to perform checkpointing in a Flink program. It also covers the various restart strategies Flink supports.

This lecture will show how to implement Broadcast State in a Flink program

Queryable state is still a beta feature of Apache Flink and is evolving. If we mark managed keyed state as queryable, programs outside Flink can access that state.

Live Twitter data can be used to generate insights in real time. Twitter provides data through APIs, which we can access using security tokens. This lecture covers how to ingest Twitter data into Apache Flink.

This lecture shows how to integrate Apache Kafka with Apache Flink.

A real-time use case of Twitter analysis in the healthcare domain: using Apache Flink, a healthcare company wants to find out how many users are posting tweets about pollution, and from which devices.

Stock Real-Time Data Processing using Flink

Flink has introduced 2 relational APIs for table processing: the Table API and the SQL API.

This lecture explains how to create and register a table in Flink using its relational APIs.

An example showing how to write queries in Flink using the Table and SQL APIs.

A graph is a set of vertices together with a set of edges that connect them.

In this video you will learn how to do graph processing using the Gelly API of Apache Flink. In the use case explained in the lecture, we find the friends of friends of a person.
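
The friends-of-friends idea can be sketched in plain Java with an adjacency map. This is not Gelly code and not the lecture's program; the class and method names are hypothetical, friendships are assumed to be mutual, and the sketch chooses to exclude the person and their direct friends from the result.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Plain-Java sketch of a friends-of-friends lookup (assumption: not the Gelly API).
final class FriendsOfFriendsSketch {

    // Adds an undirected edge between two people (vertices).
    static void addFriendship(Map<String, Set<String>> graph, String a, String b) {
        graph.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
        graph.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
    }

    // People exactly two hops away: friends of the person's friends,
    // excluding the person and anyone who is already a direct friend.
    static Set<String> friendsOfFriends(Map<String, Set<String>> graph, String person) {
        Set<String> direct = graph.getOrDefault(person, Collections.emptySet());
        Set<String> result = new TreeSet<>();
        for (String friend : direct) {
            for (String candidate : graph.getOrDefault(friend, Collections.emptySet())) {
                if (!candidate.equals(person) && !direct.contains(candidate)) {
                    result.add(candidate);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> graph = new HashMap<>();
        addFriendship(graph, "A", "B");
        addFriendship(graph, "B", "C");
        addFriendship(graph, "B", "D");
        addFriendship(graph, "A", "D");
        addFriendship(graph, "C", "E");
        System.out.println(friendsOfFriends(graph, "A")); // [C]
    }
}
```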

Traffic lights

Read about what's good, what should give you pause, and possible dealbreakers.
Covers Apache Flink, a leading big data engine for stream processing, which is highly relevant for professionals in the field
Uses Java for Flink code examples, which is beneficial for Java developers looking to expand their big data skills
Explains Apache Flink concepts from scratch to real-time implementation, providing a comprehensive learning experience for newcomers
Includes real-time case studies using Flink, which allows learners to apply their knowledge to practical scenarios in data processing
Highlights how Flink addresses the shortcomings of Spark in stream processing, which is useful for those considering migrating from Spark
Requires students to have access to tools and software for big data processing, which may require additional setup and configuration

Reviews summary

Hands-on Apache Flink stream processing

According to learners, this course provides a comprehensive and hands-on introduction to Apache Flink, covering fundamental concepts and real-world applications. Students particularly appreciate the practical coding examples and detailed explanations that help solidify understanding of complex topics like state management and windowing. While the course material is generally well-structured and updated, a few learners note that some sections can be dense or challenging, particularly for those completely new to stream processing or Java.
Code examples are updated.
"Note (2024) : All the codes are updated with latest Flink version."
"All the codes are updated with latest Flink version."
"Good to see the code examples are kept current."
"The updated code is working well with recent Flink versions."
Includes practical case studies.
"Implement 3 Real-time Case Studies using Flink."
"The real-time case studies (Twitter, Bank Fraud, Stock) make it practical."
"Loved applying Flink to the Twitter data analysis example."
"Understanding how Flink applies to real-world scenarios was a highlight."
Detailed section on Flink state and fault tolerance.
"This section will make you learn How by using State and Checkpointing we can achieve fault tolerance in Flink."
"The fault tolerance section with state and checkpointing was very insightful."
"Helped me understand State and Checkpointing deeply."
"Crucial concepts like state management are covered well."
Concepts are explained clearly and thoroughly.
"Complete Apache Flink concepts explained from Scratch to Real-Time implementation."
"Includes even those concepts, the explanation to which is not very clear even in Flink official documentation."
"The instructor explains complex topics in a way that is easy to follow."
"Good explanations for non-Java developers too, line by line as promised."
Course is strong on hands-on coding.
"Each and Every Apache Flink concept is explained with a HANDS-ON Flink code of it."
"The hands-on coding and projects are the strongest part of the course for me"
"Very practical, lots of coding examples that work and are explained well."
"I found the code labs really helpful for putting theory into practice."
Some topics require significant focus.
"While explanations are good, some lectures felt quite dense."
"Needed to rewatch some videos on windowing and state."
"Parts were challenging if you didn't have much prior Big Data experience."

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Apache Flink | A Real Time & Hands-On course on Flink with these activities:
Review Stream Processing Fundamentals
Solidify your understanding of stream processing concepts before diving into Flink's specifics. This will help you grasp the core principles behind Flink's design and functionality.
  • Review the differences between batch and stream processing.
  • Understand the CAP theorem and its implications for distributed systems.
  • Familiarize yourself with common stream processing architectures.
Read 'Apache Flink: Stream Analytics and Beyond' by Ted Malaska and Dean Wampler
Supplement your learning with a practical guide to using Flink for stream analytics. This book provides real-world examples and best practices for developing Flink applications.
  • Read the chapters on setting up a Flink cluster and developing Flink applications.
  • Study the sections on integrating Flink with other systems, such as Kafka and Hadoop.
  • Experiment with the code examples provided in the book.
Read 'Streaming Systems' by Tyler Akidau, Slava Chernyak, and Reuven Lax
Gain a deeper understanding of stream processing principles and best practices. This book provides a solid foundation for working with Flink and other streaming systems.
  • Read the chapters on windowing and state management.
  • Study the sections on fault tolerance and consistency.
  • Compare the concepts presented in the book with Flink's implementation.
Implement Windowing Strategies in Flink
Master Flink's windowing capabilities through hands-on exercises. This will enable you to effectively process time-based data streams and extract meaningful insights.
  • Implement tumbling windows for calculating hourly averages.
  • Implement sliding windows for detecting trends over time.
  • Implement session windows for analyzing user activity patterns.
Build a Real-Time Data Pipeline with Flink and Kafka
Apply your Flink knowledge to a practical project involving real-time data ingestion and processing. This will solidify your understanding of Flink's APIs and integration with other systems.
  • Set up a Kafka cluster and a Flink environment.
  • Create a Flink application that consumes data from Kafka.
  • Implement data transformations and aggregations using Flink's DataStream API.
  • Write the processed data to a data sink, such as a database or file system.
Write a Blog Post on Flink's State Management
Deepen your understanding of Flink's state management by explaining it to others. This will force you to clarify your knowledge and identify any gaps in your understanding.
  • Research Flink's state management concepts and APIs.
  • Write a clear and concise explanation of stateful stream processing.
  • Provide code examples to illustrate different state management techniques.
  • Publish your blog post on a platform like Medium or your personal website.
Contribute to the Apache Flink Project
Gain in-depth knowledge of Flink's internals by contributing to the open-source project. This will expose you to the codebase, the development process, and the Flink community.
  • Explore the Apache Flink project on GitHub.
  • Identify a bug or feature request that you can contribute to.
  • Submit a pull request with your proposed changes.
  • Participate in code reviews and discussions with other contributors.

Career center

Learners who complete Apache Flink | A Real Time & Hands-On course on Flink will develop knowledge and skills that may be useful to these careers:
Stream Processing Developer
The stream processing developer specializes in designing, developing, and deploying applications that process continuous streams of data in real time. This course aligns directly with the needs of a stream processing developer, as it focuses entirely on Apache Flink, a leading technology for stream processing. A stream processing developer will find the content on Flink's architecture, programming model, and various APIs especially useful. The hands-on Flink code examples and real-time case studies help the learner gain practical experience and prepare to solve real-world stream processing challenges. The course's coverage of connecting Flink with Kafka is particularly valuable.
Real Time Analytics Engineer
A real time analytics engineer focuses on building systems that can analyze data as it arrives, providing immediate insights. This Apache Flink course is perfect for someone aiming to be a real time analytics engineer, as it provides comprehensive training on Flink, a leading framework for real-time data processing. The real time analytics engineer will utilize Flink to process data streams, create dashboards, and trigger alerts based on real-time insights. The course's emphasis on hands-on coding, real-time case studies, and integration with tools like Kafka makes it particularly valuable for aspiring real time analytics engineers. This course can help you on your path to becoming a real time analytics engineer.
Data Engineer
A data engineer builds and maintains the infrastructure that allows organizations to process and analyze large datasets. This course on Apache Flink is directly relevant, as Flink is a powerful tool for real-time data stream processing, a key aspect of many data engineering roles. Data engineers use technologies like Flink to ingest, transform, and load data into data warehouses and data lakes. This course provides hands-on experience with Flink, including its various APIs and real-time case studies, all of which help the learner work effectively as a data engineer. The course's coverage of fault tolerance using state and checkpointing is particularly valuable.
Big Data Architect
The big data architect designs and implements the overall architecture for big data solutions. They are responsible for selecting the appropriate technologies, designing data pipelines, and ensuring the scalability and reliability of the system. This course on Apache Flink will benefit a big data architect by providing a deep understanding of a key technology in the big data ecosystem. As a big data architect, understanding Flink's capabilities, its strengths compared to other technologies like Spark, and its various features is crucial for making informed decisions about the architecture. The insights into real-time processing and the trade-offs with other solutions makes completing this course useful for any big data architect.
Data Scientist
A data scientist analyzes data to extract meaningful insights and build predictive models. While data scientists often work with batch data, real-time data analysis is becoming increasingly important. This course on Apache Flink can enhance a data scientist's skillset by providing the ability to process and analyze streaming data. Learning Flink allows a data scientist to build real-time models, detect anomalies, and make predictions based on up-to-the-minute information. The course's coverage of Flink's machine learning capabilities can be helpful to a data scientist. Typically, a data scientist has an advanced degree.
Machine Learning Engineer
A machine learning engineer develops and deploys machine learning models. This Apache Flink course will be useful for machine learning engineers who want to implement real-time machine learning applications, because they work with streaming data. A machine learning engineer can use Flink to build real-time feature pipelines, train models on streaming data, and deploy models for real-time prediction. The course's coverage of Flink's machine learning capabilities and its ability to integrate with other machine learning tools makes it particularly valuable. Typically, a machine learning engineer has an advanced degree.
Software Engineer
A software engineer designs, develops, and maintains software applications. This Apache Flink course can be valuable for software engineers who are working on data-intensive applications. Software engineers leverage Flink to build scalable and reliable data pipelines, process real-time data streams, and integrate with other big data technologies. The course's hands-on coding examples, coverage of Flink's APIs, and real-time case studies will help software engineers develop practical skills in stream processing. If you are a software engineer, this course may be a good choice. The coverage of the Gelly API is relevant.
Data Analyst
A data analyst examines data to identify trends and insights. This Apache Flink course will be extremely insightful for data analysts interested in working with real-time data streams. By learning Flink, a data analyst can analyze data as it arrives, create real-time dashboards, and identify trends and patterns in real-time. The course's coverage of Flink's Table API and SQL API can be helpful for data analysts who are familiar with SQL. Overall, the course is useful to data analysts who need real time analytics. This course may be useful.
Solutions Architect
The solutions architect is responsible for designing and implementing complete technology solutions that meet business requirements. This course on Apache Flink can be useful for solutions architects who are working on projects that involve real-time data processing. A solutions architect needs to understand the capabilities of different technologies and how they can be integrated to create a comprehensive solution, and Flink knowledge can be an asset. The course's coverage of Flink's architecture, APIs, and integration with other tools can help a solutions architect design effective real-time data processing solutions.
Cloud Engineer
A cloud engineer is responsible for designing, building, and managing cloud infrastructure. Apache Flink is often deployed in the cloud, making this course relevant for cloud engineers. As a cloud engineer, one can learn how to deploy and manage Flink clusters in the cloud, optimize Flink's performance in cloud environments, and integrate Flink with other cloud services. A cloud engineer focusing on data processing finds this course valuable as many companies adopt cloud-based architectures for data processing. This course may be useful.
Business Intelligence Analyst
A business intelligence analyst focuses on using data to inform business decisions. This Apache Flink course will be invaluable for business intelligence analysts who want to incorporate real-time data into their analysis. By learning Flink, a business intelligence analyst can create real-time dashboards, monitor key performance indicators in real-time, and gain immediate insights into business performance. Business intelligence analysts can use the knowledge gained from this course to improve their understanding of business trends. This course may be useful.
Application Developer
An application developer builds and maintains software applications, and this Apache Flink course may be useful to application developers who want to incorporate real-time data processing capabilities into their applications. By learning Flink, an application developer can build applications that process data streams, react to events in real-time, and provide users with up-to-the-minute information. If an application developer wants to work with real time applications, this may be a good course. The coverage of different APIs is helpful.
DevOps Engineer
A DevOps engineer is responsible for automating and streamlining the software development and deployment process. This Apache Flink course will be quite helpful for DevOps engineers who are working on data-intensive applications, because Flink needs to be deployed. As a DevOps engineer, one can learn how to automate the deployment and management of Flink clusters, monitor Flink's performance, and ensure the reliability of Flink applications. The course's coverage of checkpointing and fault tolerance can be particularly valuable. This course may be beneficial.
System Administrator
A system administrator is responsible for maintaining computer systems and servers. This Apache Flink course will be beneficial to system administrators who are responsible for managing Flink deployments. By learning about Flink's architecture, configuration, and monitoring tools, a system administrator can ensure the smooth operation of Flink clusters and provide support to Flink users. A system administrator may find that this course improves their understanding of Flink. This course may be useful.
Database Administrator
A database administrator (DBA) is responsible for managing and maintaining databases. While this course focuses on Apache Flink, a stream processing framework, understanding how Flink interacts with databases can be valuable for DBAs. A DBA can use this knowledge to optimize database performance, design data storage strategies for real-time data, and ensure the reliability of data pipelines. The knowledge of Flink can therefore be useful to the database administrator. This course may be beneficial for database administrators.

Reading list

We've selected two books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Apache Flink | A Real Time & Hands-On course on Flink.
Provides a comprehensive overview of stream processing concepts and architectures. It covers topics such as windowing, state management, and fault tolerance, which are essential for understanding Flink. It is highly recommended as additional reading to deepen your understanding of the underlying principles behind Flink. This book is commonly used by industry professionals.
Provides a practical guide to using Apache Flink for stream analytics. It covers topics such as setting up a Flink cluster, developing Flink applications, and integrating Flink with other systems. It is a useful reference tool for developers who want to learn how to use Flink in real-world scenarios. This book adds more depth to the existing course.

© 2016 - 2025 OpenCourser