We may earn an affiliate commission when you visit our partners.
Aref Karimi

Are you looking to enhance your experience with observability using the Grafana Stack? Dive into our acclaimed Grafana and Prometheus tutorial, which covers the critical components of the Grafana Stack, such as Grafana Loki, Grafana Alloy, and Grafana Tempo.

The course begins with a section about observability, telemetry, metrics and various metric collection approaches. This information helps you strengthen your knowledge of the core concepts of observability.

Afterwards, this course embeds a complete course on Prometheus, allowing you to deploy, configure, and use Prometheus and its rich features like a professional.

The following section concerns deploying Grafana in various environments using different methods. You will see how to install Grafana on Windows, Mac, Linux (multiple flavours), and with Docker.

Once your Prometheus and Grafana are deployed and ready, you will learn about the best dashboard design practices for browser applications, backend applications, microservices, and infrastructure. Then, you will learn to create dashboards and graphs in Grafana that leverage the power of Prometheus functions. The course also includes instructions on integrating My

After querying the data and visualising them on Grafana, you want to create Alert Rules and raise notifications when the Alert Rules are violated. The notifications must be directed to suitable channels, such as Slack, to ensure proactive monitoring. The course also includes a section about alerts and notifications.

Producing, collecting, and visualising logs is crucial to any observability platform. That is why there is a section for Grafana Loki, Grafana's log collection and visualisation software.

OpenTelemetry has gained traction and has been significantly adopted in recent years. Continuing our learning journey, we will learn about OpenTelemetry (OTel), the OpenTelemetry Protocol (OTLP), and Grafana Alloy. We will work with a microservice that produces and exports OTel signals (i.e., metrics and traces) using OpenTelemetry SDKs.

The Grafana Alloy tutorial in this course explores Grafana Labs' latest addition to the Grafana stack and its role in collecting, processing, and exporting OpenTelemetry signals.

With Grafana Alloy and OpenTelemetry covered, we will learn about OpenTelemetry Traces and Grafana Tempo, Grafana Labs' solution for visualising OpenTelemetry Traces.

The course is based on an imaginary online company called ShoeHub, which sells shoes in multiple countries. The course, therefore, has accompanying code/software that is provided on GitHub to cover the following:

  • Mock data generation for ShoeHub company.

  • Docker build files for custom Grafana images.

  • Docker Compose files for launching Grafana, Prometheus, Loki and Tempo in one go.

  • A Python script for (mock) Log generation for Grafana Loki.

  • Installation procedures for Ubuntu and Amazon Linux.

  • Microservices (C# and Python) with custom OpenTelemetry instrumentation.

  • Linux shell scripts for deploying Grafana Alloy.

This course was first published in 2018, and it's been updated and revamped steadily ever since. To keep your knowledge current, you will receive periodic educational communications about updates and additions to the course.

I will respond quickly via Udemy's Q&A feature if you encounter any issues or questions.

Happy learning :-)

What's inside

Learning objectives

  • Fundamentals of observability (types of telemetry data, metric collection methods etc.)
  • Prometheus (installation, configuration and usage) comprising 21 lectures.
  • Installation of Grafana on Windows, Mac, Linux (multiple flavours) and with Docker.
  • Architecture of highly available and highly scalable Grafana for production use.
  • Dashboard design best practices (browser apps, backend apps and infrastructure)
  • Building dashboards and graphs in Grafana
  • Creating and managing alerts and notifications in Grafana
  • Integration with MySQL, SQL Server, AWS CloudWatch, GCP etc.
  • Grafana Loki: retrieval and visualisation of logs
  • Administration of Grafana (users, teams, OAuth integration, LDAP integration etc.)
  • OpenTelemetry
  • Grafana Alloy
  • Grafana Tempo

Syllabus

Introduction
Foundations of Observability
Evolution of Software Architecture and Observability
What is Monitoring

Let's compare the pros and cons of installing Grafana locally versus using the cloud-based Grafana.

You will learn how to install and configure Grafana on Ubuntu 18.04 LTS (and above). Step-by-step instructions for setting up Grafana are attached to this lecture as well.

Windows is widely used on servers and personal computers. Therefore, it is essential to know how Grafana can be installed and configured on a Windows instance.

If you are a proud Mac user, you can install Grafana directly on your Mac computer and use it to learn more about it. In this lecture you will learn how to install and configure Grafana using Homebrew.

A quick and easy way of installing Grafana is using its Docker image. In this lecture you will see how you can use Grafana's Docker image to quickly set up your observability stack.

Dashboards in Grafana are designed for different purposes, such as monitoring browser applications or infrastructure. Each dashboard type is used by a different role or team in the organisation, who may have different KPIs to watch.

In this lecture I will explain the most common dashboard layouts and structure for each dashboard type.

The Shoe Hub is an imaginary company we will use throughout the course to explain how you can visualise business and technical metrics.

The Graph panel is suitable for creating charts and histograms. In this lecture you will learn how to use the Graph panel and display metrics from Graphite on it.

In this lecture you will visualise the data of different payment methods in the US so that we can have a good understanding of how the customers prefer to pay.

Using the Data Transformations feature of Grafana, you can mix and match existing panel rows to create new rows, look up data or convert data types.

The Time Series panel is suitable for showing the data trend over time. However, we can compare different related values in percentage form using Pie Charts. For example, we can show what percentage of infrastructure failures is related to disk, what percentage is related to network and what percentage is related to power outages.
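The percentage breakdown described here is simple arithmetic, and a minimal Python sketch can make it concrete. The failure counts below are invented for illustration, and `percentage_shares` is a hypothetical helper, not something Grafana exposes; the Pie Chart panel can calculate the shares for you.

```python
def percentage_shares(counts):
    """Convert raw counts per category into percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        # Avoid division by zero when there is no data at all.
        return {name: 0.0 for name in counts}
    return {name: 100.0 * value / total for name, value in counts.items()}


# Hypothetical failure counts, invented for illustration only.
failures = {"disk": 30, "network": 50, "power": 20}
shares = percentage_shares(failures)
print(shares)  # {'disk': 30.0, 'network': 50.0, 'power': 20.0}
```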

Sometimes, we want to compare a metric's current value(s) to the value(s) of the same metric in the past. For example, you could display the current shoe sales compared to last month's sales or make a week-on-week revenue comparison. Such graphs make it easy to see whether a metric is increasing or decreasing. For example, we can see if the network errors have gone down since last week, or if our marketing efforts have paid off and our sales have gone up since last month. In this lecture we will learn how to do this using Grafana and Prometheus.
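One way to express such a comparison in Prometheus is the PromQL `offset` modifier, which shifts a selector back in time. The sketch below only builds the two query strings and computes the percentage change between two values. The metric name `shoehub_sales` and both helper functions are assumptions made for illustration; the lecture may use different metrics or a different technique.

```python
def sales_query(selector, offset=None):
    """Build a PromQL sum over a selector, optionally shifted back in time.

    The offset modifier must directly follow the selector, so it is placed
    inside the aggregation rather than after it.
    """
    shifted = f"{selector} offset {offset}" if offset else selector
    return f"sum({shifted})"


def percent_change(current, previous):
    """Return the change from previous to current as a percentage."""
    if previous == 0:
        return None  # The change is undefined when the past value is zero.
    return 100.0 * (current - previous) / previous


# "shoehub_sales" is a hypothetical metric name used only in this sketch.
print(sales_query("shoehub_sales"))        # sum(shoehub_sales)
print(sales_query("shoehub_sales", "1w"))  # sum(shoehub_sales offset 1w)
print(percent_change(120, 100))            # 20.0
```

Plotting both queries on the same panel gives the current series next to last week's series.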

Sometimes it is essential for us to know if the values of data points are above or below a given threshold, for example, if the network errors go above a certain number, or if the orders received per hour are unusually low. We achieve this with Thresholds in Grafana.
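As a rough illustration of the idea, the following Python sketch maps a value to a colour using a base colour plus a list of threshold steps, which is approximately how threshold steps behave in a panel. The step values and colours are invented, and `threshold_color` is a hypothetical helper rather than a Grafana API.

```python
def threshold_color(value, steps, base_color="green"):
    """Return the colour of the highest threshold step that the value reaches.

    steps is a list of (limit, colour) pairs; values below every limit
    keep the base colour.
    """
    color = base_color
    for limit, step_color in sorted(steps):
        if value >= limit:
            color = step_color
    return color


# Hypothetical thresholds for network errors per minute.
steps = [(50, "orange"), (100, "red")]
print(threshold_color(10, steps))   # green
print(threshold_color(75, steps))   # orange
print(threshold_color(100, steps))  # red
```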

Variables, a key feature of Grafana, allow us to create dynamic dashboards and panels with less work and effort than when we hard-code everything.
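To show why variables save effort, here is a small Python sketch that renders one query template for several values instead of hard-coding a query per country. It uses `string.Template` only because its `$name` placeholder syntax resembles Grafana's; Grafana performs its own interpolation, and the metric name `shoehub_sales` and the `country` label are assumptions for illustration.

```python
from string import Template

# One query definition with a placeholder, instead of one query per country.
query_template = Template('sum(shoehub_sales{country="$country"})')


def render_queries(template, countries):
    """Substitute each country code into the template and return the queries."""
    return [template.substitute(country=code) for code in countries]


queries = render_queries(query_template, ["AU", "US"])
for query in queries:
    print(query)
# sum(shoehub_sales{country="AU"})
# sum(shoehub_sales{country="US"})
```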

In Grafana, if we show two or more lines on a Graph panel, and the values of these lines are vastly different, then one or some of those lines may become so compressed that we may see their data points as zeros. For example, if we show the response time of an IoT device that responds slowly and the response time of an API that responds very quickly on the same Graph panel, the response time of the API may appear as a straight line with a value of zero.

In this lecture we will learn how we can overcome this problem.
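The description does not say which fix the lecture uses; common options include a second Y axis or a logarithmic scale. The sketch below only illustrates the compression effect and how a base-10 logarithmic axis reduces it. The response times and axis limits are invented for illustration.

```python
import math


def linear_position(value, axis_max):
    """Relative height (0 to 1) of a value on a linear axis starting at zero."""
    return value / axis_max


def log_position(value, axis_min, axis_max):
    """Relative height (0 to 1) of a value on a base-10 logarithmic axis."""
    span = math.log10(axis_max) - math.log10(axis_min)
    return (math.log10(value) - math.log10(axis_min)) / span


# Hypothetical response times in milliseconds.
iot_ms, api_ms = 5000, 5

# On a linear axis scaled to the slow device, the API sits at 0.1% of the height.
print(linear_position(api_ms, iot_ms))

# On a logarithmic axis from 1 ms to 10,000 ms, the API line is clearly visible.
print(round(log_position(api_ms, 1, 10_000), 3))
```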

Alerts are defined based on thresholds or mathematical formulations in Grafana. Over time, the alerting system in Grafana has evolved, improved and become somewhat complex. In this lecture you will learn about the concepts and terminology of the Grafana Alerting System, and you will learn how this ecosystem works.

Alerts in Grafana are based on queries written in a data source-specific language, such as PromQL for Prometheus. The results of these queries are checked periodically, and if they violate a rule, such as a threshold, that we define, alerts are raised.

In this lecture you will learn how to define an alerting rule.
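The periodic check described above can be sketched in a few lines of Python. This is a simplified model, not Grafana's implementation: the state names, the `pending_evaluations` parameter and the sample values are assumptions for illustration. It treats each list item as the query result at one evaluation and fires only after the threshold has been breached for several consecutive evaluations.

```python
def evaluate_rule(samples, threshold, pending_evaluations=1):
    """Return the rule state after each periodic evaluation.

    A rule becomes "pending" on the first breach and "firing" once the
    threshold has been breached for pending_evaluations checks in a row.
    """
    states = []
    breaches = 0
    for value in samples:
        if value > threshold:
            breaches += 1
            states.append("firing" if breaches >= pending_evaluations else "pending")
        else:
            breaches = 0  # Any healthy evaluation resets the streak.
            states.append("normal")
    return states


# Hypothetical network error counts, one per evaluation interval.
states = evaluate_rule([10, 120, 130, 90, 150], threshold=100, pending_evaluations=2)
print(states)  # ['normal', 'pending', 'firing', 'normal', 'pending']
```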

It is not practical to constantly watch the dashboards to see if alerts are raised.  Instead, we deliver notifications in various formats, such as emails or Slack messages, to inform relevant people of the alert.

In this lecture you will learn how to create contact points as well as notification policies to filter the notifications and direct them to the right people.
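Conceptually, a notification policy matches an alert's labels and picks a contact point. The Python sketch below models that as a first-match lookup with a default fallback. It is a simplification under stated assumptions: the label names, contact point names and the flat (non-nested) policy list are all invented for illustration.

```python
def route_alert(labels, policies, default_contact_point):
    """Return the contact point of the first policy whose matchers all match."""
    for matchers, contact_point in policies:
        if all(labels.get(key) == value for key, value in matchers.items()):
            return contact_point
    return default_contact_point


# Hypothetical policies, ordered from most to least specific.
policies = [
    ({"team": "payments", "severity": "critical"}, "payments-oncall-slack"),
    ({"team": "payments"}, "payments-email"),
]

print(route_alert({"team": "payments", "severity": "critical"}, policies, "default-email"))
# payments-oncall-slack
print(route_alert({"team": "payments", "severity": "warning"}, policies, "default-email"))
# payments-email
print(route_alert({"team": "infra"}, policies, "default-email"))
# default-email
```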

Slack is a popular collaboration tool that many teams use to chat, exchange team data and receive notifications. Grafana can send alert messages to Slack, too. In this lecture, you will learn how to send Slack notifications for a firing alert.

Sometimes, we want to stop sending notifications temporarily. For example, you may not want to send out notifications at midnight. In such cases we can use Grafana's ability to silence the alerts for a defined time period.
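The core of a time-based mute is a window check, and the only subtle part is a window that crosses midnight. Here is a minimal Python sketch of that check; `is_muted` and the 23:00 to 06:00 window are hypothetical and only illustrate the logic, not how Grafana stores or evaluates silences.

```python
from datetime import time


def is_muted(now, start, end):
    """Return True if now falls inside the mute window [start, end)."""
    if start <= end:
        return start <= now < end
    # The window crosses midnight, for example 23:00 to 06:00.
    return now >= start or now < end


# Hypothetical quiet hours from 23:00 to 06:00.
quiet_start, quiet_end = time(23, 0), time(6, 0)
print(is_muted(time(0, 30), quiet_start, quiet_end))  # True
print(is_muted(time(12, 0), quiet_start, quiet_end))  # False
```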

Annotations are a way to mark points on a graph with rich events. In this lecture you will see how you can use annotations to describe and understand your Grafana panels better.

MySQL is a very common database, and it makes sense to use MySQL when your data and metrics already reside in your MySQL database. This lecture will show you how to use MySQL as your data source.

If you have deployed your systems to Amazon Web Services (AWS) you can connect Grafana to AWS's metric service called Amazon CloudWatch, and visualise the metrics of your AWS resources in Grafana, without having to move those metrics to a time-series database such as Prometheus.

With Grafana and GCP's monitoring API enabled, you can monitor your Google Cloud resources efficiently, without moving their metrics to a time series database such as Prometheus. In this lecture you will learn how you can leverage the out-of-the-box dashboard of Grafana to set up your observability system in a few minutes.

Organisations are great for giving a good shape to your observability platform so that it stays organised and well managed as it grows. In this lecture you will learn how you can work with the Organisations feature of Grafana and administer teams and users.

One way of authenticating external users is OAuth. Google is a major identity provider and reliable, too. Many companies use Google Suite to manage their users and identities. These companies would like to authenticate their Grafana users against Google. In this lecture, we learn how external users can be authenticated using an OAuth provider such as Google.

Many companies use Active Directory or other LDAP-compatible directory services to manage their users, so they would prefer to authenticate their Grafana users with the existing directory users, too.

This video will teach us how to configure Grafana to authenticate users against a given Directory Service, such as Microsoft Active Directory.

You can extend the capability of your dashboards by using Plugins. This lecture will show you how you can set up plugins and use them.

When you deploy Grafana in a Production capacity, you must ensure that Grafana will be highly available and that a failure in part of your deployment will not take down Grafana or make it unavailable.
In this lecture, you will learn about the architecture of a highly available Grafana deployment.

When Grafana is deployed in a heavily used Production environment, you must take measures to ensure that your deployment is scalable and can cope with increased load.
In this lecture we will upgrade our HA deployment of Grafana to an HA and scalable deployment.

With the advent of cloud-native applications and the microservices architecture, commercial observability platforms gained attention and became famous. However, they can be costly, and once integrated with them, it may be pretty challenging to break away from them and adopt a different vendor's observability platform.
OpenTelemetry, or OTel, is an open-source initiative incubated by the Cloud Native Computing Foundation (CNCF) that aims to enable developers and DevOps engineers to generate, export, and collect telemetry data without being locked into a specific vendor.

Learn about the architecture of a scalable observability system based on OpenTelemetry.

Learn about configuring Prometheus to receive OpenTelemetry metrics.

Grafana Alloy is Grafana's OpenTelemetry Collector. It can receive OTel metrics from various sources and deliver them to a variety of backend databases after processing them.

In this lecture, we will install Grafana Alloy locally on a Mac computer. Instructions for installing Grafana Alloy on Windows and Linux are provided at the end of this section.

Grafana Alloy plays a pivotal role in receiving, processing, and forwarding OpenTelemetry signals to downstream systems, such as Prometheus. In this lecture, you will learn how to create receivers, processors, and exporters to achieve this goal.
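The receiver, processor and exporter roles can be pictured as three stages chained together. The Python sketch below models that flow with generators. It is not Alloy configuration syntax and does not use any Alloy or OpenTelemetry API; every function name, the `env` attribute and the batch size are assumptions chosen purely to illustrate the pipeline shape.

```python
def receive(raw_points):
    """Receiver: accept incoming data points (here, from an in-memory list)."""
    for point in raw_points:
        yield point


def add_attribute(points, key, value):
    """Processor: return copies of the points enriched with one attribute."""
    for point in points:
        yield {**point, key: value}


def batch(points, size):
    """Processor: group points into lists of at most size items."""
    chunk = []
    for point in points:
        chunk.append(point)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # Flush the final, possibly smaller, batch.


def export(batches, sink):
    """Exporter: deliver each batch downstream (here, append to a list)."""
    for group in batches:
        sink.append(group)


# Hypothetical counter samples produced by a microservice.
incoming = [{"name": "orders_total", "value": n} for n in range(5)]
delivered = []
export(batch(add_attribute(receive(incoming), "env", "demo"), size=2), delivered)
print([len(group) for group in delivered])  # [2, 2, 1]
```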

In this lecture, we will analyse a microservice that produces a counter and exports it to Grafana Alloy via OTLP.

Tracing in distributed systems, particularly within a microservices architecture, is crucial for understanding and optimizing system performance and reliability. Tracing becomes complex yet essential as modern applications are built using microservices, where various components communicate over networks.

Tracing involves tracking a request's journey across multiple microservices, providing insights into each service's performance, dependencies, and bottlenecks. Distributed tracing tools like Jaeger and Zipkin enable developers to visualize this journey, often represented as a trace or a series of interconnected spans.

In microservices, where each service is responsible for a specific function, tracing helps identify latency issues, failures, and inefficiencies that may occur at any point in the system. By correlating traces across services, developers can pinpoint the root cause of problems and optimize performance.
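To make the idea of spans and bottlenecks concrete, here is a minimal Python sketch of one trace as a list of spans linked by parent IDs. It finds the span with the most self time, meaning its duration minus the time spent in its direct children. The `Span` fields, service names and timings are invented for illustration, and the calculation assumes child spans do not overlap; real tracing backends store far more detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace.
    service: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self):
        return self.end_ms - self.start_ms


def self_time_ms(span, spans):
    """Time spent in the span itself, excluding its direct children.

    Assumes the children run one after another without overlapping.
    """
    children = [s for s in spans if s.parent_id == span.span_id]
    return span.duration_ms - sum(child.duration_ms for child in children)


# One hypothetical request travelling through four services.
spans = [
    Span("a", None, "gateway", 0, 200),
    Span("b", "a", "orders", 10, 60),
    Span("c", "a", "payments", 60, 190),
    Span("d", "c", "database", 70, 180),
]

slowest = max(spans, key=lambda s: self_time_ms(s, spans))
print(slowest.service, self_time_ms(slowest, spans))  # database 110
```

In this made-up trace the gateway span is the longest overall, but most of the time is actually spent in the database call, which is the kind of root cause a trace view helps to reveal.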

Moreover, tracing facilitates debugging and monitoring in production environments, aiding in troubleshooting and ensuring system reliability. It also supports distributed system testing, allowing developers to simulate various scenarios and analyze system behaviour under different conditions.

In this lecture you will learn all aspects of Telemetry and its relevance to Grafana.

Traffic lights

Read about what's good, what should give you pause, and possible dealbreakers.
Develops knowledge and skills in Grafana and related tools, which are core skills for DevOps engineers, site reliability engineers, and data analysts
Taught by Aref Karimi, who has extensive experience in observability and software development
Explores observability and telemetry, which are standard in the IT industry
Covers a range of topics, from Prometheus to OpenTelemetry, providing a comprehensive understanding of observability
Provides hands-on labs and interactive materials, fostering practical skills development
Offers integration with cloud services like AWS and GCP, making it relevant to modern cloud-based systems

Reviews summary

Comprehensive Grafana Stack observability guide

According to learners, this course offers a comprehensive and highly practical dive into observability, covering the entire Grafana Stack including Prometheus, Loki, Alloy, and Tempo. Many highlight the hands-on labs and real-world focus, enabling them to build robust monitoring solutions for production environments. The instructor is praised for consistently providing up-to-date content and offering responsive Q&A support. While it provides a solid foundation for beginners, some advanced users occasionally desired deeper dives into certain niche topics, suggesting a varied pacing. Overall, students find it an invaluable resource for mastering modern observability practices.
Some parts are excellent for beginners, while advanced users might seek more depth.
"Some sections felt a bit rushed for experienced users already familiar with basics, but the foundational parts are great for beginners."
"While it covers a lot, I sometimes wished for a deeper dive into specific complex topics or advanced troubleshooting scenarios."
"I found the initial setup detailed enough, but for some advanced production scenarios, I felt I needed additional external resources."
Instructor is very active in Q&A, providing timely and helpful assistance.
"The instructor's quick responses via Q&A were incredibly helpful whenever I ran into an issue."
"It's reassuring to know that support is readily available when questions arise during the course."
"I got all my queries answered promptly, which made the learning process much smoother."
Course content is kept current with the latest versions and tools like Grafana Alloy.
"It's great that the course is consistently updated, especially with new tools like Grafana Alloy and OpenTelemetry."
"The content feels very current, which is vital in such a fast-evolving field like observability."
"I noticed how the instructor incorporates the latest features, ensuring the knowledge is always relevant and useful."
Emphasizes practical application with real-world scenarios and provided code.
"The hands-on labs with the ShoeHub company are incredibly useful and helped me solidify my understanding."
"I learned so much by actively configuring and deploying the stack, which is exactly what I needed for my job."
"The provided code and Docker files made it easy to follow along and get everything running quickly."
A very broad and detailed overview of the entire Grafana observability stack.
"This course covers the full spectrum of Grafana, Prometheus, Loki, Alloy, and Tempo – truly a one-stop shop for observability."
"I appreciate how it integrates all the critical components of the Grafana Stack, from metrics to logs and traces seamlessly."
"It's a fantastic course that delves into every aspect of setting up and using a modern observability platform."
Some learners faced minor challenges with the initial environment setup.
"While the provided Docker files help, getting the entire ShoeHub environment working sometimes required extra effort."
"I encountered a few dependency issues during the initial setup, which took some time to resolve."
"Even with the instructions, ensuring all services communicate perfectly can be a small hurdle."

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Observability with Grafana, Prometheus,Loki, Alloy and Tempo with these activities:
Create a resource list for Grafana monitoring
Compile a list of useful resources to support your Grafana monitoring efforts.
  • Search for Grafana monitoring resources online.
  • Evaluate the relevance and quality of the resources.
  • Organize the resources into a list or document.
Join a Grafana study group
Collaborate with other students to enhance your understanding of Grafana.
  • Find a Grafana study group online or in your local community.
  • Attend study group meetings regularly.
  • Participate in discussions and ask questions.
Attend a Grafana workshop
Learn from experts and network with other Grafana users and developers.
  • Find a Grafana workshop that aligns with your interests.
  • Register for the workshop.
  • Attend the workshop and participate in the activities.
Follow the Grafana documentation tutorial
Gain hands-on experience setting up and configuring Grafana.
  • Follow the 'Getting Started' tutorial.
  • Follow the 'Creating your first dashboard' tutorial.
  • Follow the 'Alerting' tutorial.
Create Grafana dashboards and alerts
Build practical skills in creating and managing Grafana dashboards and alerts.
  • Create a dashboard to visualize metrics from a sample application.
  • Create alerts to notify you of critical events.
  • Troubleshoot any issues you encounter.
Read Observability Engineering
Review the techniques and practices for designing and building effective observability systems.
  • Read the chapters on Monitoring, Metrics, and Dashboards.
  • Read the chapter on Alerting and Monitoring.
  • Read the chapter on Grafana.
  • Read the chapter on Loki.
  • Read the chapter on Tempo.
Develop a monitoring and alerting strategy for a web application
Apply the concepts learned in the course to design and implement a comprehensive monitoring and alerting solution.
  • Identify the key metrics to monitor.
  • Determine the thresholds for alerts.
  • Create Grafana dashboards to visualize the metrics.
  • Configure alerts to notify the appropriate stakeholders.
  • Test the monitoring and alerting system.
Build a custom Grafana plugin
Extend the functionality of Grafana by creating a custom plugin.
  • Identify a need for a custom plugin.
  • Design the plugin's functionality.
  • Develop the plugin's code.
  • Test the plugin.
  • Release the plugin to the Grafana community.

Career center

Learners who complete Observability with Grafana, Prometheus,Loki, Alloy and Tempo will develop knowledge and skills that may be useful to these careers:
Observability Engineer
An Observability Engineer is responsible for designing and implementing observability systems. They work to ensure that systems are monitored and alerted on, and that data is available for analysis. This course can be useful for aspiring Observability Engineers as it provides a foundation in all aspects of observability. By understanding how to collect, store, and analyze data, Observability Engineers can build systems that provide valuable insights into system performance and behavior.
DevOps Engineer
A DevOps Engineer is responsible for bridging the gap between development and operations teams. They work to ensure that software is delivered quickly and reliably. This course can be useful for aspiring DevOps Engineers as it provides a foundation in monitoring and alerting, which are key aspects of DevOps. By understanding how to monitor system performance and create alerts, DevOps Engineers can ensure that issues are identified and resolved quickly, minimizing downtime.
Performance Engineer
A Performance Engineer is responsible for optimizing the performance of software systems. They work to identify and resolve bottlenecks, and to ensure that systems meet performance requirements. This course can be useful for aspiring Performance Engineers as it provides a foundation in profiling and tracing. By understanding how to collect and analyze data about system performance, Performance Engineers can identify and resolve issues that impact performance.
Systems Engineer
A Systems Engineer is responsible for designing, building, and maintaining computer systems. They work to ensure that systems are reliable, scalable, and secure. This course can be useful for aspiring Systems Engineers as it provides a foundation in systems engineering principles. By understanding how to design, build, and test systems, Systems Engineers can build systems that meet the needs of the business.
Data Scientist
A Data Scientist is responsible for using data to solve business problems. They work to collect, analyze, and interpret data, and to develop models that can predict future outcomes. This course can be useful for aspiring Data Scientists as it provides a foundation in data science principles. By understanding how to collect, analyze, and interpret data, Data Scientists can develop models that can help businesses make better decisions.
Cloud Engineer
A Cloud Engineer is responsible for designing, building, and maintaining cloud computing systems. They work to ensure that systems are reliable, scalable, and secure. This course can be useful for aspiring Cloud Engineers as it provides a foundation in cloud computing concepts. By understanding how to design, build, and test cloud systems, Cloud Engineers can build systems that meet the needs of the business.
Data Engineer
A Data Engineer is responsible for designing, building, and maintaining data systems. They work to ensure that data is accurate, reliable, and accessible. This course can be useful for aspiring Data Engineers as it provides a foundation in metrics collection and analysis. By understanding how to collect, store, and analyze data, Data Engineers can build systems that provide valuable insights into business operations.
Software Developer
A Software Developer is responsible for designing, building, and maintaining software applications. This course can be useful for aspiring Software Developers as it provides a foundation in software engineering best practices. By understanding how to design, build, and test software, Software Developers can build high-quality applications that meet user needs.
QA Engineer
A QA Engineer is responsible for testing software and hardware products to ensure that they meet quality standards. They work to identify bugs, write test cases, and execute tests. This course can be useful for aspiring QA Engineers as it provides a foundation in software testing principles. By understanding how to test software and write test cases, QA Engineers can help ensure that products are released with high quality.
Business Analyst
A Business Analyst is responsible for understanding the needs of a business and developing solutions to meet those needs. They work to gather requirements, analyze data, and develop recommendations. This course can be useful for aspiring Business Analysts as it provides a foundation in business analysis principles. By understanding how to gather requirements, analyze data, and develop recommendations, Business Analysts can help businesses make better decisions.
Product Manager
A Product Manager is responsible for the development and launch of new products. They work to define the product vision, set product strategy, and manage the product roadmap. This course can be useful for aspiring Product Managers as it provides a foundation in product management principles. By understanding how to define the product vision, set product strategy, and manage the product roadmap, Product Managers can help businesses launch successful products.
Project Manager
A Project Manager is responsible for the planning, execution, and completion of projects. They work to define project scope, set project goals, and manage project resources. This course can be useful for aspiring Project Managers as it provides a foundation in project management principles. By understanding how to define project scope, set project goals, and manage project resources, Project Managers can help businesses complete projects successfully.
Technical Writer
A Technical Writer is responsible for creating documentation for software and hardware products. They work to explain how products work, and to provide instructions on how to use them. This course can be useful for aspiring Technical Writers as it provides a foundation in technical writing principles. By understanding how to write clear and concise documentation, Technical Writers can help users understand how to use products effectively.
Technical Support Specialist
A Technical Support Specialist is responsible for providing technical support to users. They work to troubleshoot problems, answer questions, and resolve issues. This course can be useful for aspiring Technical Support Specialists as it provides a foundation in troubleshooting and problem-solving. By understanding how to troubleshoot problems and resolve issues, Technical Support Specialists can help users get the most out of their products.
Site Reliability Engineer
A Site Reliability Engineer (SRE) is responsible for the design, building, and maintenance of software systems. They work to ensure that systems are reliable, scalable, and performant. This course can be useful for aspiring SREs as it provides a foundation in observability, which is a key aspect of maintaining reliable systems. By understanding how to collect, store, and analyze data about system performance, SREs can identify and resolve issues before they impact users.

Reading list

We've selected seven books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Observability with Grafana, Prometheus,Loki, Alloy and Tempo.
Provides a comprehensive overview of observability engineering. It covers the concepts, tools, and techniques that are essential for building and maintaining resilient and performant distributed systems. It is a valuable resource for anyone who wants to learn more about observability engineering.
Provides a comprehensive overview of observability engineering, covering concepts, tools, and best practices for building and operating highly reliable and observable systems. It is a valuable resource for anyone looking to deepen their understanding of observability and improve their ability to monitor and maintain complex systems effectively.
Provides a practical guide to site reliability engineering (SRE). It covers the principles and practices that are used to design, build, and operate reliable and scalable systems. It is a valuable resource for anyone who wants to learn more about SRE.
Provides a comprehensive overview of DevOps. It covers the principles and practices that are used to achieve high-performing DevOps teams. It is a valuable resource for anyone who wants to learn more about DevOps.
Provides a fictionalized account of a software development team that is struggling to meet its goals. It is a valuable resource for anyone looking to improve the culture and productivity of their software development team.
Provides a fictionalized account of a software development team that is struggling to meet its goals. It is a valuable resource for anyone looking to improve the culture and productivity of their software development team.
Provides a solid foundation for designing and building data-intensive applications. Learn about data models, storage systems, data processing techniques, and scalability strategies for managing and processing large volumes of data in modern applications.


© 2016 - 2025 OpenCourser