Sean Bradley

We learn the basics of Prometheus so that you can get started as soon as possible, follow the exercises, try them out for yourself and see it working.

In this course we will quickly build a bare-bones Prometheus server from scratch, in the cloud, on your own Ubuntu 20.04 LTS server.

We will keep it simple and set it up on a default, unrestricted, uncustomised Ubuntu 20.04 LTS server. You will then be able to match what you see in the videos, copy and paste directly from my documentation, and see the same result. Once you have the basic experience of seeing Prometheus work, you will be able to troubleshoot in a more directed manner and apply your knowledge to other operating systems in the future.

At the end of the course, you will have a basic Prometheus setup in the cloud, behind a reverse proxy, with SSL, a domain name and Basic Authentication. It will have several custom recording rules, several alerting rules, several local and external node exporters, an Alert Manager that can send emails via an external SMTP service, and a Grafana install configured with the Prometheus data source and several dashboards.

What's inside

Learning objectives

  • Install Prometheus and see it working
  • Build a bare-bones Prometheus server from scratch, in the cloud
  • Learn how to set it up as a service so that it is always running in the background
  • Configure it to sit behind an Nginx reverse proxy
  • Configure a domain name and add SSL to ensure transport layer encryption for the user interface
  • Add Basic Authentication to restrict user access
  • Install several node exporters, local and external, manage their firewall rules and compare the differences
  • Learn the basics of querying metrics, from simple metrics to instant vectors, range vectors, functions, aggregates and subqueries
  • Create custom metrics from complicated queries and save them as recording rules
  • Create alerting rules and demonstrate the inactive, pending and firing states
  • Set up an SMTP server to send email alerts
  • Configure Alert Manager to send alerts from Prometheus
  • Install Grafana
  • Set up the Prometheus data source inside Grafana
  • Set up Prometheus dashboards for the main Prometheus service and the node exporters

Syllabus

Introduction

We will set up a dedicated Prometheus server.

Before you start, you will need a Linux server. Preferably an unrestricted Ubuntu 20.04 LTS Server with root access, since all the commands demonstrated in this course were executed on Ubuntu 20.04 LTS Server.

You can use other operating systems, such as CentOS, but all commands in the course are prepared for Ubuntu 20.04, so you will experience some differences in syntax or equivalent commands, which you may need to research yourself if I can't help you.

Once you have an Ubuntu 20.04 LTS server ready, you can start.

SSH onto your server. On Windows, I use PuTTY as my SSH client.

# sudo apt install prometheus


This will have installed two services: Prometheus and the Prometheus Node Exporter. You can verify their status using the following commands (press Ctrl-C to exit the status log).

# sudo service prometheus status

# sudo service prometheus-node-exporter status


The install also created a user called prometheus. You can see which processes it is running by using the command

# ps -u prometheus


If Prometheus has started successfully, you can visit it at http://[your ip address]:9090


Adding a domain name is optional, but it is useful if your Prometheus server is accessible from the internet, if you want it to look more professional to clients, and if you want fewer problems sending emails from it.

I have gone to my domain name provider and added an A record that points to the IP address of my new Prometheus server.

Example,

prometheus.sbcode.net. IN A 134.209.224.39

Your domain and IP will be different, and note that it may take some time for the DNS record to propagate across the internet.
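Because propagation can take a while, it helps to check what your machine actually resolves before continuing, and before running Certbot later. Below is a minimal Python sketch using only the standard library. It reflects only the resolver of the machine you run it on, not every resolver on the internet, and the domain and IP in the usage note are hypothetical placeholders.

```python
from __future__ import annotations

import socket


def resolved_ipv4_addresses(hostname: str) -> set[str]:
    """Return the IPv4 addresses this machine's resolver currently gives for hostname."""
    try:
        infos = socket.getaddrinfo(hostname, None, socket.AF_INET)
    except socket.gaierror:
        # The name does not resolve (yet), e.g. the record has not propagated.
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def points_to(hostname: str, expected_ip: str) -> bool:
    """True if the resolved A records include the server's IP address."""
    return expected_ip in resolved_ipv4_addresses(hostname)
```

For example, points_to("prometheus.example.com", "203.0.113.10") returns True once the new record is visible to your resolver.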

One option to help secure our Prometheus server is to put it behind a reverse proxy so that we can later add SSL and an Authentication layer over the default unrestricted Prometheus web interface.

We can use Nginx.

# sudo apt install nginx

CD to the Nginx sites-enabled folder

# cd /etc/nginx/sites-enabled

Create a new Nginx configuration for Prometheus

# sudo nano prometheus

And copy/paste the example below

-------------------------------

server {
    listen 80;
    listen [::]:80;
    server_name  YOUR-DOMAIN-NAME;

    location / {
        proxy_pass           http://localhost:9090/;
    }
}

----------------------------

Save, then test that the new configuration has no errors

# sudo nginx -t

Restart Nginx

# sudo service nginx restart

# sudo service nginx status

Test it by visiting again

http://YOUR-DOMAIN-NAME

We will now add transport encryption to the Prometheus web user interface.

Since I have already set up the domain name, I can get a free certificate using Certbot, which will install a Let's Encrypt SSL certificate.

Ensure your domain name has propagated across the internet before running Certbot; this may take some time.

On my server, I will run

# sudo snap install --classic certbot

Now we can run Certbot.

# sudo certbot --nginx

Follow the prompts and select the domain name you want to secure.

Next open the Nginx Prometheus config file we created earlier to see the changes.

# sudo nano /etc/nginx/sites-enabled/prometheus

Everything is great so far, but anybody in the world with internet access and the URL can visit my Prometheus server and see my data.

To solve this problem, we will add user authentication.

We will use Basic Authentication.

SSH onto your server and CD into your /etc/nginx folder.

# cd /etc/nginx

Then install apache2-utils (on Ubuntu) or httpd-tools (on CentOS).

On Ubuntu:

# sudo apt install apache2-utils

On CentOS:

# sudo yum install httpd-tools

Now we can create a password file. In the command below, I am creating a user called 'admin'.

# sudo htpasswd -c /etc/nginx/.htpasswd admin

I then enter a password for the user.
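The file that htpasswd writes holds one user:hash line per user, and by default htpasswd hashes with Apache's apr1 MD5 variant. The nginx auth_basic_user_file documentation also accepts entries written in the {scheme}data form, including salted SHA-1 ({SSHA}). The sketch below swaps in {SSHA} purely to show what such a line contains; it is an illustration under that assumption, not a replacement for htpasswd, and SHA-1 is a weak choice for password storage.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import os


def ssha_entry(username: str, password: str, salt: bytes | None = None) -> str:
    """Build one 'user:{SSHA}...' line for an nginx auth_basic_user_file."""
    if salt is None:
        salt = os.urandom(8)  # random per-user salt
    digest = hashlib.sha1(password.encode("utf-8") + salt).digest()
    # {SSHA} stores base64(SHA-1(password + salt) + salt)
    encoded = base64.b64encode(digest + salt).decode("ascii")
    return f"{username}:{{SSHA}}{encoded}"


def check_entry(line: str, password: str) -> bool:
    """Verify a password against a 'user:{SSHA}...' line produced above."""
    _, _, hashed = line.strip().partition(":")
    prefix = "{SSHA}"
    if not hashed.startswith(prefix):
        raise ValueError("only {SSHA} entries are handled in this sketch")
    raw = base64.b64decode(hashed[len(prefix):])
    digest, salt = raw[:20], raw[20:]  # SHA-1 digests are 20 bytes long
    candidate = hashlib.sha1(password.encode("utf-8") + salt).digest()
    return hmac.compare_digest(candidate, digest)
```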

Next open the Nginx Prometheus config file we created.

# sudo nano /etc/nginx/sites-enabled/prometheus

And add the two authentication properties in the example below to the existing Nginx configuration file.

-------------------

server {
    ...
    # additional authentication properties
    auth_basic  "Protected Area";
    auth_basic_user_file /etc/nginx/.htpasswd;
    location / {
        proxy_pass           http://localhost:9090/;
    }
    ...
}

-------------------------------

Save, then test that the new configuration has no errors

# sudo nginx -t

Restart Nginx

# sudo service nginx restart

# sudo service nginx status
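With Basic Authentication in place, every client, including scripts and Grafana later on, has to send credentials; requests without them get a 401 response from Nginx. The Authorization header is simply base64 of user:password, which is why the SSL step matters. Below is a minimal Python sketch against Prometheus's HTTP API endpoint /api/v1/query; the domain, user and password are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_query_request(base_url: str, promql: str, user: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) an instant query to Prometheus behind basic auth."""
    url = base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    # Basic auth is just base64("user:password"); only send it over HTTPS.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def run_query(request: urllib.request.Request) -> list:
    """Send the request and return the 'result' list from the JSON response."""
    with urllib.request.urlopen(request, timeout=10) as response:
        body = json.load(response)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body}")
    return body["data"]["result"]
```

For example, run_query(build_query_request("https://prometheus.example.com", "up", "admin", "your-password")) returns one entry per scrape target, with the value 1 for targets that are up.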

When you install Prometheus using

# sudo apt install prometheus

It sets up two metrics endpoints.

  • Prometheus : http://127.0.0.1:9090/metrics

  • Node Exporter : http://127.0.0.1:9100/metrics
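Both endpoints serve plain text in Prometheus's text exposition format: optional # HELP and # TYPE lines, then one sample per line made of a metric name, optional {label="value"} pairs and a value. The simplified parser below illustrates that structure; it does not unescape label values or cover every edge case, so for real use the official prometheus_client Python package ships a full parser (prometheus_client.parser.text_string_to_metric_families).

```python
from __future__ import annotations

import re

# metric_name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+(?P<ts>-?\d+))?$'
)
LABEL_RE = re.compile(r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<val>(?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Return (name, labels, value) for each sample line in a /metrics response."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE metadata
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # line shape not handled by this simplified sketch
        labels = {
            m.group("key"): m.group("val")
            for m in LABEL_RE.finditer(match.group("labels") or "")
        }
        # float() also accepts the special values NaN, +Inf and -Inf
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples
```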

In this video, I show where the settings for these metrics endpoints are configured, how to enable and change them, and some of the properties that can be retrieved in the graph expression field.

Now we will install an external Prometheus Node Exporter on a different server.

# sudo apt install prometheus-node-exporter

Now check that the node exporter is running.

# sudo service prometheus-node-exporter status

You can stop, start or restart the node exporter using

# sudo service prometheus-node-exporter stop

# sudo service prometheus-node-exporter start

# sudo service prometheus-node-exporter restart

Node exporter will now be running on http://[your domain or ip]:9100/metrics

You can now block port 9100 externally, but leave it open internally for localhost.

And optionally, you can also allow a specific ip address or domain on the internet to access the port.

There may be a time when you want to delete data from the Prometheus TSDB database.

Data will be automatically deleted after the storage retention time has passed. By default it is 15 days.
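If you are deciding whether the 15-day default suits your disk, the Prometheus storage documentation gives a rough capacity formula: retention time in seconds, times ingested samples per second, times bytes per sample, with an average of 1 to 2 bytes per sample. Below is that arithmetic as a small Python function. The ingestion rate is an input you would measure yourself, for example with rate(prometheus_tsdb_head_samples_appended_total[5m]), and the retention period itself is set with the --storage.tsdb.retention.time flag.

```python
def estimated_disk_bytes(retention_days: float, samples_per_second: float,
                         bytes_per_sample: float = 2.0) -> float:
    """Rough local TSDB size: retention seconds x samples/second x bytes/sample.

    The default of 2 bytes per sample is the upper end of the documented
    1-2 byte average, so the estimate errs on the generous side.
    """
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample
```

For example, 15 days at 10,000 samples per second and 2 bytes per sample comes to 25,920,000,000 bytes, roughly 26 GB.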

If you want to delete specific data earlier, you can, but you first need to enable the admin API in Prometheus.

# sudo nano /etc/default/prometheus

Add --web.enable-admin-api to the ARGS="" variable, e.g.,

ARGS="--web.enable-admin-api"

Restart Prometheus and check status

# sudo service prometheus restart

# sudo service prometheus status

You can now make calls to the admin API.

In my example, I want to delete all time series for instance="sbcode.net:9100".

So I call the delete_series API endpoint, providing the series selector to match, e.g.,

# curl -X POST -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]={instance="sbcode.net:9100"}'
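curl's -g flag only stops curl from treating the [] and {} characters as URL globs. If you script the call instead, those characters should be percent-encoded, which Python's urllib.parse handles. Below is a minimal sketch; the helper names are hypothetical. Note that, according to the Prometheus admin API documentation, delete_series only marks the data with tombstones; the disk space is reclaimed at a later compaction or by calling the /api/v1/admin/tsdb/clean_tombstones endpoint.

```python
import urllib.parse
import urllib.request


def delete_series_url(base_url: str, *selectors: str) -> str:
    """Admin API URL that deletes every series matching any of the selectors."""
    # match[] may be repeated once per selector; urlencode percent-encodes
    # the brackets, braces, quotes and colons that curl -g sends as typed.
    query = urllib.parse.urlencode([("match[]", s) for s in selectors])
    return base_url.rstrip("/") + "/api/v1/admin/tsdb/delete_series?" + query


def send_delete(url: str) -> int:
    """POST the request; requires Prometheus to run with --web.enable-admin-api."""
    request = urllib.request.Request(url, method="POST")
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status  # 204 No Content on success
```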

When I re-execute the Prometheus query, the time series I wanted deleted no longer exists.

Now that we have at least two scrape targets, we can begin to run some more interesting queries that involve multiple scrape targets.

We will try basic time series queries and queries with regular expressions, compare the instant vector and range vector data types, and use functions, aggregates and subqueries.
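As a concrete picture of the step from range vector to function: a range vector such as node_cpu_seconds_total[5m] returns, for each series, the raw (timestamp, value) samples inside the window, and rate() turns those counter samples into a per-second rate. The Python sketch below is a simplified model, not Prometheus's implementation: it treats any drop in value as a counter reset, as Prometheus does, but skips the extrapolation to the window edges that the real rate() performs.

```python
from __future__ import annotations


def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a counter over (timestamp_seconds, value) samples.

    A value lower than the previous one is treated as a counter reset, i.e.
    the counter restarted from zero. No extrapolation to the range edges.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # After a reset the counter started again at zero, so the new value counts in full.
        increase += curr - prev if curr >= prev else curr
    duration = samples[-1][0] - samples[0][0]
    return increase / duration
```

With samples every 15 seconds of 100, 130, 160, 10 and 40, the reset after 160 is counted as an increase of 10, giving a total increase of 100 over 60 seconds, or about 1.67 per second.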

We create some recording rules for the more complicated, commonly used queries whose results we want to store as their own time series.
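Recording rules live in a YAML rules file made of groups, each containing record and expr pairs, and the file must be listed under rule_files in prometheus.yml. Below is a minimal Python sketch that renders such a file. The group name, rule name and expression in the test are hypothetical, and the level:metric:operations naming style follows the Prometheus best-practices guide. If promtool is installed alongside Prometheus, promtool check rules will validate the result.

```python
from __future__ import annotations

import json


def render_rules_file(group: str, rules: list[tuple[str, str]]) -> str:
    """Render a Prometheus rules file containing one group of recording rules."""
    lines = ["groups:", f"  - name: {json.dumps(group)}", "    rules:"]
    for record, expr in rules:
        lines.append(f"      - record: {record}")
        # json.dumps yields a double-quoted string, which YAML also accepts,
        # so braces, quotes and colons inside the PromQL stay intact.
        lines.append(f"        expr: {json.dumps(expr)}")
    return "\n".join(lines) + "\n"
```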

Alerting rules are created in Prometheus in much the same way as recording rules. We can use the same prometheus_rules.yml or, if you wish, create a different file, but remember to add a reference to it in the rule_files section of prometheus.yml.
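The inactive, pending and firing states follow from the rule's for duration: an alert is pending from the first evaluation where its expression returns results, becomes firing once it has stayed true for at least the for duration, and returns to inactive as soon as the expression stops matching. Below is a simplified Python model of that progression for a single series; it ignores labels, multiple series and the notification side.

```python
from __future__ import annotations


def alert_states(evaluations: list[tuple[float, bool]], hold_seconds: float) -> list[str]:
    """Replay rule evaluations and report the alert state after each one.

    evaluations: (timestamp_seconds, expression_matched) pairs in time order.
    hold_seconds: the rule's `for` duration.
    """
    states = []
    active_since = None
    for ts, matched in evaluations:
        if not matched:
            active_since = None  # expression no longer matches: back to inactive
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts  # first evaluation where the expression matched
        held = ts - active_since
        states.append("firing" if held >= hold_seconds else "pending")
    return states
```

With evaluations every 15 seconds and for: 30s, the sequence false, true, true, true, false produces inactive, pending, pending, firing, inactive.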

Install the Prometheus Alert Manager

# sudo apt install prometheus-alertmanager

It has started a new service called prometheus-alertmanager

# sudo service prometheus-alertmanager status

It is also managed by the user prometheus

# ps -u prometheus

Note that the service is running on port 9093

Visit http://[your domain name or ip]:9093/

We now configure the Prometheus and Alert Manager processes to communicate with each other, and to send alerts when the alerting rules fire.
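To check the email path end to end without waiting for a real rule to fire, you can push a synthetic alert straight into Alert Manager. Below is a minimal Python sketch, assuming an Alert Manager release with the v2 HTTP API (POST /api/v2/alerts, added in 0.16; older releases only offer /api/v1/alerts). The alert name and labels are hypothetical, your route must send them to the email receiver, and the email should arrive after the route's group_wait delay (30 seconds by default).

```python
from __future__ import annotations

import json
import urllib.request
from datetime import datetime, timezone


def build_test_alert(alertmanager_url: str, alertname: str) -> urllib.request.Request:
    """Prepare a POST that injects one synthetic alert into Alert Manager's v2 API."""
    payload = [
        {
            # Hypothetical labels: match them in your route so it reaches the email receiver.
            "labels": {"alertname": alertname, "severity": "test"},
            "annotations": {"summary": "Manual test alert, safe to ignore"},
            "startsAt": datetime.now(timezone.utc).isoformat(),
        }
    ]
    return urllib.request.Request(
        alertmanager_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then urllib.request.urlopen(build_test_alert("http://localhost:9093", "TestEmail")).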

I set up a new minimum-spec Ubuntu 20.04 LTS server to demonstrate installing Grafana.

Once you have connected to your new server, make sure your package lists are updated.

# sudo apt update

Then ensure that the dependencies for Grafana are installed.

# sudo apt-get install -y adduser libfontconfig1

Now download the package and install it with the Debian package manager.

# wget https://dl.grafana.com/oss/release/grafana_7.2.0_amd64.deb
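Before running dpkg, it is worth confirming the download is intact. Grafana's download page lists a SHA256 checksum for each package, and sha256sum grafana_7.2.0_amd64.deb does the comparison in the shell. The same check is sketched below in Python; the helper names are hypothetical, and published_hex is whatever value you copy from the download page.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large packages are fine."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path: str, published_hex: str) -> bool:
    """Compare against the published value, ignoring case and stray whitespace."""
    return sha256_of(path) == published_hex.strip().lower()
```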

# sudo dpkg -i grafana_7.2.0_amd64.deb

The install has now completed. You can now start the Grafana service

# sudo service grafana-server start

Check the status

# sudo service grafana-server status

Your Grafana server will be hosted at 

http://[your Grafana server ip]:3000

The default Grafana login is

Username : **admin**

Password : **admin**

On first login you can update your password, and you are then presented with the option to add a new data source and to create dashboards and users.

I create a new Prometheus Datasource using the Grafana user interface.

I connect to my Prometheus url, which also uses SSL and has basic auth configured.
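The same data source can also be created through Grafana's HTTP API, which is handy when rebuilding a server. Below is a minimal Python sketch, assuming Grafana's admin credentials are accepted via Basic Authentication and using the data source fields from Grafana's HTTP API documentation (basicAuth, basicAuthUser and secureJsonData.basicAuthPassword). The URLs and credentials are hypothetical, and field names may differ between Grafana versions.

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url: str, grafana_user: str, grafana_password: str,
                             prometheus_url: str, prom_user: str, prom_password: str) -> urllib.request.Request:
    """Prepare a POST to /api/datasources for a Prometheus source behind basic auth."""
    payload = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend makes the requests, not the browser
        "basicAuth": True,
        "basicAuthUser": prom_user,
        # Secure fields are stored encrypted by Grafana rather than returned in plain text.
        "secureJsonData": {"basicAuthPassword": prom_password},
    }
    token = base64.b64encode(f"{grafana_user}:{grafana_password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
```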

Let's enable some of the default dashboards provided with the Prometheus data source, and download one from the community specifically for the node exporters.

Thanks for taking part in my course.

We have achieved a lot in a small amount of time, and you now know enough to decide whether you want to take Prometheus further.

We have built our dedicated Prometheus server, with multiple node exporters, custom recording rules and alerting rules, behind a reverse proxy, with SSL and Basic Authentication. We also know how to create our own specialised queries, and how to set up Alert Manager to, at minimum, send emails via SMTP. We also installed Grafana, where we can query using PromQL and install Prometheus dashboards.

If you decide to continue with Prometheus, a good source of ideas for your future direction is the Default Port Allocations page, as it lists the hundreds of exporters being developed for Prometheus today.

https://github.com/prometheus/prometheus/wiki/Default-port-allocations

Traffic lights

Read about what's good, what should give you pause, and possible dealbreakers
Covers setting up Prometheus on Ubuntu 20.04 LTS, which is a common server operating system, making it directly applicable to real-world deployments
Explores reverse proxy setup with Nginx, SSL configuration with Certbot, and basic authentication, which are essential for securing Prometheus in production environments
Demonstrates configuring Alert Manager to send email alerts via SMTP, which is a practical skill for monitoring and incident response
Includes Grafana installation and Prometheus data source setup, enabling learners to create dashboards for visualizing metrics and monitoring system performance
Requires an unrestricted Ubuntu 20.04 LTS server with root access, which may necessitate setting up a dedicated virtual machine or cloud instance
Uses Grafana v7.2.0, an older release that lacks newer features and later security updates

Reviews summary

Practical prometheus monitoring setup

According to learners, this course is a highly practical guide for setting up a basic Prometheus monitoring and alerting stack. Students particularly praise the step-by-step instructions which make it accessible even for beginners in this domain. The course provides a solid foundation by covering the installation and configuration of Prometheus, AlertManager, and Grafana, including details like reverse proxying with Nginx, SSL, and basic authentication. Many students found the content to be directly applicable to professional roles in DevOps and SRE. While excellent for getting started, learners note that it focuses on a specific environment (Ubuntu 20.04 LTS) and serves as a foundation rather than a deep dive into advanced topics, suggesting it may require further study for complex use cases.
Primarily uses Ubuntu 20.04 LTS.
"Note that the course demos are strictly on Ubuntu 20.04 LTS."
"You need to adapt the commands if you're not using Ubuntu 20.04."
"Following along requires access to an Ubuntu 20.04 server."
Focuses on getting started, not deep dives.
"Excellent for a basic setup, but doesn't cover advanced PromQL or scaling."
"It's a solid introduction, but you'll need other resources for deeper topics."
"Gets you started quickly but lacks advanced configuration details."
Suitable for those new to monitoring.
"Perfect course if you're new to monitoring with Prometheus."
"Made getting started much less intimidating."
"I had zero experience before this and now feel confident setting up basic monitoring."
Directly applicable to professional work.
"Learned skills I can immediately use in my SRE role."
"Very relevant and practical for anyone in a tech ops role."
"Directly helped me implement monitoring solutions at work."
Includes essential tools like Grafana.
"It was great that it covered Prometheus, AlertManager, and Grafana together."
"Loved seeing how to hook Prometheus up to Grafana for dashboards."
"Setting up alerts with AlertManager was covered well."
"Comprehensive coverage of the core Prometheus ecosystem tools."
Hands-on steps for core setup.
"This course is incredibly practical, showing exactly how to install and configure everything step-by-step."
"I followed the demos easily and had a working setup very quickly."
"Great course for learning how to implement Prometheus hands-on."
"The step-by-step approach to building the monitoring stack is very effective."

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Prometheus Alerting and Monitoring with these activities:
Review Linux Server Basics
Solidify your understanding of Linux server administration, as the course heavily relies on Ubuntu 20.04 LTS.
  • Practice basic commands like navigating directories, creating files, and managing processes.
  • Review user management and permissions.
  • Familiarize yourself with package management using apt.
Review Networking Fundamentals
Strengthen your understanding of networking concepts, as the course involves configuring reverse proxies, SSL, and firewalls.
  • Review basic networking concepts like IP addresses, ports, and protocols.
  • Understand the role of firewalls and how to configure them.
  • Learn about DNS records and domain name propagation.
Follow Prometheus Installation Tutorials
Reinforce your understanding of Prometheus installation by following external tutorials and comparing them to the course's approach.
  • Find and follow at least two different online tutorials for installing Prometheus on Ubuntu.
  • Compare the steps and configurations with those presented in the course.
  • Document any differences and potential advantages or disadvantages of each approach.
Practice PromQL Queries
Improve your proficiency in PromQL by practicing various queries, including those involving functions, aggregates, and subqueries.
  • Set up a local Prometheus instance with sample data.
  • Practice writing PromQL queries to retrieve specific metrics.
  • Experiment with functions like rate and irate.
  • Create queries using aggregates like sum, avg, and max.
Document Prometheus Setup
Solidify your understanding by creating detailed documentation of your Prometheus setup, including configuration files and setup steps.
  • Document the steps taken to install and configure Prometheus, Alertmanager, and Grafana.
  • Include screenshots and code snippets of relevant configuration files.
  • Explain the purpose of each configuration setting.
Build a Custom Dashboard
Deepen your understanding of Prometheus and Grafana by building a custom dashboard tailored to your specific monitoring needs.
  • Identify the metrics you want to monitor.
  • Create PromQL queries to retrieve the desired data.
  • Design and build a Grafana dashboard to visualize the metrics.
  • Customize the dashboard with appropriate panels and visualizations.
Contribute to Prometheus Exporters
Enhance your expertise by contributing to open-source Prometheus exporters, gaining practical experience with real-world monitoring challenges.
  • Identify an open-source Prometheus exporter project.
  • Review the project's documentation and code.
  • Identify a bug or feature to work on.
  • Contribute code or documentation to the project.

Career center

Learners who complete Prometheus Alerting and Monitoring will develop knowledge and skills that may be useful to these careers:
DevOps Engineer
The DevOps engineer benefits greatly from knowledge of monitoring and alerting tools. This Prometheus Alerting and Monitoring course helps the DevOps engineer by teaching the essentials of Prometheus. The course emphasizes installing Prometheus on a server, setting it up as a service, and configuring it with Nginx as a reverse proxy. This course helps build a foundation for the role of DevOps engineer. Learning to set up SSL, basic authentication, node exporters, custom recording rules, and alerting rules are all valuable for those in a DevOps position. The course also covers Alert Manager and Grafana, tools any DevOps engineer can use to manage alerts and create dashboards. Individuals seeking a job as a DevOps engineer should take this course to learn the fundamentals of Prometheus.
Site Reliability Engineer
A site reliability engineer focuses on ensuring the reliability and performance of systems. This Prometheus Alerting and Monitoring course directly aligns with the responsibilities of a site reliability engineer, who uses tools like Prometheus to monitor system health, set up alerts, and troubleshoot issues. The course emphasizes the practical setup of Prometheus, including installing it on Ubuntu, configuring reverse proxies with Nginx, and securing it with SSL and basic authentication. A site reliability engineer will find the exploration of node exporters, custom recording rules, and alerting rules especially relevant. Furthermore, the integration of Grafana for visualization will assist the site reliability engineer in creating dashboards to offer insight into system performance. Aspiring site reliability engineers should enroll in this course to build a solid foundation in Prometheus.
Systems Administrator
Systems administrators are responsible for maintaining computer systems. The Prometheus Alerting and Monitoring course is directly applicable to a systems administrator's role. The course content focuses on setting up a Prometheus server from scratch on Ubuntu, configuring it as a service, and securing it with Nginx, SSL, and basic authentication. A systems administrator can use this information to set up a monitoring system. The knowledge of node exporters, custom recording rules, and alerting rules taught in the course are helpful for monitoring server health. The course also describes how to install Grafana and configure dashboards, which are useful for monitoring and management. Therefore, anyone looking to become a systems administrator should consider taking this course.
Cloud Engineer
Cloud engineers are focused on building and maintaining cloud infrastructure. This Prometheus Alerting and Monitoring course is useful for cloud engineers because it teaches how to set up Prometheus servers in the cloud. The course goes over the basics of installing Prometheus on Ubuntu, configuring reverse proxies with Nginx, and securing the servers with SSL and basic authentication. A cloud engineer can use this information to build monitoring for cloud resources. The course also discusses the use of node exporters and how to create custom recording rules and alerting rules. A cloud engineer will also find value in the lessons involving Alert Manager and Grafana, since they can use these tools to get a better visualization of their infrastructure. Anyone looking to become a cloud engineer should consider taking this course.
Network Engineer
Network engineers are responsible for managing and maintaining network infrastructure. This Prometheus Alerting and Monitoring course is relevant to network engineers because the Prometheus setup it teaches can be extended to monitor network infrastructure. The course reviews the basics of Prometheus installation on Ubuntu, setting up reverse proxies with Nginx, and securing it with SSL and basic authentication. A network engineer can use this information to monitor network health. The course's coverage of node exporters and custom recording rules will be very helpful. The integration of Alert Manager and Grafana will allow a network engineer to monitor alerts and create visualizations of the network. Consider taking this course if a career as a network engineer is your goal.
Database Administrator
Database administrators are in charge of managing and maintaining databases. The Prometheus Alerting and Monitoring course may be useful for a database administrator, as it introduces general-purpose monitoring tools. The course reviews how to install Prometheus on Ubuntu, configure reverse proxies with Nginx, and secure it with SSL and basic authentication. A database administrator can use this knowledge to monitor database servers. The course covers Alert Manager, which helps make the database administrator aware of issues. Additionally, knowledge of Grafana can provide a visual understanding of database health. If you're interested in database administration, this course provides some background.
IT Support Specialist
An IT support specialist helps resolve technical issues for end-users. The Prometheus Alerting and Monitoring course may be useful for an IT support specialist, as it provides foundational knowledge of system monitoring. The course covers installing Prometheus on Ubuntu, configuring reverse proxies, and setting up SSL and basic authentication. While IT support specialists may not directly implement these configurations, understanding how systems are monitored can aid in troubleshooting. The course's lessons on node exporters and alerting rules may also be valuable in understanding system behavior. While not a core skill, knowledge gained in this course can help an IT support specialist provide better assistance to end-users. Learning from this course may improve your career as an IT support specialist.
Security Analyst
Security analysts protect computer systems and networks from threats. The Prometheus Alerting and Monitoring course may be valuable to a security analyst as it covers aspects of system monitoring and security configurations. While the primary focus of Prometheus is not security, the course touches on setting up reverse proxies with Nginx, enabling SSL, and implementing basic authentication. A security analyst can use this information to better understand how systems are secured and monitored. The course's coverage of node exporters and alerting rules may also be relevant to security monitoring. A security analyst may find this foundational background useful.
Software Developer
Software developers write and maintain code for applications. This Prometheus Alerting and Monitoring course may be useful for software developers because it provides insights into how applications are monitored in production environments. The Prometheus Alerting and Monitoring course reviews the basics of installing Prometheus on Ubuntu, configuring reverse proxies with Nginx, and securing it with SSL and basic authentication. A software developer can use this information to understand how their applications are monitored. The course's lessons on node exporters and alerting rules may also be valuable in understanding how to make software that is more easily monitored. Understanding these concepts can improve software development skills.
Data Scientist
Data scientists analyze data to extract meaningful insights. The Prometheus Alerting and Monitoring course is not directly related to data science, as it focuses on system monitoring rather than data analysis. While a data scientist may occasionally interact with system metrics, the core skills taught in the course are not typically used in data science. The course covers setting up Prometheus on Ubuntu, configuring reverse proxies, and setting up SSL and basic authentication, which are not central to data science tasks. Data scientists should consider other courses that focus on data analysis techniques.
Technical Writer
Technical writers create documentation for software and hardware. The Prometheus Alerting and Monitoring course is not a direct path into technical writing, as it focuses on setting up monitoring solutions rather than writing documentation. The skills learned in the course, such as installing Prometheus and configuring reverse proxies, are not directly applicable to technical writing. A technical writer typically will not work with Prometheus. However, the course may be relevant for technical writers who document monitoring systems. Technical writers should generally focus on improving their writing and communication abilities.
Sales Engineer
Sales engineers sell complex technical products or services. This Prometheus Alerting and Monitoring course does not directly align with the responsibilities of a sales engineer. The skills and knowledge taught in the course, such as installing Prometheus and configuring reverse proxies, are not the primary focus of a sales engineer. Sales engineers need to develop strong communication, presentation, and sales techniques instead. Sales engineers should consider other courses that focus on those skills.
Project Manager
Project managers plan, execute, and close projects. The Prometheus Alerting and Monitoring course is not directly applicable to project management, as it focuses on setting up monitoring systems. A project manager may benefit from a general understanding of technical concepts, but the specific skills taught in this course are not essential for project management. The Prometheus Alerting and Monitoring course includes installing Prometheus and configuring reverse proxies. Project managers should consider courses focused on project management methodologies and tools.
Business Analyst
Business analysts identify business needs and recommend solutions. This Prometheus Alerting and Monitoring course is not relevant to business analysis, as it focuses on setting up monitoring systems. The skills taught in this course, such as installing Prometheus and configuring reverse proxies, are not used by business analysts. Business analysts should consider courses focused on business process analysis and requirements gathering to improve their skills.
Human Resources Manager
Human resources managers oversee employee-related functions. This Prometheus Alerting and Monitoring course is not applicable to human resources management, as it focuses on setting up monitoring systems. The skills taught in this course, such as installing Prometheus and configuring reverse proxies, are not relevant to human resources tasks. A human resources manager should consider other courses that focus on employment law, employee relations, or talent management instead. This course has no relevant connection to human resources management.

Reading list

We haven't picked any books for this reading list yet.
Practical guide to using Prometheus. It covers topics such as installation, configuration, monitoring, and alerting. It also includes recipes for common Prometheus use cases.
Provides practical advice and best practices for system and network administration, including a chapter on monitoring and alerting. It covers topics such as alert design, monitoring tools, and escalation procedures.
Save
This collection of essays offers diverse perspectives on implementing SRE principles in various settings. It explores how SRE relates to DevOps and discusses cutting-edge specialties in the field, including aspects of monitoring and reliability. is valuable for gaining a broader understanding of how SRE and monitoring are practiced in different organizations.
Provides a practical guide to using Prometheus, a popular open-source monitoring and alerting system. It covers topics such as installing and configuring Prometheus, writing PromQL queries, and creating alerts.
Provides a comprehensive guide to observability engineering, a set of practices and tools that enable engineers to monitor, troubleshoot, and debug complex systems. It includes a chapter on alerting, providing guidance on how to design and implement effective alerting systems.
Provides a practical guide to implementing service level objectives (SLOs), which are used to define and measure the performance of software systems. It includes a chapter on alerting and monitoring, providing guidance on how to set up SLOs and create alerts that measure progress towards meeting them.
Is considered a foundational text in Site Reliability Engineering (SRE), a discipline heavily intertwined with modern monitoring practices. It provides a comprehensive overview of how Google approaches reliability, including their philosophies on monitoring, alerting, and incident response. It is highly valuable for establishing a strong understanding of the principles behind effective monitoring in large-scale systems and is commonly used as a reference by industry professionals.
Provides a deep dive into modern monitoring practices, covering a wide range of tools and techniques. It is an opinionated book that offers valuable insights into implementing effective monitoring solutions. While it discusses specific tools, it also contains plenty of theory that is essential for a solid understanding of the topic.
Provides a comprehensive overview of Prometheus, an open-source monitoring system. It covers topics such as installing and configuring Prometheus, creating alerts, and using Prometheus to monitor different types of systems.
Provides a comprehensive overview of Jaeger, an open-source distributed tracing system. It covers topics such as installing and configuring Jaeger, creating traces, and using Jaeger to monitor different types of systems.
Provides a comprehensive overview of performance engineering. It covers topics such as performance metrics, data collection and analysis, and performance modeling.
As a companion to 'Site Reliability Engineering,' this workbook offers practical examples and case studies from Google and other companies on implementing SRE principles. It provides hands-on guidance for applying the concepts discussed in the first book, including practical approaches to monitoring and incident management. It is excellent for deepening the understanding gained from the foundational SRE text and is a useful reference for implementing SRE practices.
Offers a pragmatic and tool-agnostic approach to monitoring. It covers essential topics such as monitoring antipatterns, principles of monitoring design, and getting metrics and logs from applications. It is highly relevant for gaining a broad understanding of effective monitoring strategies and is a valuable resource for anyone looking to improve their monitoring practices, regardless of the specific tools they use.
Focuses on the modern concept of observability in software engineering, which goes beyond traditional monitoring. It provides a practical guide to building and managing highly observable systems, covering logging, metrics, and tracing. This is a key book for understanding contemporary topics in monitoring and is highly relevant for those working with complex distributed systems.
Provides a comprehensive guide to using Nagios, a popular open-source monitoring and alerting tool. It covers topics such as configuring Nagios, writing custom plugins, and setting up notifications.
Focuses on the practical aspects of monitoring and alerting for web operations. It delves into the technical details of configuring and maintaining monitors and alerts. It is a useful resource for those looking to deepen their understanding of implementing effective alerting strategies.
This excerpt from the 'Site Reliability Engineering' book focuses specifically on monitoring distributed systems. It explains basic principles and best practices for building successful monitoring and alerting systems in complex environments. It is highly relevant for those working with distributed systems and provides implementation-agnostic guidance.
While not solely focused on monitoring, this influential book on DevOps covers the importance of feedback loops and the role of monitoring in achieving reliability and agility. It provides a broader context for understanding where monitoring fits within a successful technology organization. It is a valuable resource for understanding the cultural and organizational aspects that support effective monitoring.
Applies SRE principles to database systems, including specific guidance on monitoring database performance and reliability. It is a specialized book that provides in-depth knowledge for those focused on database monitoring. It is a useful resource for deepening understanding in a critical area of infrastructure.
While not specifically focused on alerting, this book provides a comprehensive guide to site reliability engineering (SRE) practices, including chapters on monitoring, alerting, and incident response. It is valuable for anyone involved in designing and operating reliable systems.

Our mission

OpenCourser helps millions of learners each year. People visit us to learn workplace skills, ace their exams, and nurture their curiosity.

Our extensive catalog contains over 50,000 courses and twice as many books. Browse by search, by topic, or even by career interests. We'll match you to the right resources quickly.

Affiliate disclosure

We're supported by our community of learners. When you purchase or subscribe to courses and programs or purchase books, we may earn a commission from our partners.

Your purchases help us maintain our catalog and keep our servers humming without ads.

Thank you for supporting OpenCourser.

© 2016 - 2025 OpenCourser