We may earn an affiliate commission when you visit our partners.

Infrastructure Monitoring

May 11, 2024 | Updated July 19, 2025 | 14 minute read

Infrastructure monitoring is the practice of continually observing and recording the state and performance of an infrastructure system, such as a computer network, server, or application. This information can be used to ensure that the system is functioning properly, to identify and resolve problems quickly, and to plan for future capacity needs.
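
To make the three uses above concrete, here is a minimal Python sketch using only the standard library. It records a disk-usage sample, checks it against an alert threshold, and makes a naive linear capacity projection. The names `Sample`, `sample_disk_usage`, `breaches`, and `forecast`, the metric name, and the 0.9 threshold are all hypothetical choices for illustration, not part of any particular monitoring product.

```python
import shutil
import time
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Sample:
    """One recorded observation of a metric (hypothetical structure)."""
    timestamp: float  # seconds since the epoch
    metric: str       # illustrative metric name
    value: float


def sample_disk_usage(path: str = ".") -> Sample:
    """Observe and record the fraction of disk space in use at `path`."""
    usage = shutil.disk_usage(path)
    return Sample(time.time(), "disk_used_fraction", usage.used / usage.total)


def breaches(sample: Sample, threshold: float) -> bool:
    """Return True when a sample exceeds its alert threshold."""
    return sample.value > threshold


def forecast(samples: Sequence[Sample], seconds_ahead: float) -> float:
    """Naive capacity projection: extend the straight line that joins
    the first and last samples `seconds_ahead` into the future."""
    first, last = samples[0], samples[-1]
    elapsed = last.timestamp - first.timestamp
    if elapsed <= 0:
        return last.value  # not enough history to estimate a trend
    slope = (last.value - first.value) / elapsed
    return last.value + slope * seconds_ahead


if __name__ == "__main__":
    current = sample_disk_usage()
    print(f"{current.metric} = {current.value:.2%}")
    if breaches(current, threshold=0.9):  # 0.9 is an assumed threshold
        print("alert: disk usage above threshold")
```

Real monitoring systems typically sample on a schedule, store the history in a time-series database, and use more robust trend estimates than a two-point line. The sketch only shows how observation, alerting, and capacity planning relate to the same recorded data.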

Benefits of infrastructure monitoring

There are many benefits to implementing an infrastructure monitoring system, including:


Reading list

We've selected 27 books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Infrastructure Monitoring.
Dives deep into the concept of observability, the modern evolution of monitoring. It explains how to build observable systems and practice observability-driven development, both crucial for understanding complex cloud-native applications. It's highly relevant to contemporary infrastructure monitoring practices and provides practical guidance for migrating from traditional monitoring tools.
This foundational book introduces the principles and practices of Site Reliability Engineering (SRE) at Google, which heavily emphasizes monitoring and alerting as core components. It provides a high-level overview of how large-scale systems are managed for reliability. It's a valuable resource for understanding the 'why' behind many infrastructure monitoring practices and is often referenced in the industry.
A practical companion to the "Site Reliability Engineering" book, this workbook provides hands-on examples and case studies for implementing SRE principles, including practical applications of monitoring and creating Service Level Objectives (SLOs). It helps solidify the theoretical concepts with real-world scenarios and is useful for those looking to apply SRE practices.
A deep dive into systems performance analysis and tuning, this book covers methodologies, tools, and metrics for optimizing system performance. It provides essential knowledge for understanding what to monitor and how to interpret performance data, which is fundamental to effective infrastructure monitoring.
Practical guide to OpenTelemetry, covering its setup and operation for building a modern observability system. It's highly relevant to contemporary infrastructure monitoring practices, especially in cloud-native and distributed environments.
While a novel, this book provides a highly insightful look into the challenges of IT operations and the principles of DevOps, including the importance of monitoring and feedback loops. It's an excellent introductory read for understanding the broader context in which infrastructure monitoring operates and is considered a classic in the DevOps space.
This comprehensive guide to DevOps principles and practices includes significant coverage of monitoring and feedback loops as essential components of a high-performing technology organization. It provides a broader context for infrastructure monitoring within the DevOps framework.
Offers practical strategies for monitoring production systems effectively. It covers various aspects of monitoring, including what to monitor, how to set up alerts, and how to use monitoring data to improve system reliability. It's a valuable resource for practitioners looking for actionable advice on building robust monitoring systems.
Introduces Chaos Engineering, a discipline focused on experimenting on a system in order to build confidence in that system's capability to withstand turbulent conditions in production. It highlights the importance of monitoring and observability in understanding the impact of failures and building resilient systems, a contemporary topic in infrastructure management.
This volume delves into the practices of cloud system administration, incorporating DevOps and SRE principles. It covers architecting, scaling, and operating reliable services in the cloud, with significant emphasis on monitoring as a key practice. It's particularly useful for understanding monitoring within cloud environments mentioned in some courses.
Provides a comprehensive overview of software architecture for big data systems, covering topics such as data storage, processing, and analysis.
Takes a broader look at the art and science of monitoring, moving beyond just tools to discuss the philosophy and strategy behind effective monitoring. It's a valuable read for gaining a deeper understanding of monitoring principles applicable across various technologies.
Provides a comprehensive guide to using Kubernetes, a popular open-source platform for managing containerized applications.
Focusing on a popular open-source monitoring system, this book provides a hands-on introduction to Prometheus. It covers key aspects like dashboarding, alerting, and metric collection. While specific to Prometheus, it offers practical insights into using a modern monitoring tool, relevant to courses mentioning Prometheus.
Provides a practical guide to using Docker, a popular open-source platform for building and managing containerized applications.
Provides a comprehensive guide to designing and building microservices, a popular architectural style for building distributed systems.
A follow-up to 'The Phoenix Project,' this novel explores the developer's perspective and the importance of architecture, telemetry, and experimentation. It reinforces the value of building observable systems and provides a relatable narrative around improving technology organizations.
Provides a practical guide to implementing continuous delivery practices, which enable teams to deliver software updates quickly and reliably.
Focuses on building production-ready microservices, including the importance of monitoring and standardization in a microservices architecture. It's relevant for understanding monitoring challenges and strategies in distributed systems.
Provides a comprehensive explanation of network monitoring, covering design principles and deployment strategies for enterprise environments. While focused on networking, it's highly relevant because network health is a crucial part of overall infrastructure monitoring. It's a good resource for understanding the specifics of network visibility.
Provides a comprehensive overview of cloud computing concepts, technology, and architecture. While not solely focused on monitoring, it lays the groundwork for understanding the environments where much of modern infrastructure monitoring takes place. It's useful for gaining a broader understanding of cloud infrastructure.
Provides a comprehensive guide to cloud management, covering topics such as cloud architecture, security, cost management, and monitoring.