May 1, 2024
Updated June 18, 2025
24 minute read
An In-Depth Guide to Logging and Monitoring
Logging and Monitoring are fundamental practices in the world of information technology, acting as the eyes and ears for any system or application. At a high level, logging is the process of recording events that happen within software applications and IT infrastructure. Monitoring involves observing these logs and other data streams on an ongoing basis to understand the health and performance of systems in real time. Think of it as conducting regular digital checkups to ensure everything is running smoothly.
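To make the distinction concrete, here is a minimal Python sketch using the standard `logging` module. The handler class `CountingHandler`, the helper `error_rate`, the logger name `checkout_demo`, and the sample messages are all hypothetical and chosen only for illustration. The application records events (logging), and a separate function reads the collected counts to judge health (monitoring).

```python
import logging


class CountingHandler(logging.Handler):
    """Hypothetical handler that counts log records by level name."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def emit(self, record):
        # Called once per log record that reaches this handler.
        self.counts[record.levelname] = self.counts.get(record.levelname, 0) + 1


def error_rate(counts):
    """Monitoring side: fraction of recorded events that were errors."""
    total = sum(counts.values())
    return counts.get("ERROR", 0) / total if total else 0.0


# Logging side: the application records events as they happen.
logger = logging.getLogger("checkout_demo")  # illustrative name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo's records out of the root logger
handler = CountingHandler()
logger.addHandler(handler)

logger.info("order received")
logger.info("payment authorized")
logger.error("inventory service timed out")
logger.info("order received")

# A real monitor would evaluate this continuously and alert on a threshold.
rate = error_rate(handler.counts)
print(f"error rate: {rate:.0%}")
```

In practice the counting and alerting would usually live in a dedicated monitoring system rather than in the application process; the sketch only shows how recorded events become a health signal.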
Reading list
We've selected 24 books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Logging and Monitoring.
This foundational book from Google's SRE team provides a comprehensive overview of the principles and practices behind running large-scale, reliable systems. It dedicates significant sections to monitoring and alerting, explaining their critical role in maintaining system health and responding to incidents. While not solely focused on logging and monitoring, it provides essential context and highlights why these practices are integral to successful operations.
Delves into the concept of observability, which is an evolution of monitoring. It emphasizes understanding the internal state of a system from external data, with a strong focus on structured logging and tracing in addition to metrics. It's a key resource for those looking to deepen their understanding of modern system introspection.
This recent book focuses on observability in cloud-native environments, highlighting the importance of combining traces, metrics, and logs using OpenTelemetry. It teaches how to produce telemetry data and deploy the necessary backends for analysis. This is highly relevant for understanding contemporary logging and monitoring practices in modern distributed systems.
Addresses the specific challenges of logging and monitoring in microservices architectures. It discusses how the distributed nature of microservices impacts monitoring strategies and the tools and techniques needed for effective visibility in such environments. It's highly relevant for anyone working with or learning about microservices.
A practical companion to the "Site Reliability Engineering" book, this workbook offers concrete examples and exercises for implementing SRE principles. It includes chapters specifically addressing monitoring and the practical application of concepts like SLOs (Service Level Objectives), which are heavily reliant on robust monitoring and logging data. It is valuable for those looking to move beyond theory into hands-on application.
A more recent perspective building on "Monitoring Distributed Systems," this book focuses specifically on observability in distributed environments. It explores how to gain deep insights into complex systems through the collection and analysis of telemetry data, including logs, metrics, and traces. It's a valuable resource for understanding the evolution of monitoring to observability.
Specifically addresses the challenges and strategies for monitoring and logging in cloud environments. It covers building a cloud-native observability strategy, which is essential in today's cloud-first world. It delves into the specifics of implementing these practices in cloud platforms.
Provides a practical guide to operating and updating telemetry systems, which encompass tracing, logging, and monitoring. It covers best practices for collecting, storing, and analyzing log data and integrating telemetry with existing infrastructure. It's a valuable resource for understanding the end-to-end process of building and maintaining telemetry systems.
Focuses on the practical implementation of Service Level Objectives (SLOs), which are heavily dependent on accurate and timely monitoring data, particularly metrics and logs. It provides a deep dive into defining and measuring SLOs, making it highly relevant for those involved in setting and tracking service reliability goals.
Offers a vendor-neutral approach to designing and implementing effective monitoring strategies. It covers essential topics such as monitoring antipatterns, principles of monitoring design, and getting metrics and logs out of applications. It is a practical guide suitable for operations engineers, system administrators, and SREs seeking to improve their monitoring practices regardless of the specific tools used.
Explores the principles of reliability at scale and how organizations approach it, with monitoring being a key component. It provides a systematic approach to various aspects of SRE, including tooling and monitoring. It's a good resource for understanding the broader context in which logging and monitoring operate within an SRE framework.
A classic in the field of network security monitoring, this book provides a detailed approach to using NSM for defense. It emphasizes the importance of data collection and analysis, with a strong focus on network traffic and logs as key data sources for incident detection and response. It's a valuable resource for understanding the fundamentals of security-focused monitoring.
Provides a deep dive into modern monitoring practices, covering various aspects of designing, building, and operating monitoring systems. It discusses metrics, logging, and alerting, offering practical advice and insights. It's a well-regarded resource for understanding the craft of monitoring.
An updated perspective on running a SOC, this book covers the latest trends and technologies, including automation and cloud programmability, in the context of security operations. Effective logging and monitoring are fundamental to all aspects discussed, making this a relevant read for understanding contemporary security monitoring centers.
Focusing on network security monitoring (NSM), this book details the collection and analysis of network data for detecting and responding to intrusions. It covers the NSM cycle: collection, detection, and analysis, which heavily involves the use of logs and network traffic data. It is particularly relevant for those interested in the security aspects of logging and monitoring.
Introduces the concept of observability and shows how to design and implement observable systems using modern technologies.
Focuses on using data analysis techniques for network security monitoring. It explains how to build monitoring systems by analyzing various data sources, including logs and network flow data. It's particularly useful for those interested in the analytical side of logging and monitoring for security purposes.
Focused on web operations, this book provides practical guidance on setting up effective monitoring and alerting systems. It covers key metrics to monitor, how to set meaningful alerts, and how to build a monitoring infrastructure. This is a useful resource for those specifically involved in monitoring web-based systems.
While covering broader cloud system administration topics, this book includes significant discussions on monitoring and managing distributed systems. It provides valuable context on how logging and monitoring fit into the larger picture of cloud operations and site reliability. It's a good reference for understanding the operational context of these practices.
Provides a comprehensive overview of Elasticsearch, including how to use it for log analysis.
While a broad book on DevOps principles, it emphasizes the importance of feedback loops and monitoring as critical components for achieving agility, reliability, and security. It provides the cultural and organizational context in which effective logging and monitoring thrive. Useful for understanding the broader impact of these practices.
Offers a comprehensive guide to establishing and running a Security Operations Center (SOC). It covers the technical components of a modern SOC, including data collection and analysis, which are heavily reliant on effective logging and monitoring infrastructure. It discusses various data sources and the processes for analyzing security data.
This bundle provides practical guidance on using popular network monitoring tools like Zabbix, SolarWinds, Splunk, and Cacti. While tool-specific, it offers hands-on knowledge for implementing monitoring in real-world scenarios, which is essential for anyone working directly with monitoring systems. It's useful as a reference for specific toolsets.
For more information about how these books relate to this course, visit:
OpenCourser.com/topic/7b1rji/logging