We may earn an affiliate commission when you visit our partners.
Andrew Howden

In this course, you'll learn the fundamental building block of making systems reliable: making them observable.

We'll talk about Observability, why it is such an essential part of making reliable software, how to understand whether or not a system is "observable", and then how to make it observable by instrumenting it with different "pillars" of observability. We'll discuss two of those pillars — logs & traces — and we'll talk about what problem each of these solves.

To make this topic practical, we'll go through examples in Go, instrumenting sample applications that you can reproduce on your own Linux (or WSL-based) system. We'll examine the output of this instrumentation in the terminal or in open-source UIs that you can use to learn the concepts. Lastly, we'll reproduce some failure modes to understand what failure looks like in these tools and give you a wider range of capabilities to debug different production issues.

This course was made for you if you are a mid to senior-level developer with some experience deploying software to production, and you're looking to build the skills and capabilities to run higher-scale services with more traffic and to debug these systems when they get into trouble.

Let's get started.

What's inside

Learning objectives

  • Understand what "observability" is, and how a software system can be "observable"
  • Understand what the major data types (or "pillars") of observability are, what the trade-offs are and when to use each
  • Be able to instrument a Go-based example application generating logs & traces
  • Be able to use the telemetry data generated to debug different production problems

Syllabus

Students will be able to understand what is in this course, how the course is structured and what they'll need to get the most value out of the material.

You'll meet me (Andrew Howden), your instructor, in this lecture. I'll briefly review my background and credentials so you understand the person trying to pass on their knowledge to you.

You'll learn what the upcoming course material will be, the different types of lessons, and what you should do while taking the course.

You'll learn what you are expected to have as you go through the course.

You'll learn where to go to ask questions about the material or meet fellow students.

Students should understand what "Observability" is, and how a software system can be "Observable"

Here, we'll talk about the fundamental problem that Observability sets out to solve. We'll also introduce our "stable example" — a service that handles delivery options.

You'll learn where the term "Observability" came from, what it means in terms of software and why it has only recently been a feature of software development.

You'll learn about the different things that we want to make "Observable", as well as the different layers of what we think of as the "software stack".

An example of NodeJS appearing to create and write to sockets when it actually is not.

You'll learn about the different kinds of problems that tend to happen as you're writing software, as well as why it is important to know the different kinds of problems to make software "observable".

You'll learn more about how to be sure that the work you're doing to make the system "Observable" truly does help you and your colleagues in the future.

You'll learn what the standard primitives of observability are (the so-called "Pillars of observability").

Quiz Intro: Check your understanding of observability theory
Observability Theory
Students will be able to understand what a "log" is, what the tradeoffs to using it are, and how to add a log to an application.

You'll learn where "logs" came from.

In this lecture, we'll learn what "a log" is.

In this lecture, we'll learn the importance of adding an initial bit of context to our logs: time.
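
As a minimal sketch of what adding time to a log line can look like, the snippet below configures Go's standard `log` package to prefix each message with a UTC date and time at microsecond resolution. The helper name `timestampedLine` and the message text are illustrative assumptions, not code from the course.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
)

// timestampedLine renders msg the way a logger configured with a UTC
// date, time and microseconds would write it.
func timestampedLine(msg string) string {
	var buf bytes.Buffer
	logger := log.New(&buf, "", log.Ldate|log.Ltime|log.Lmicroseconds|log.LUTC)
	logger.Print(msg) // Print appends a trailing newline if msg lacks one
	return buf.String()
}

func main() {
	// Prints something like: 2024/05/01 09:30:00.123456 listening on :8080
	fmt.Print(timestampedLine("listening on :8080"))
}
```

Using UTC avoids ambiguity when logs from hosts in different time zones are read side by side.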

You'll learn how to make your log messages meaningful, and what information is useful.

You'll learn some caveats peculiar to logging based on where logs are written to.

You'll learn how to write your log messages in a way that's useful both for people and for software.
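
One common way to serve both audiences is structured logging: a human-readable message plus machine-readable key/value attributes. The sketch below uses Go's standard `log/slog` package (available from Go 1.21) with its JSON handler. The attribute names `postcode` and `options` and both helper functions are hypothetical, chosen only to fit the "delivery options" theme.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log/slog"
)

// structuredLine emits one JSON log record carrying a message for
// people and attributes for software.
func structuredLine(msg, postcode string, options int) string {
	var buf bytes.Buffer
	logger := slog.New(slog.NewJSONHandler(&buf, nil))
	logger.Info(msg, "postcode", postcode, "options", options)
	return buf.String()
}

// field decodes a JSON log line and returns the value stored under key,
// or nil if the line is not valid JSON or the key is absent.
func field(line, key string) any {
	var rec map[string]any
	if err := json.Unmarshal([]byte(line), &rec); err != nil {
		return nil
	}
	return rec[key]
}

func main() {
	line := structuredLine("options calculated", "10115", 3)
	fmt.Print(line)
	fmt.Println("message:", field(line, "msg"))
}
```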

You'll learn which observability problems are uniquely suited to logs, and which are better left to other pillars.

You'll learn how to implement logs in the reference application (the "delivery options" service)

You'll learn how to consume logs in some basic ways, without digging too deep into vendor tooling.
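
As one basic, vendor-free way to consume logs, the sketch below scans newline-delimited JSON records and keeps only the error messages. It assumes records with `level` and `msg` fields, which is the shape `log/slog` produces. The function name `errorMessages` and the sample records are made up for illustration.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// errorMessages scans newline-delimited JSON logs and returns the "msg"
// of every record whose "level" is ERROR. Lines that are not valid JSON
// are skipped rather than treated as fatal.
func errorMessages(logs string) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(logs))
	for sc.Scan() {
		var rec struct {
			Level string `json:"level"`
			Msg   string `json:"msg"`
		}
		if err := json.Unmarshal(sc.Bytes(), &rec); err != nil {
			continue
		}
		if rec.Level == "ERROR" {
			out = append(out, rec.Msg)
		}
	}
	return out
}

func main() {
	logs := `{"level":"INFO","msg":"request started"}
{"level":"ERROR","msg":"carrier lookup failed"}
plain text line that is not JSON
{"level":"ERROR","msg":"timeout calling pricing"}`
	for _, m := range errorMessages(logs) {
		fmt.Println(m)
	}
}
```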

You'll learn a common way of handling logs in the Linux operating system, as well as how you can go and explore it yourself.

Quiz

In this quiz, we'll recap some of the content we went through to validate our understanding of logs.

As a result of completing this unit, you should have a good understanding of what distributed tracing is, where it came from, its primitives, when to use it and some things to be careful of.

In this lecture, you'll learn what to expect in this section.

In this unit, you'll learn more about the core problem that necessitates Distributed Tracing.

In this lecture, you'll learn about where distributed tracing came from.

In this lecture, you'll learn why I rank tracing as the most important observability pillar.

Theory

In this lecture, we'll set up the user interface we'll use as a reference implementation of Tracing: Jaeger.

In this lecture, we'll briefly examine the data pipeline that allows us to deliver our diagnostics to their eventual destination, in this case, Jaeger.

In this lecture, we'll learn more about the functional unit of distributed tracing: "spans".
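
To make the idea concrete, here is a simplified, hypothetical model of a span: one named operation with a start and an end, tied to a trace and optionally to a parent. It is a sketch of the concept only, not the API of any tracing library. The `Span` type and `newSpan` helper are assumptions for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// Span is a simplified, hypothetical model of a tracing span.
type Span struct {
	TraceID  string // shared by every span in the same trace
	SpanID   string // unique to this operation
	ParentID string // empty for the root span
	Name     string
	Start    time.Time
	End      time.Time
}

// Duration reports how long the operation took.
func (s Span) Duration() time.Duration { return s.End.Sub(s.Start) }

// newSpan builds a span from millisecond offsets, which keeps the
// example deterministic.
func newSpan(name string, startMs, endMs int64) Span {
	return Span{Name: name, Start: time.UnixMilli(startMs), End: time.UnixMilli(endMs)}
}

func main() {
	s := newSpan("GET /options", 0, 250)
	fmt.Printf("%s took %s\n", s.Name, s.Duration()) // GET /options took 250ms
}
```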

In this lecture, we'll go through how to create a "distributed trace" from a set of "spans".
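
The step from spans to a trace can be sketched as grouping spans by their parent ID and walking the resulting tree from the root. The `span` type, the `renderTrace` function and the operation names below are hypothetical. The sketch assumes the parent links form a tree (no cycles) and that the root span has an empty parent ID.

```go
package main

import (
	"fmt"
	"strings"
)

// span is a stripped-down, hypothetical span: just identity and lineage.
type span struct {
	ID, ParentID, Name string
}

// renderTrace arranges spans into a tree using their parent IDs and
// returns an indented outline, root first.
func renderTrace(spans []span) string {
	children := map[string][]span{}
	for _, s := range spans {
		children[s.ParentID] = append(children[s.ParentID], s)
	}
	var b strings.Builder
	var walk func(parent string, depth int)
	walk = func(parent string, depth int) {
		for _, s := range children[parent] {
			b.WriteString(strings.Repeat("  ", depth) + s.Name + "\n")
			walk(s.ID, depth+1)
		}
	}
	walk("", 0) // the root span is the one without a parent
	return b.String()
}

func main() {
	fmt.Print(renderTrace([]span{
		{ID: "a", Name: "GET /options"},
		{ID: "b", ParentID: "a", Name: "load carriers"},
		{ID: "c", ParentID: "b", Name: "SELECT carriers"},
		{ID: "d", ParentID: "a", Name: "price options"},
	}))
}
```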

In this lecture, we'll learn how to indicate whether or not our "operations" succeeded or failed, as well as what type of operation they are.

In this lecture, we'll learn how to enrich our spans with metadata to understand the context of what is happening in the request.

In this lecture, we'll go through how to make the instrumentation substantially more efficient, as well as review what can be a dark side of automated instrumentation.

In this lecture, we'll take a quick break from the theory of distributed tracing and talk about one example of where distributed tracing has been very useful.

In this lecture, we'll learn how to model state transitions or other events within distributed traces, even if they do not have the required "start" and "end" properties.

In this lecture, we'll learn how to propagate context across different microservices such that we can create a single, large diagnostic record that spans the breadth of our microservice architecture.
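
One widely used carrier for this context is the W3C Trace Context `traceparent` HTTP header, laid out as `version-traceid-parentid-flags`. The sketch below formats and parses that header by hand to show what travels between services. Whether the course uses this header is an assumption, and the function names are hypothetical. The parser checks only field count and lengths, not that the IDs are valid hex.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// formatTraceparent builds a traceparent value for a sampled trace:
// version "00", a 32-hex-character trace ID, a 16-hex-character parent
// span ID, and flags "01" (sampled).
func formatTraceparent(traceID, spanID string) string {
	return "00-" + traceID + "-" + spanID + "-01"
}

// parseTraceparent recovers the trace ID and parent span ID on the
// receiving side. ok is false if the value does not have the expected shape.
func parseTraceparent(v string) (traceID, parentID string, ok bool) {
	parts := strings.Split(v, "-")
	if len(parts) != 4 || len(parts[1]) != 32 || len(parts[2]) != 16 {
		return "", "", false
	}
	return parts[1], parts[2], true
}

func main() {
	// Caller: attach the context to the outgoing request headers.
	h := http.Header{}
	h.Set("traceparent", formatTraceparent("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7"))

	// Callee: continue the same trace, with the caller's span as parent.
	traceID, parentID, ok := parseTraceparent(h.Get("traceparent"))
	fmt.Println(traceID, parentID, ok)
}
```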

In this lecture, we'll learn how to read the broader system state supplied by client services within our service and export that so it's easier to reason through the state of our service within that broad system.

This lecture will introduce the upcoming quiz

Review your knowledge of distributed tracing

At the end of this section, learners should understand and be able to implement a numeric view of their application's internal state, and use that to debug specific problems.

In this lecture, you'll learn that although you can now solve some problems, others remain unsolvable with the material covered so far. We'll also cover where you can see examples of metrics in your regular life.

In this lecture, we'll learn a little more about the problem that metrics exist to solve, as well as how metrics are common parts of other telemetry.

In this lecture, you'll learn how the thinking about metrics has evolved over the past 20 years or so.

Metrics on Linux

At the end of this unit, you'll be able to install Prometheus and the node exporter on a Debian-based machine and then view the CPU usage. You'll also know how to check for similar metrics and where to learn more about them.

At the end of this lecture, you'll be able to install Grafana and use it to access Prometheus running on the same host. You'll be able to create your own sample graph as well as import a third-party graph from Grafana Online.

At the end of this lecture, you'll know the different types of "metrics" that we use to solve different classes of problems.
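
The listing does not name the metric types, so the sketch below assumes the three most commonly named in Prometheus-style systems: counters, gauges and histograms. The lecture may use a different breakdown. The types are hand-rolled for illustration, are not safe for concurrent use, and are not any library's API.

```go
package main

import "fmt"

// Counter only ever goes up, e.g. total requests served.
type Counter struct{ n uint64 }

func (c *Counter) Inc() { c.n++ }

// Gauge can go up or down, e.g. requests currently in flight.
type Gauge struct{ v float64 }

func (g *Gauge) Set(v float64) { g.v = v }

// Histogram counts observations into cumulative buckets, e.g. request
// latency in seconds. Counts[i] holds observations <= Bounds[i].
type Histogram struct {
	Bounds []float64 // upper bounds, ascending
	Counts []uint64
	Total  uint64
}

func NewHistogram(bounds ...float64) *Histogram {
	return &Histogram{Bounds: bounds, Counts: make([]uint64, len(bounds))}
}

func (h *Histogram) Observe(v float64) {
	h.Total++
	for i, b := range h.Bounds {
		if v <= b {
			h.Counts[i]++
		}
	}
}

func main() {
	var requests Counter
	var inFlight Gauge
	latency := NewHistogram(0.1, 0.5, 1)

	requests.Inc()
	inFlight.Set(4)
	for _, v := range []float64{0.05, 0.3, 2} {
		latency.Observe(v)
	}
	fmt.Println(requests.n, inFlight.v, latency.Counts, latency.Total)
}
```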

At the end of this lecture, you'll know how aggregation and filtering work in modern metrics systems.
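
As a rough sketch of the idea, the code below filters labelled samples and then sums them per label value. It is roughly what a PromQL query such as `sum by (method) (http_requests_total{code="500"})` expresses. The `sample` type, the `sumBy` function and the sample data are hypothetical.

```go
package main

import "fmt"

// sample is one labelled measurement, e.g. a request counter for a
// single method/status-code combination.
type sample struct {
	Labels map[string]string
	Value  float64
}

// matches reports whether labels contain every key/value pair in filter.
func matches(labels, filter map[string]string) bool {
	for k, v := range filter {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// sumBy keeps the samples that match filter, then adds their values
// together grouped by the value of the groupBy label.
func sumBy(samples []sample, filter map[string]string, groupBy string) map[string]float64 {
	out := make(map[string]float64)
	for _, s := range samples {
		if !matches(s.Labels, filter) {
			continue
		}
		out[s.Labels[groupBy]] += s.Value
	}
	return out
}

func main() {
	samples := []sample{
		{Labels: map[string]string{"method": "GET", "code": "200"}, Value: 120},
		{Labels: map[string]string{"method": "GET", "code": "500"}, Value: 3},
		{Labels: map[string]string{"method": "POST", "code": "500"}, Value: 4},
		{Labels: map[string]string{"method": "GET", "code": "500"}, Value: 2},
	}
	fmt.Println(sumBy(samples, map[string]string{"code": "500"}, "method"))
}
```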

After watching this lecture, you'll understand the common wire formats that you will find metrics in.
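
One such format is the Prometheus text exposition format, in which each sample is a line of the form `name{label="value",...} value`, optionally preceded by `# HELP` and `# TYPE` comment lines. The sketch below renders a sample in that shape. The function name and metric are hypothetical, and label escaping relies on Go's `%q`, which is only a reasonable approximation for simple label values.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// expositionLine renders one sample as: name{label="value",...} value
// Labels are sorted so the output is stable.
func expositionLine(name string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	if len(pairs) == 0 {
		return fmt.Sprintf("%s %g", name, value)
	}
	return fmt.Sprintf("%s{%s} %g", name, strings.Join(pairs, ","), value)
}

func main() {
	fmt.Println("# HELP http_requests_total Total HTTP requests handled.")
	fmt.Println("# TYPE http_requests_total counter")
	fmt.Println(expositionLine("http_requests_total",
		map[string]string{"method": "GET", "code": "200"}, 1027))
}
```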

You'll learn how to configure Prometheus to receive OpenTelemetry data.

You'll learn how to connect your application with a time series data store, like Prometheus.

Simple Counter

In this lecture, you'll learn how to instrument the "garbage collector" in Go.
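
As a small sketch of where such numbers can come from, Go's standard `runtime` package exposes garbage collector statistics through `runtime.ReadMemStats`. The snippet reads the count of completed GC cycles before and after forcing a collection. The helper names are illustrative, and the course may instrument the collector differently.

```go
package main

import (
	"fmt"
	"runtime"
)

// gcCycles returns the number of garbage collection cycles the runtime
// has completed so far.
func gcCycles() uint32 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.NumGC
}

// forceGC triggers a collection and blocks until it has finished.
func forceGC() { runtime.GC() }

func main() {
	before := gcCycles()
	forceGC() // forced here only so the counter visibly changes
	after := gcCycles()
	fmt.Printf("GC cycles: before=%d after=%d\n", before, after)
}
```

Because the cycle count only ever increases, it maps naturally onto a counter-style metric.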

In this lecture, you'll learn how to export the state of your system via the proc filesystem, and use that in practice to expose a relatively new metric called "pressure stall information".
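
On Linux kernels with pressure stall information enabled, files such as `/proc/pressure/cpu` contain lines like `some avg10=0.12 avg60=0.05 avg300=0.01 total=123456`. The sketch below parses one such line into numbers that could then be exported as metrics. The function name `parsePSILine` is hypothetical, and the file will not exist on systems without PSI support.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parsePSILine parses one line of a /proc/pressure/* file and returns
// its kind ("some" or "full") and its key=value fields as floats.
func parsePSILine(line string) (string, map[string]float64, error) {
	fields := strings.Fields(line)
	if len(fields) < 2 {
		return "", nil, fmt.Errorf("malformed PSI line: %q", line)
	}
	values := make(map[string]float64)
	for _, f := range fields[1:] {
		key, raw, ok := strings.Cut(f, "=")
		if !ok {
			return "", nil, fmt.Errorf("malformed PSI field: %q", f)
		}
		v, err := strconv.ParseFloat(raw, 64)
		if err != nil {
			return "", nil, err
		}
		values[key] = v
	}
	return fields[0], values, nil
}

func main() {
	data, err := os.ReadFile("/proc/pressure/cpu")
	if err != nil {
		fmt.Println("PSI not available:", err)
		return
	}
	for _, line := range strings.Split(strings.TrimSpace(string(data)), "\n") {
		kind, values, err := parsePSILine(line)
		if err != nil {
			fmt.Println("skipping line:", err)
			continue
		}
		fmt.Printf("cpu pressure (%s): avg10=%.2f\n", kind, values["avg10"])
	}
}
```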

Things that are not to do with the course content, but rather meta for the course itself.
Thanks & Acknowledgements

In this lecture, I'll tell you more about where you can go to learn about the planned future improvements to this course and how you can contribute ideas or suggestions to improve this course further.

Good to know

Know what's good, what to watch for, and possible dealbreakers
Helps learners build a strong foundation in Observability, which is crucial for building reliable applications
Specifically intended for mid to senior-level developers with experience deploying software to production
Suitable for those looking to develop the skills and knowledge to handle higher-scale services with more traffic
Taught by Andrew Howden, who has extensive experience in Observability and building reliable systems
Utilizes in-demand technologies and tools like Go and traces for hands-on practice
Focuses on key Observability pillars such as logs and metrics, making it practical and applicable to real-world scenarios

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Practical introduction to Observability with these activities:
Review Logging Concepts
Revisit core concepts of logging to reinforce understanding. Identify the different types of logs, the benefits of using logs, best practices for logging, and common logging frameworks.
  • Read articles or documentation on logging concepts.
  • Review your existing knowledge of logging.
  • Take a quiz or practice exercise on logging.
Hands-on Observability Workshop
Find a workshop that provides hands-on experience with observability tools and techniques. Practice collecting, analyzing, and visualizing data. Participate in discussions and interact with experts to expand your knowledge.
  • Find a workshop that covers observability topics.
  • Attend the workshop and actively participate in hands-on activities.
  • Ask questions and engage with experts in the field.
  • Apply the knowledge gained to your own projects.
  • Share your experiences with others.
Trace Visualizer Practice
Load a distributed trace into a visualization tool. Identify the different parts of the trace. Describe the flow of the trace. Discuss ways in which the trace could be used to debug a system.
  • Install a trace visualization tool such as Jaeger.
  • Load a trace into the visualization tool.
  • Identify the different parts of the trace, such as spans, annotations, and events.
  • Describe the flow of the trace from start to finish.
  • Discuss ways in which the trace could be used to debug a system, such as identifying performance bottlenecks or errors.
Log Patterns in Practice
Go through a series of logs that simulate authentic system activity. Determine what type of problem is being reported, or if there is no problem, explain why it is not. Discuss the pattern matching strategies you used to deduce the answer.
  • Review common log formats.
  • Gather a set of logs from a running system or log repository.
  • Analyze the logs for errors or indications of abnormal behavior.
  • Classify the logs according to the type of problem they indicate.
  • Write a brief report summarizing your findings.
Instrumentation Developer Guide
You will produce an easy-to-follow guide that explains how to instrument a Go application using common logging libraries. Include best practices and examples to help developers understand the benefits and proper implementation of logging.
  • Research and gather information on Go logging libraries and best practices.
  • Develop a plan for the structure and content of the guide.
  • Write the guide using clear and concise language.
  • Include code examples and snippets for clarity.
  • Proofread and edit the guide for accuracy and completeness.
Log Shipper to Remote System
Make a meaningful contribution to an open-source log shipper project. Identify an area where you can add functionality or improve the user experience. Work with the project maintainers to get your changes merged.
  • Identify an open-source log shipper project that you would like to contribute to.
  • Install the project and become familiar with its codebase.
  • Identify an area where you can add functionality or improve the user experience.
  • Fork the project and make your changes in a branch.
  • Submit a pull request to the project maintainers.
Metrics Dashboard
Create a dashboard using a tool like Grafana to visualize key metrics for a system. Configure alerts to notify you when certain thresholds are reached. Explain the purpose of each metric and how it can be used to monitor the health of the system.
  • Install a dashboarding tool such as Grafana.
  • Identify the key metrics that you want to visualize.
  • Configure the dashboard to display the metrics in a clear and concise way.
  • Set up alerts to notify you when certain thresholds are reached.
  • Write a brief report explaining the purpose of each metric and how it can be used to monitor the health of the system.

Career center

Learners who complete Practical introduction to Observability will develop knowledge and skills that may be useful to these careers:
Observability Engineer
The Observability Engineer is responsible for designing, implementing, and maintaining systems that monitor performance and reliability. They work with software engineers to instrument applications and collect data on their behavior. This data is then used to create dashboards and alerts that help to identify and resolve problems. The Observability Engineer also works with DevOps teams to automate the monitoring process and to ensure that systems are always up and running. This course provides a strong foundation in the principles of observability, and it covers the key technologies that are used to implement observability systems. It is an ideal course for anyone who wants to learn more about this rapidly growing field.
Site Reliability Engineer
The Site Reliability Engineer (SRE) is responsible for the reliability and performance of a website or application. They work with software engineers to design and implement systems that are scalable, fault-tolerant, and secure. They also work with operations teams to ensure that systems are always up and running. This course provides a strong foundation in the principles of site reliability engineering, and it covers the key technologies that are used to implement SRE systems. It is an ideal course for anyone who wants to learn more about this critical role.
Performance Engineer
The Performance Engineer is responsible for optimizing the performance of applications and systems. They work with software engineers to identify bottlenecks and to implement solutions that improve performance. They also work with operations teams to ensure that systems are always running at peak efficiency. This course provides a strong foundation in the principles of performance engineering, and it covers the key technologies that are used to implement performance engineering solutions. It is an ideal course for anyone who wants to learn more about this important field.
Data Engineer
The Data Engineer is responsible for designing, building, and maintaining data systems. They work with data scientists to understand the data requirements of the business and to develop systems that can meet those requirements. They also work with operations teams to ensure that data systems are always up and running. This course provides a strong foundation in the principles of data engineering, and it covers the key technologies that are used to implement data engineering systems. It is an ideal course for anyone who wants to learn more about this rapidly growing field.
DevOps Engineer
The DevOps Engineer is responsible for bridging the gap between development and operations teams. They work with both teams to ensure that applications are deployed and maintained efficiently and reliably. They also work with site reliability engineers to ensure that systems are always up and running. This course provides a strong foundation in the principles of DevOps, and it covers the key technologies that are used to implement DevOps practices. It is an ideal course for anyone who wants to learn more about this important role.
Cloud Architect
The Cloud Architect is responsible for designing and implementing cloud computing solutions. They work with business stakeholders to understand their needs and to develop solutions that meet those needs. They also work with development and operations teams to ensure that cloud solutions are deployed and maintained efficiently and reliably. This course provides a strong foundation in the principles of cloud computing, and it covers the key technologies that are used to implement cloud solutions. It is an ideal course for anyone who wants to learn more about this rapidly growing field.
Software Engineer
The Software Engineer is responsible for designing, developing, and maintaining software applications. They work with business stakeholders to understand their needs and to develop solutions that meet those needs. They also work with operations teams to ensure that software applications are deployed and maintained efficiently and reliably. This course provides a strong foundation in the principles of software engineering, and it covers the key technologies that are used to develop software applications. It is an ideal course for anyone who wants to learn more about this critical role.
Data Scientist
The Data Scientist is responsible for using data to solve business problems. They work with business stakeholders to understand their needs and to develop solutions that meet those needs. They also work with data engineers to ensure that data is available and accessible. This course provides a strong foundation in the principles of data science, and it covers the key technologies that are used to implement data science solutions. It is an ideal course for anyone who wants to learn more about this rapidly growing field.
Business Analyst
The Business Analyst is responsible for understanding the business needs of an organization and developing solutions that meet those needs. They work with stakeholders to gather requirements and to develop solutions that are both effective and efficient. This course provides a strong foundation in the principles of business analysis, and it covers the key technologies that are used to implement business analysis solutions. It is an ideal course for anyone who wants to learn more about this important role.
Product Manager
The Product Manager is responsible for the development and management of a product. They work with stakeholders to understand their needs and to develop a product that meets those needs. They also work with engineering and marketing teams to ensure that the product is developed and marketed effectively. This course provides a strong foundation in the principles of product management, and it covers the key technologies that are used to implement product management solutions. It is an ideal course for anyone who wants to learn more about this important role.
Project Manager
The Project Manager is responsible for planning, executing, and closing a project. They work with stakeholders to define the project scope and to develop a project plan. They also work with the project team to ensure that the project is completed on time and within budget. This course provides a strong foundation in the principles of project management, and it covers the key technologies that are used to implement project management solutions. It is an ideal course for anyone who wants to learn more about this important role.
Quality Assurance Analyst
The Quality Assurance Analyst is responsible for testing software applications to ensure that they are free of defects. They work with development teams to identify and fix defects. They also work with stakeholders to ensure that the software meets their requirements. This course provides a strong foundation in the principles of quality assurance, and it covers the key technologies that are used to implement quality assurance solutions. It is an ideal course for anyone who wants to learn more about this important role.
Technical Writer
The Technical Writer is responsible for creating and maintaining documentation for software applications. They work with development teams to understand the technical details of the software and to create documentation that is clear and concise. They also work with marketing teams to create marketing materials that explain the benefits of the software. This course provides a strong foundation in the principles of technical writing, and it covers the key technologies that are used to implement technical writing solutions. It is an ideal course for anyone who wants to learn more about this important role.
User Experience Designer
The User Experience Designer is responsible for designing the user interface of software applications. They work with development teams to ensure that the user interface is easy to use and visually appealing. They also work with marketing teams to create marketing materials that explain the benefits of the software. This course provides a strong foundation in the principles of user experience design, and it covers the key technologies that are used to implement user experience design solutions. It is an ideal course for anyone who wants to learn more about this important role.
Marketing Manager
The Marketing Manager is responsible for developing and executing marketing campaigns. They work with sales teams to generate leads and to close deals. They also work with product teams to develop marketing materials that explain the benefits of the software. This course provides a strong foundation in the principles of marketing management, and it covers the key technologies that are used to implement marketing management solutions. It is an ideal course for anyone who wants to learn more about this important role.

Reading list

We've selected seven books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Practical introduction to Observability.
Will be a good supplementary text for learners in this course as it provides context and information on how observability fits into building reliable systems
Will be good supplementary reading for learners to round out their knowledge as it focuses on building systems out of many small services, and observability is very important for that type of architecture
Is useful for specific guidance on using Prometheus, including advanced features like rule writing, alerting, and dashboards
Provides a great theoretical background in designing systems to operate at a large scale, which is the kind of context that will be helpful in understanding observability
Provides a helpful theoretical background in distributed system concepts, which will help learners understand the purpose of observability and related concepts
Provides a practical introduction to building Go applications and microservices, which can be supplemented by the more specific observability topics covered in the course

Similar courses

Here are nine courses similar to Practical introduction to Observability.
Observability with OpenTelemetry and Grafana
Most relevant
Observability in Cloud Native apps using OpenTelemetry
Most relevant
Build Reliable Systems with SQL Server
Certified Kubernetes Application Developer: Application...
Node.js Microservices: Monitoring and Logging
RxJS 7 and Observables: Introduction
New Relic One: Observability From Beginner to Advanced
Incorporating Site Reliability Engineering (SRE) in Your...
Deployment of Machine Learning Models
© 2016 - 2024 OpenCourser