In this course, you'll learn the fundamental building block of making systems reliable: making them observable.
We'll talk about Observability, why it is such an essential part of making reliable software, how to understand whether or not a system is "observable", and then how to make it observable by instrumenting it with different "pillars" of observability. We'll discuss two of those pillars — logs & traces — and we'll talk about what problem each of these solves.
To make this topic practical, we'll work through examples in Go, instrumenting sample applications that you can reproduce on your own Linux (or WSL-based) system. We'll examine the output of this instrumentation in the terminal or in open-source UIs that you can use to learn the concepts. Lastly, we'll reproduce some failure modes to understand what failure looks like in these tools, giving you a wider range of capabilities to debug different production issues.
This course was made for you if you are a mid- to senior-level developer with some experience deploying software to production, and you're looking to build the skills and capabilities to run higher-scale services with more traffic and to debug these systems when they get into trouble.
Let's get started.
You'll meet me (Andrew Howden), your instructor, in this lecture. I'll briefly review my background and credentials so you understand the person trying to pass on their knowledge to you.
You'll learn what the upcoming course material will be, the different types of lessons, and the things you should do while taking the course.
You'll learn what you are expected to have as you go through the course.
You'll learn where to go to ask questions about the material or meet fellow students.
Here, we'll talk about the fundamental problem that Observability sets out to solve. We'll also introduce our "stable example" — a service that handles delivery options.
You'll learn where the term "Observability" came from, what it means in terms of software and why it has only recently been a feature of software development.
You'll learn about the different things that we want to make "Observable", as well as the different layers of what we think of as the "software stack".
NodeJS looking like it is creating and writing to sockets when, actually, it is not.
You'll learn about the different kinds of problems that tend to happen as you're writing software, as well as why it is important to know these kinds of problems when making software "observable".
You'll learn more about how to be sure that the work you're doing to make the system "Observable" truly does help you, and your colleagues, in the future.
You'll learn what the standard primitives of observability are (the so-called "Pillars of observability").
You'll learn where "logs" came from.
In this lecture, we'll learn what "a log" is.
In this lecture, we'll learn the importance of adding an initial bit of context to our logs: time.
You'll learn how to make your log messages meaningful, and what information is useful.
You'll learn some caveats peculiar to logging based on where logs are written to.
You'll learn how to write your log messages in a way that's useful both for people and for software.
You'll learn which observability problems are uniquely suited to logs, and which should be better left for other pillars.
You'll learn how to implement logs in the reference application (the "delivery options" service)
You'll learn how to consume logs in some basic ways, without digging too deep into vendor tooling.
You'll learn a common way of handling logs in the Linux operating system, as well as how you can go and explore it yourself.
In this quiz, we'll recap some of the content we went through to validate our understanding of logs.
In this lecture, you'll learn what to expect in this section.
In this unit, you'll learn more about the core problem that necessitates Distributed Tracing.
In this lecture, you'll learn about where distributed tracing came from.
In this lecture, you'll learn why I rank tracing as the most important observability pillar.
In this lecture, we'll set up the user interface we'll use as a reference implementation of Tracing: Jaeger.
In this lecture, we'll briefly examine the data pipeline that allows us to deliver our diagnostics to their eventual destination, in this case, Jaeger.
In this lecture, we'll learn more about the functional unit of distributed tracing: "spans".
In this lecture, we'll go through how to create a "distributed trace" from a set of "spans".
In this lecture, we'll learn how to indicate whether or not our "operations" succeeded or failed, as well as what type of operation they are.
In this lecture, we'll learn how to enrich our spans with metadata to understand the context of what is happening in the request.
In this lecture, we'll go through how to make the instrumentation substantially more efficient, as well as review what can be a dark side of automated instrumentation.
In this lecture, we'll take a quick break from the theory of distributed tracing and talk about one example of where distributed tracing has been very useful.
In this lecture, we'll learn how to model state transitions or other events within distributed traces, even if they do not have the required "start" and "end" properties.
In this lecture, we'll learn how to propagate context across different microservices such that we can create a single, large diagnostic record that spans the breadth of our microservice architecture.
In this lecture, we'll learn how to read the broader system state supplied by client services within our service and export that so it's easier to reason through the state of our service within that broad system.
This lecture will introduce the upcoming quiz.
Review your knowledge of distributed tracing.
In this lecture, you'll learn that while there are problems that you've been able to solve so far, there are still problems that remain unsolvable with the material covered so far. We'll also cover where you can see examples of metrics in your regular life.
In this lecture, we'll learn a little more about the problem that metrics exist to solve, as well as how metrics are common parts of other telemetry.
In this lecture, you'll learn how the thinking about metrics has evolved over the past 20 years or so.
At the end of this unit, you'll be able to install Prometheus and the node exporter on a Debian-based machine and then view the CPU usage. You'll also know how to check for similar metrics and where to learn more about them.
At the end of this lecture, you'll be able to install Grafana and use it to access Prometheus running on the same host. You'll be able to create your own sample graph as well as import a third-party graph from Grafana Online.
At the end of this lecture, you'll know the different types of "metrics" that we use to solve different classes of problems.
At the end of this lecture, you'll know how aggregation and filtering work in modern metrics systems.
After watching this lecture, you'll understand the common wire formats that you will find metrics in.
You'll learn how to configure Prometheus to receive OpenTelemetry data.
You'll learn how to connect your application with a time series data store, like Prometheus.
In this lecture, you'll learn how to instrument the "garbage collector" in Go.
In this lecture, you'll learn how to export the state of your system via the proc filesystem, and use that in practice to expose a relatively new metric called "pressure stall information".
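As a small taste of reading pressure stall information, the sketch below parses lines in the format the Linux kernel exposes under `/proc/pressure/` (for example, `some avg10=0.00 avg60=0.00 avg300=0.00 total=0`). The `pressure` type and `parsePSILine` function are hypothetical names, and the file only exists on kernels built with PSI support, so the program simply reports an error when it is missing.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// pressure holds one line of a /proc/pressure/* file.
type pressure struct {
	Kind                 string // "some" or "full"
	Avg10, Avg60, Avg300 float64
	TotalMicros          uint64 // cumulative stall time in microseconds
}

// parsePSILine parses a line such as
// "some avg10=0.00 avg60=0.00 avg300=0.00 total=0".
func parsePSILine(line string) (pressure, error) {
	fields := strings.Fields(line)
	if len(fields) != 5 {
		return pressure{}, fmt.Errorf("unexpected PSI line: %q", line)
	}
	p := pressure{Kind: fields[0]}
	for _, kv := range fields[1:] {
		key, value, ok := strings.Cut(kv, "=")
		if !ok {
			return pressure{}, fmt.Errorf("unexpected PSI field: %q", kv)
		}
		var err error
		switch key {
		case "avg10":
			p.Avg10, err = strconv.ParseFloat(value, 64)
		case "avg60":
			p.Avg60, err = strconv.ParseFloat(value, 64)
		case "avg300":
			p.Avg300, err = strconv.ParseFloat(value, 64)
		case "total":
			p.TotalMicros, err = strconv.ParseUint(value, 10, 64)
		default:
			err = fmt.Errorf("unknown PSI key: %q", key)
		}
		if err != nil {
			return pressure{}, err
		}
	}
	return p, nil
}

func main() {
	data, err := os.ReadFile("/proc/pressure/cpu")
	if err != nil {
		fmt.Println("PSI not available:", err)
		return
	}
	for _, line := range strings.Split(strings.TrimSpace(string(data)), "\n") {
		p, err := parsePSILine(line)
		if err != nil {
			fmt.Println("skipping line:", err)
			continue
		}
		fmt.Printf("%s: avg10=%.2f total=%dus\n", p.Kind, p.Avg10, p.TotalMicros)
	}
}
```

The parsed values could then be exposed as gauges (the averages) and a counter (the total) through whichever metrics pipeline the service already uses.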
In this lecture, I'll tell you more about where you can go to learn about the planned future improvements to this course and how you can contribute ideas or suggestions to improve this course further.