We may earn an affiliate commission when you visit our partners.
Jai Chenchlani

Welcome to the SRE Bootcamp | Build, Deploy, Run and Implement Observability, the only course you need to get ready to be a rockstar SRE on the job.

With 7.5 hours of lectures and demos packed with industry experience, this course is without a doubt the most practice-oriented SRE course available anywhere online. Even if you have zero understanding of SRE concepts, this course will take you from beginner to intermediate proficiency, and will enable you to implement SRE practices, not just understand the theory. Here are the reasons why:

  • The course is taught by an industry expert on the subject, who is a daily practitioner himself.

  • The instructor is an SRE interviewer, and knows exactly what is needed in a candidate to succeed.

  • The demos and the corresponding GitHub repo access will enable you to not just follow along, but also reuse the instructor's months of hard work and apply it on the job.

  • The course is current with 2023 trends, so you'll be learning the latest tools and technologies used at large companies running their applications on Google Cloud.

  • The curriculum was developed over a period of 1 year, after a dry-run of the content with a private group of students.

I will take you step-by-step through engaging video tutorials and teach you everything you need to know to succeed as an SRE.

The course includes hands-on demos that build your SRE expertise, so that you can be productive from day 1 as a GCP SRE.

Throughout this course, we cover SRE-relevant tools and technologies in detail, with demos, including:

  • Site Reliability Engineering origin

  • Observability core concepts - Golden Signals, SLIs, SLOs, Error Budgets

  • Characteristics of a good SRE

  • SRE foundational skillset - Linux, vi editor, IP subnetting, etc.

  • GCP CLI - gcloud and kubectl

  • Deploy apps in all forms of compute on GCP - GCE, GKE and Cloud Run

  • GCP Logging and Monitoring, Log based metrics

  • Observability Tools - GCP Native Monitoring and Grafana

  • Troubleshooting tools and techniques using Cloud logging and monitoring and kubectl.

By the end of this course, you will be confident not just in clearing SRE job interviews, but also in being productive and efficient as an SRE.

REMEMBER… I'm so confident that you'll love this course that I'm offering a FULL money-back guarantee for 30 days. So it's a complete no-brainer, sign up today with ZERO risk and

This course is the best way to get ready to crack the toughest of SRE interviews, and be ready to work efficiently as an SRE.

Don’t waste any more time wondering what course is best for you. You’ve already found it. Get started right away.

What's inside

Learning objectives

  • Thorough understanding of what Site Reliability Engineering is
  • GCP overview - compute, containers, storage and observability
  • Characteristics of a good SRE and SRE foundational skillset
  • SRE foundation skills - Linux, automation, IP address subnetting
  • SRE foundation skills - CLI | vi editor, gcloud, kubectl
  • GCE | build infra, deploy app and implement observability
  • GKE | build infra, deploy app and implement observability
  • Cloud Run | build infra, deploy app and implement observability
  • Ability to implement observability using GCP native monitoring and Grafana
  • Ability to troubleshoot issues/errors in production - that's when you get ready to rock on the job!

Syllabus

Instructor Introduction and Initial Setup
Instructor Introduction
Instructor Coordinates
Instructor Coordinates - Links
Agenda
GCP Bootstrap
Introduction to Site Reliability Engineering and Observability core concepts. Discuss who is an SRE, characteristics of a good SRE and the foundational skills an SRE must possess.
Introduction
Site Reliability Engineering
Site Reliability Engineer
Recap
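As a quick illustration of the SLO and error budget concepts introduced in this section, here is a minimal Python sketch. It is not taken from the course material: the function names and the example numbers are assumptions chosen for illustration. It shows how an availability SLO translates into allowed downtime, and how much of an event-based budget is left.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction between 0 and 1, e.g. 0.999")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def budget_remaining(slo: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction between 0 and 1, e.g. 0.999")
    allowed_bad = (1.0 - slo) * total_events   # bad events the SLO tolerates
    actual_bad = total_events - good_events    # bad events actually observed
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Hypothetical numbers: a 99.9% SLO over 30 days, 1,000,000 requests, 500 failures.
    print(round(error_budget_minutes(0.999), 1))
    print(round(budget_remaining(0.999, 999_500, 1_000_000), 2))
```

With these example numbers, a 99.9% SLO over 30 days allows roughly 43.2 minutes of downtime, and 500 failed requests out of 1,000,000 use up about half of the budget.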
Get an overview of GCP Services to visualize possibilities of what can be done in Cloud.
System Info
GCP Overview
GCP Services used in this course
A crash course in Linux that will get you hands-on in an hour.
Help Yourselves
Basic Commands
Troubleshooting
Find
Manipulate File Content
Grep
File Permissions
Crontab
OS Distributions
Importance of Automation for SREs, and 3 examples.
Example 1 | ZSH Profile
Example 2 | getcmd utility
Example 3 | getroles utility
Utilities Bash Scripts
Google Cloud Command-Line Interface
Help Yourselves - Documentation, Cheatsheets and more
Formatting Output
Filter and Sort Results
kubectl is the command-line tool for interacting with Kubernetes clusters. It serves as your main control panel for managing and deploying containerized applications within the Kubernetes environment.
GKE Cluster and Context
kubectl GET | the most used command
Deploy apps - Declarative
Deploy apps - Imperative
vi is the most commonly used text editor on Linux machines. It was first released in 1976, but it is still widely used today. Despite its age, it offers efficient text editing features.
Basic Navigation
Edit - Insert | Delete | Copy | Paste
Search and Replace
Configuration | .vimrc profile
Cheatsheet
IP addresses serve as unique identifiers for devices within GCP networks, allowing them to communicate with each other and with resources on the internet.
RFC1918 Introduction
Understand CIDR Notation
Subnetting Exercise
Subnetting Exercise Solution
Subnetting | Implement the solution
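To make the CIDR and subnetting ideas in this section concrete, here is a small Python sketch using the standard-library ipaddress module. The 10.0.0.0/24 block and the split into /26 subnets are hypothetical values chosen for illustration; they are not the exercise from the course.

```python
import ipaddress


def split_network(cidr: str, new_prefix: int) -> list:
    """Split a CIDR block into equal subnets with the given prefix length."""
    network = ipaddress.ip_network(cidr)
    return list(network.subnets(new_prefix=new_prefix))


if __name__ == "__main__":
    # Hypothetical exercise: carve an RFC 1918 /24 block into four /26 subnets.
    for subnet in split_network("10.0.0.0/24", 26):
        # Classic host count: total addresses minus network and broadcast.
        usable = subnet.num_addresses - 2
        print(subnet, subnet.network_address, subnet.broadcast_address, usable)
```

Each /26 holds 64 addresses. The "minus 2" host count is the textbook convention; cloud providers can reserve more addresses per subnet (GCP documents four reserved addresses in a subnet's primary IPv4 range), so check the provider's documentation before sizing a real subnet.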
In this session, we configure and run Apache webserver in GCE. This application will serve as our demo app in GCE for implementing Observability and Golden Signals dashboards.
Create a VM from GCP Console
SSH into the VM and use Linux commands to validate our VM
Install Apache Webserver on the VM
Configure Apache web-server with our own HTML
SREs love logs. Logs help with building observability, which in turn helps with running your applications at the desired levels of reliability in production.
Missing Telemetry without the Ops Agent

Refer to https://cloud.google.com/logging/docs/agent/ops-agent/third-party/apache for the logs configuration.

Apache Log Configuration
Validate Logs captured by Monitoring Ops Agent
Create Log Based Metrics
Leverage GCP Logging and Monitoring services to build Observability i.e. Golden Signals dashboards - Traffic, Errors, Latency and Saturation.
Out of the box GCP dashboards
Apache webserver configuration update to capture response time
Latency Log Based Metrics
Traffic Chart - Apache webserver metrics
Traffic Chart - Log based metrics
Availability SLI and GCP Native Chart
Latency Chart
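The idea behind the traffic, errors, availability and latency charts in this section can be sketched offline in Python. This is not how the course builds them (it uses GCP log-based metrics and dashboards); it is a minimal illustration under stated assumptions. The log format `%h %l %u %t "%r" %>s %b %D` is assumed, where Apache's `%D` is the time taken to serve the request in microseconds. The sample log lines, the 10 ms latency threshold, and the choice to count only 5xx responses as errors are all hypothetical.

```python
import re

# Assumed LogFormat: %h %l %u %t "%r" %>s %b %D  (%D = serve time in microseconds)
LINE_RE = re.compile(r'"[^"]*" (?P<status>\d{3}) \S+ (?P<micros>\d+)$')

# Hypothetical sample data, not real course logs.
SAMPLE_LOG = """\
10.0.0.5 - - [01/Jan/2023:10:00:00 +0000] "GET / HTTP/1.1" 200 512 1200
10.0.0.6 - - [01/Jan/2023:10:00:01 +0000] "GET /a HTTP/1.1" 200 128 800
10.0.0.7 - - [01/Jan/2023:10:00:02 +0000] "GET /b HTTP/1.1" 500 64 30000
10.0.0.8 - - [01/Jan/2023:10:00:03 +0000] "GET /c HTTP/1.1" 404 64 900
"""


def summarize(log_text: str, latency_threshold_ms: float = 10.0) -> dict:
    """Compute traffic, errors, availability SLI and latency SLI from log text."""
    total = errors = fast = 0
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that do not follow the assumed format
        total += 1
        if int(match["status"]) >= 500:  # assumption: only 5xx counts as an error
            errors += 1
        if int(match["micros"]) / 1000.0 <= latency_threshold_ms:
            fast += 1
    if total == 0:
        raise ValueError("no parsable log lines")
    return {
        "traffic": total,
        "errors": errors,
        "availability": (total - errors) / total,
        "latency_sli": fast / total,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_LOG))
```

Saturation is left out here because it usually comes from resource metrics such as CPU or memory rather than from access logs.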
Grafana is an open-source platform for visualizing and understanding your IT infrastructure's metrics. It's a powerful dashboard builder for monitoring the health and performance of your systems in real time.
Install Grafana
Address Firewall Issue and Login to Grafana Application
Configure additional data sources as needed in Grafana, and use the metrics to Implement Observability, Golden Signals.
Configure Data Sources - Monitoring

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in SRE Bootcamp | Build, Deploy, Run and Implement Observability with these activities:
Review Linux Fundamentals
Strengthen your understanding of Linux fundamentals to better grasp the command-line tools and server configurations used throughout the SRE bootcamp.
  • Review basic Linux commands like ls, cd, mkdir, rm, and cp.
  • Practice using the command line to navigate the file system.
  • Familiarize yourself with file permissions and ownership.
Brush up on Networking Basics
Review networking concepts like IP addressing and subnetting to better understand how services communicate within GCP and how to configure network policies.
  • Review the basics of IP addressing and subnetting.
  • Understand CIDR notation and how it's used to define network ranges.
  • Practice subnetting exercises to reinforce your understanding.
Read 'Site Reliability Engineering' (O'Reilly)
Gain a deeper understanding of SRE principles and practices by reading the foundational text on the subject.
  • Obtain a copy of the 'Site Reliability Engineering' book.
  • Read the book, focusing on chapters related to observability, monitoring, and incident response.
  • Take notes on key concepts and practices.
Practice gcloud and kubectl commands
Reinforce your command-line skills by practicing common gcloud and kubectl commands used for deploying and managing applications on GCP.
  • Set up a GCP account and install the gcloud CLI.
  • Practice using gcloud to create and manage VMs, networks, and other resources.
  • Install kubectl and configure it to connect to a Kubernetes cluster.
  • Practice using kubectl to deploy and manage applications on Kubernetes.
Document your SRE learning journey
Solidify your understanding by creating a blog or documentation outlining key SRE concepts and your experiences with the tools and technologies covered in the course.
  • Choose a platform for your blog or documentation (e.g., Medium, GitHub Pages).
  • Write about key SRE concepts, such as observability, SLOs, and error budgets.
  • Document your experiences with the tools and technologies covered in the course, such as gcloud, kubectl, and Grafana.
Build a simple monitoring dashboard
Apply your knowledge by building a monitoring dashboard for a simple application using GCP Monitoring and Grafana.
  • Deploy a simple application to GCP (e.g., a basic web server).
  • Configure GCP Monitoring to collect metrics from the application.
  • Set up a Grafana dashboard to visualize the metrics.
  • Add alerts to the dashboard to notify you of potential issues.
Read 'The Phoenix Project'
Understand the cultural and organizational aspects of SRE by reading this popular novel about DevOps.
  • Obtain a copy of 'The Phoenix Project'.
  • Read the book, paying attention to the challenges faced by the IT team and how they overcome them.
  • Reflect on how the principles and practices described in the book relate to SRE.

Career center

Learners who complete SRE Bootcamp | Build, Deploy, Run and Implement Observability will develop knowledge and skills that may be useful to these careers:
Site Reliability Engineer
A Site Reliability Engineer is responsible for ensuring the reliability, performance, and scalability of systems. The SRE Bootcamp directly addresses this role's needs, as it focuses on building, deploying, running, and implementing observability. This course provides a thorough understanding of Site Reliability Engineering principles, including how to implement observability using tools like Google Cloud Platform (GCP) native monitoring and Grafana. The course covers essential skills such as Linux, automation, and IP address subnetting, which are fundamental for day-to-day tasks of an SRE. Moreover, the course covers troubleshooting techniques using Cloud Logging and Monitoring and kubectl, crucial for resolving production issues efficiently. The hands-on demos in the course, which cover deploying applications on GCP and configuring logging and monitoring, provide practical experience that you can directly apply in your work.
Cloud Engineer
A Cloud Engineer is responsible for building, deploying, and managing applications and infrastructure on cloud platforms. The SRE Bootcamp provides valuable experience in these areas, making it an excellent fit for this role. The course covers deploying applications in various forms of compute on GCP, including Google Compute Engine, Google Kubernetes Engine, and Cloud Run. It also covers the foundational skills needed to work effectively on the cloud, such as Linux, command-line interface tools, and IP address subnetting. Additionally, the course focuses on implementing observability using GCP native monitoring and Grafana, which is essential for ensuring the health and performance of cloud-based applications. A Cloud Engineer also needs to troubleshoot issues which is covered through Cloud Logging and Monitoring and kubectl.
Release Engineer
A Release Engineer manages the process of releasing software updates and new features, ensuring smooth and reliable deployments. The SRE Bootcamp helps to build a solid foundation for this role, focusing on the critical aspects of deployment and observability. The course covers deploying applications on GCP using various compute services like GCE, GKE, and Cloud Run. In addition, the course emphasizes monitoring the performance of these deployments using GCP native monitoring and Grafana. By mastering these skills, a Release Engineer can ensure that software releases are not only successful but also easily monitored and troubleshot, leading to faster issue resolution and improved system reliability.
DevOps Engineer
A DevOps Engineer focuses on streamlining the software development lifecycle, emphasizing automation and collaboration between development and operations teams. The SRE Bootcamp can be an excellent tool to move into this role, with its focus on automation, deployment, and observability. The course helps build a foundation in essential DevOps practices, such as continuous integration and continuous delivery, by providing hands-on experience with deploying applications on GCP. It also covers foundational skills such as Linux and command-line tools, as well as advanced topics like implementing observability using GCP native monitoring and Grafana. Furthermore, the troubleshooting techniques taught in the course, using Cloud Logging and Monitoring and kubectl, are essential for identifying and resolving issues quickly in a DevOps environment. With this course, a budding DevOps Engineer will be prepared to handle real-world scenarios and improve their organization's software delivery pipeline.
Automation Engineer
An Automation Engineer designs, develops, and implements automation solutions to improve efficiency and reduce manual effort. The SRE Bootcamp can significantly aid in this role due to its strong emphasis on automation within the context of site reliability. The course directly addresses the importance of automation for SREs, providing practical examples using utilities and bash scripts. By learning to automate tasks related to deployment, monitoring, and troubleshooting on GCP, an Automation Engineer can leverage these skills to create more reliable and efficient systems. Furthermore, the course covers using tools like kubectl for automating Kubernetes cluster management, enhancing the ability to automate application deployments and scaling.
Technical Program Manager
A Technical Program Manager leads complex technical projects, coordinating efforts across multiple teams to achieve project goals. The SRE Bootcamp can provide valuable insights and skills for this role, particularly around managing deployments and ensuring system reliability. The course covers deploying applications on Google Cloud Platform using services like GCE, GKE, and Cloud Run. Also, the focus on observability, troubleshooting, and automating tasks aligns with the responsibilities of a Technical Program Manager, who needs to understand the operational aspects of the systems they are managing. By understanding the principles and practices of SRE, a Technical Program Manager can better plan and execute complex technical projects.
Systems Administrator
A Systems Administrator is responsible for managing and maintaining computer systems and servers, ensuring they are running smoothly and efficiently. The SRE Bootcamp helps you move into this role, providing training in essential system administration skills and concepts. The course covers foundational skills such as Linux, command-line tools, and IP address subnetting, which are crucial for managing systems effectively. It also covers deploying applications on GCP, a valuable skill for system administrators working in cloud environments. The observability aspects of the course, using GCP native monitoring and Grafana, can enable a Systems Administrator to proactively monitor system performance and identify potential issues. Troubleshooting techniques using Cloud Logging and Monitoring and kubectl can also help resolve issues efficiently.
Performance Engineer
A Performance Engineer analyzes and optimizes the performance of software systems to ensure they meet specified performance criteria. The SRE Bootcamp provides valuable skills for this role, particularly in the areas of monitoring, troubleshooting, and optimization. The course emphasizes observability using GCP native monitoring and Grafana, which can help Performance Engineers identify performance bottlenecks and areas for improvement. The troubleshooting techniques taught in the course using Cloud Logging and Monitoring and kubectl are also essential for diagnosing and resolving performance issues. The ability to implement and analyze golden signals such as traffic, errors, latency, and saturation, allows Performance Engineers to gain deep insights into the behavior of complex systems.
Cloud Architect
A Cloud Architect designs and implements cloud computing solutions for organizations, ensuring they are scalable, secure, and cost-effective. The SRE Bootcamp helps build skills relevant to this role, particularly in the areas of cloud deployment, observability, and automation. The course covers deploying applications on various GCP compute services, including Google Compute Engine, Google Kubernetes Engine, and Cloud Run. It also covers implementing observability using GCP native monitoring and Grafana, helping Cloud Architects design solutions that are easy to monitor and troubleshoot. Foundation skills such as Linux, automation, and IP address subnetting will also be relevant. A Cloud Architect also benefits from skills in troubleshooting using Cloud Logging and Monitoring and kubectl.
Technical Support Engineer
A Technical Support Engineer provides technical assistance to customers, helping them troubleshoot and resolve issues with software and hardware. The SRE Bootcamp may be useful for this role, offering skills and knowledge related to troubleshooting, cloud infrastructure, and monitoring. The course covers troubleshooting techniques using Cloud Logging and Monitoring and kubectl, which can be directly applied to diagnosing and resolving technical issues. The exposure to GCP and cloud deployment concepts can also help Technical Support Engineers better understand and support cloud-based applications and services. Furthermore, the focus on observability using GCP native monitoring and Grafana can provide insights into system performance that are valuable for diagnosing issues.
Software Developer
A Software Developer writes and maintains code for software applications. The SRE Bootcamp may be useful for Software Developers, particularly those working on cloud-based applications. The course covers deploying applications on GCP, which is beneficial for developers who need to deploy and manage their applications in the cloud. It also covers foundational skills such as Linux and command-line tools, which are frequently used in software development. Also, by understanding how SRE principles affect application deployment, and learning how to implement observability with GCP native monitoring and Grafana, developers can write code that is easier to monitor and troubleshoot. The troubleshooting techniques taught using Cloud Logging and Monitoring and kubectl are also helpful for debugging applications in production.
Network Engineer
A Network Engineer designs, implements, and manages computer networks. The SRE Bootcamp may be useful for network engineers, especially those working with cloud networks. The course covers IP address subnetting, a fundamental skill for network engineers. It also covers deploying applications on GCP, which can help Network Engineers understand how applications interact with the network in a cloud environment. Learning foundational skills such as Linux and working with CLI tools can help the Network Engineer. Also, familiarity with GCP native monitoring can aid in network troubleshooting.
Database Administrator
A Database Administrator manages and maintains databases, ensuring they are secure, reliable, and performant. While the SRE Bootcamp may not directly focus on database administration, it may provide relevant skills and knowledge for those working with databases in a cloud environment. The course covers foundational skills such as Linux and command-line tools, which are often used in database administration. It also covers deploying applications on GCP, which can help Database Administrators understand how databases are deployed and managed in the cloud. Observability skills, specifically using GCP native monitoring, may be useful in monitoring the performance of databases.
Security Engineer
A Security Engineer protects computer systems and networks from security threats. The SRE Bootcamp may be useful for Security Engineers, providing relevant skills in cloud security, monitoring, and incident response. The course covers deploying applications on GCP, which can help Security Engineers understand the security considerations for cloud deployments. Learning about GCP logging and cloud monitoring tools is particularly valuable, as they enable Security Engineers to detect and respond to security incidents. While the bootcamp focuses on SRE principles rather than direct security practices, the operational rigor and monitoring skills taught can enhance a Security Engineer’s ability to protect cloud infrastructure.
Product Manager
A Product Manager guides the strategy, roadmap, and feature definition for a product line. The SRE Bootcamp may be useful to Product Managers, providing knowledge on the practical aspects of reliability that informs product decisions. While this course does not focus directly on Product Management, it provides valuable insights into Site Reliability Engineering principles and tools used to manage and maintain applications in production. Understanding observability, monitoring, and troubleshooting techniques taught in the course helps Product Managers prioritize features and improvements that enhance the reliability and performance of their products. An SRE-aware Product Manager can incorporate reliability considerations into the product lifecycle, reducing the risk of issues.

Reading list

We've selected two books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in SRE Bootcamp | Build, Deploy, Run and Implement Observability.
'Site Reliability Engineering' is considered the bible of SRE. It provides a comprehensive overview of SRE principles and practices as implemented at Google. Reading this book will give you a deeper understanding of the concepts covered in the bootcamp and provide valuable context for the hands-on exercises. It is highly recommended as a reference text for anyone serious about SRE.
'The Phoenix Project' is a novel that illustrates the importance of DevOps and SRE principles in a relatable and engaging way. While not a technical manual, it provides valuable insights into the cultural and organizational aspects of SRE. Reading this book can help you understand the 'why' behind SRE practices and how they can improve collaboration and efficiency within a team. It is more valuable as additional reading than as a current reference.
