We may earn an affiliate commission when you visit our partners.
“This course will soon be retired”

Build the skills and knowledge required to work as a Site Reliability Engineer (SRE) using IBM Cloud environments and tools. This interactive course features practice exercises and real-life scenarios as you explore the content. You will also discover the tools and SRE principles needed to manage enterprise workloads in IBM Cloud environments. Upon successful completion of the IBM Cloud Associate SRE curriculum, learners enrolled in the Verified Certificate Track will receive an edX certificate as well as a code for 50% off the IBM Certified Associate SRE - Cloud v2 certification exam. Upon receiving a passing score on the exam, learners will be awarded the IBM Certified Associate SRE - Cloud v2 certification through Credly.

What's inside

Learning objectives

  • Welcome & introduction:
  • Understand day-to-day activities of an SRE
  • SRE fundamentals & terminology:
  • Define key SRE responsibilities and SRE principles
  • Compare SLOs, SLIs, and SLAs and the relationship between them
  • Recall the benefits and foundational techniques of reliability and resiliency
  • Distinguish between the types of monitoring and techniques used to observe a system
  • Identify availability and performance impacts and solutions
  • Discover the value of the four golden signals and how troubleshooting is used to solve problems
  • Incident management and post incident reviews:
  • Outline key tenets of incident management and related toolchains and architecture
  • Describe the service management process and how it relates to SRE
  • Define problem management and root cause analysis, their benefits, and post-incident review concepts
  • Demonstrate understanding of rank-ordered actions and automation use cases
  • Observability topics:
  • Outline the benefits and strategies of monitoring and observability
  • Review types and methods of monitoring, monitoring tools, and metrics
  • Review resource utilization requirements and their application to monitoring strategies
  • Outline the purpose, benefits, and best practices of automated application monitoring
  • Demonstrate the importance of logging to SREs
  • Troubleshooting and runbooks:
  • Explain how troubleshooting fits into the SRE role
  • Define IBM Cloud Code Engine and how it is used for logging, monitoring, auditing, and troubleshooting information
  • Demonstrate troubleshooting techniques for virtual server instances, IBM Cloud for VMware®, IBM Cloud Internet Services, and block storage issues
  • Review how to troubleshoot a 500 internal server error
  • Operations:
  • Identify the guidelines for an operational readiness review (ORR) and how to perform one
  • Apply SRE roles and responsibilities to application deployments
  • Define high availability and service architecture components as they relate to the workload
  • Recall the five tenets of service management and operation as they relate to microservice architecture
  • Demonstrate best practices for backing up, restoring, and archiving data for IBM Cloud native applications
  • Define data storage replication concepts for high availability and disaster recovery across various platforms and solutions
  • Deployments:
  • Define the purpose, benefits, and activities of continuous integration, continuous delivery, and continuous deployment
  • Recall the benefits and approach of infrastructure as code with Schematics
  • Compare the three zero downtime deployment models
  • Security on IBM Cloud:
  • Recall how to recognize and respond to security issues
  • Define user-related security policies
  • Outline security information and event management (SIEM)
  • Classify IBM's security incident response management
  • Define the role of an SRE in monitoring security issues

Syllabus

Module 1: Welcome & Introduction
You will cover the following topic:
Welcome and Introduction
Module 2: SRE Fundamentals & Terminology
SRE Fundamentals and Terminology
Module 3: Incident Management and Post Incident Reviews
You will cover the following topics:
Overview of Incident Management
Overview of Problem Management
Module 4: Observability Topics
Overview of Observability
Metrics, Traces, and Alerts
Monitor Resource Utilization
Understanding Observability Automation
Module 5: Troubleshooting and Runbooks
Overview of Troubleshooting
Concepts and Tools to Troubleshoot IBM Cloud Code Engine
Troubleshoot Problems Caused by Compute Infrastructure and Network
Troubleshoot Problems Caused by Storage
Exercise: Use IBM Log Analysis to Troubleshoot a Problem
Module 6: Operations
Operational Readiness Review
Overview of Resiliency for the Workload
IBM Cloud Tools and Technology for Operations Management
Implementing and Managing Backup and Recovery
Storage Replication and Failure Domains
Module 7: Deployments
Release Strategies and Concepts
Module 8: Security on IBM Cloud
Security Policies and Monitoring Threats

Good to know

Know what's good, what to watch for, and possible dealbreakers
Teaches skills, knowledge, and tools that are highly relevant in industry
Covers unique perspectives and ideas that may add color to other topics and subjects
Taught by IBM, which is recognized in the industry for SRE
Develops professional skills or deep expertise in a particular topic
If you are interested in learning about SRE and applying it to enterprise workloads in IBM Cloud environments, this course is suitable for you
Provides an opportunity to prepare for the IBM Certified Associate SRE - Cloud v2 certification
Course topics include: SRE Fundamentals, Incident Management, Observability, Troubleshooting, Deployment, and Security on IBM Cloud

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in IBM Cloud Associate Site Reliability Engineer with these activities:
Review Prerequisites
Refresh your foundational knowledge in key SRE concepts, ensuring a solid understanding before starting the course.
  • Review SRE Fundamentals and Terminology
  • Explore Incident Management Concepts
  • Understand Observability Principles
  • Become familiar with troubleshooting techniques
Follow SRE Learning Paths
Explore online SRE learning paths and tutorials to supplement your course learning, deepening your understanding of key concepts and practices.
  • Search for reputable SRE learning platforms
  • Identify structured SRE learning paths
SRE Best Practices Discussion Group
Join a discussion group with fellow learners to exchange knowledge, share experiences, and explore different perspectives on SRE best practices.
  • Join the online discussion group
  • Participate in weekly discussions
  • Share insights and experiences
Troubleshooting Case Study Exercises
Engage in hands-on case studies that simulate real-world troubleshooting scenarios, improving your problem-solving skills.
  • Analyze a given incident report
  • Use monitoring and logging data to identify the root cause
  • Apply troubleshooting techniques to resolve the issue
Create a Troubleshooting Playbook
Develop a comprehensive troubleshooting playbook that outlines step-by-step procedures for resolving common issues, ensuring a structured and efficient approach to problem-solving.
  • Identify common issues and their symptoms
  • Define troubleshooting steps and actions
  • Document the playbook in an accessible format
Attend an SRE Workshop
Participate in an SRE workshop led by industry experts to gain hands-on experience and learn advanced techniques for implementing and managing SRE practices.
  • Research and identify relevant SRE workshops
  • Register and attend the workshop
  • Actively participate in hands-on exercises
Design an Observability Framework
Develop a comprehensive observability framework to monitor, detect, and resolve issues proactively, enhancing the reliability of your systems.
  • Define observability metrics and KPIs
  • Design a monitoring architecture
  • Implement logging and tracing mechanisms
  • Create dashboards and alerts
Contribute to an Open Source SRE Project
Engage with the SRE community by contributing to open source projects, gaining practical experience and staying up-to-date with industry trends.
  • Explore open source SRE projects
  • Identify an area to contribute
  • Submit code contributions or documentation updates

© 2016 - 2024 OpenCourser