We may earn an affiliate commission when you visit our partners.
Marissa Moore

Site Reliability Engineers must have the right tools and strategies to perform in a fast-paced technical environment. Nine competency areas guide the successful practice of IBM Cloud SREs.

● Applying Site Reliability Engineering principles

● Operations

● Monitoring and incident management

● Security and compliance

● Compute infrastructure

● Networking

● Storage and data management

● Reliability and resiliency

● Deployment automation

In this second course of the three-part Professional Certificate in Site Reliability Engineering (SRE), you will focus on the following five SRE competencies:

● Compute infrastructure

● Networking

● Storage and data management

● Reliability and resiliency

● Deployment automation

NOTE: The remaining four SRE competencies are covered in Course 1: SRE Fundamentals and Security.

This course covers approximately 50% of the required content to help you prepare for the “IBM Certified Professional SRE - Cloud V2” certification exam.

If you plan to pursue the “IBM Certified Professional SRE - Cloud V2” certification, we recommend that you complete all three courses of the Professional Certificate in Site Reliability Engineering (SRE) to improve your chances of passing the exam.

What you'll learn

Compute infrastructure

● Troubleshoot VMs, IBM Kubernetes Service (IKS), Red Hat OpenShift and serverless services on IBM Cloud

● Configure for high availability and scalability

● Explain the impact of compute on service performance

Networking

● Troubleshoot external connections to IBM Cloud

● Troubleshoot interservice connectivity on IBM Cloud

● Explain the reliability ramifications of IBM Cloud networking features

● Explain the impact of networking on service performance

Storage and data management

● Manage storage and data attributes

● Manage data replication and retention

● Explain the impact of storage on service performance

● Monitor data security and compliance

● Identify storage data durability and capacity management

Reliability and resiliency

● Design and improve reliability for the system/service

● Design for failure and recover from failure

Deployment automation

● Design non-disruptive deployment

● Troubleshoot provisioning of IBM Cloud resources

● Implement Infrastructure as Code

● Explain the SRE's responsibilities for CI/CD pipelines

● Troubleshoot CI/CD pipelines

What's inside

Syllabus

Module 1: Compute Infrastructure
You will cover the following topics:
● IBM Cloud service models: IaaS, PaaS, and FaaS
● Troubleshooting VMs on IBM Cloud
● Troubleshooting clusters on IBM Kubernetes Service
● Troubleshooting clusters on Red Hat OpenShift on IBM Cloud
● Troubleshooting serverless services
Module 2: Networking
● Applying IBM Cloud networking features
● Implementing and managing virtual networks on IBM Cloud
● Configuring name resolution on IBM Cloud
● Managing performance on IBM Cloud
● Troubleshooting external connections on IBM Cloud
● Troubleshooting interservice connectivity on IBM Cloud
Module 3: Storage and data management
● Managing storage and data attributes
● Managing storage accounts
● Managing data on IBM Cloud
● Managing data replication and retention
Module 4: Reliability and resiliency
● Importance of reliability and resiliency for services
● Designing and improving reliability for systems and services
● Designing for failure and recovering from failure
Module 5: Deployment automation
● Deployment automation
● Implementing Infrastructure as Code
● SRE responsibilities for CI/CD pipelines

Good to know

Know what's good, what to watch for, and possible dealbreakers
Develops compute infrastructure skills, which are core for cloud computing professionals
Covers networking principles relevant to cloud-based systems
Explores storage and data management in cloud computing
Examines reliability and resilience strategies in cloud computing environments
Teaches deployment automation techniques for cloud platforms
Requires prerequisite knowledge in SRE fundamentals and security

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in SRE Infrastructure, Resiliency and Deployment Automation with these activities:
Review networking concepts
Refresh knowledge of networking concepts before starting the course. Fill gaps in understanding.
  • Review the basics of networking.
  • Read about virtual networks and name resolution.
Review IBM Cloud Documentation
Improve your foundational understanding of compute infrastructure, networking, storage and data management, reliability and resiliency, and deployment automation. Prepare for success in this course.
  • Read the IBM Cloud Documentation on Compute Infrastructure.
  • Read the IBM Cloud Documentation on Networking.
  • Read the IBM Cloud Documentation on Storage and Data Management.
  • Read the IBM Cloud Documentation on Reliability and Resiliency.
  • Read the IBM Cloud Documentation on Deployment Automation.
Practice configuring network settings
Enhance skills in configuring network settings. Focus on areas that require additional practice.
  • Set up a virtual network.
  • Configure name resolution.
Troubleshoot Compute Infrastructure issues
Solidify knowledge of troubleshooting Compute Infrastructure issues. Focus on areas where you may have difficulty.
  • Identify a Compute Infrastructure issue.
  • Gather information about the issue.
  • Research possible solutions to the issue.
  • Implement a solution to the issue.
  • Test the solution to ensure that it resolves the issue.
Discuss SRE best practices with peers
Exchange knowledge and learn from peers about SRE best practices. Identify areas of improvement.
  • Join a study group or online forum for SRE professionals.
  • Participate in discussions about SRE best practices.
Set up a highly available Kubernetes cluster
Enhance knowledge of setting up highly available Kubernetes clusters. Tackle an area where you may need more practice.
  • Review the documentation on setting up a highly available Kubernetes cluster.
  • Gather the necessary resources to set up the cluster.
  • Set up the cluster according to the documentation.
  • Test the cluster to ensure that it is highly available.
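One way to approach the final step is to probe the cluster repeatedly while you disrupt it (for example, while a node is drained) and compare the observed success rate with an availability target. The Python sketch below is illustrative only: the function names, the 0.999 target, and the sample probe data are assumptions, not part of the course material or of any IBM Cloud tooling.

```python
def measured_availability(probe_results):
    """Return the fraction of probes that succeeded (True = healthy response)."""
    if not probe_results:
        raise ValueError("at least one probe result is required")
    return sum(1 for ok in probe_results if ok) / len(probe_results)


def meets_target(probe_results, target=0.999):
    """Check whether the observed availability reaches the assumed target."""
    return measured_availability(probe_results) >= target


# Hypothetical data: 1,000 probes taken during a disruption, one of which failed.
results = [True] * 999 + [False]
print(measured_availability(results))
print(meets_target(results))
```

In practice the probe results would come from whatever health check you run against the cluster; the list of booleans here only stands in for that data.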
Design a deployment automation plan
Develop a deeper understanding of designing deployment automation plans. Apply knowledge to a practical scenario.
  • Identify the components of the system that need to be automated.
  • Design the automation plan using Infrastructure as Code.
  • Test the automation plan to ensure that it works as expected.

Career center

Learners who complete SRE Infrastructure, Resiliency and Deployment Automation will develop knowledge and skills that may be useful to these careers:
Site Reliability Engineer
Site Reliability Engineers are responsible for the reliability and performance of software systems. This course can help you build a strong foundation in the principles and practices of SRE, including compute infrastructure, networking, storage and data management, reliability and resiliency, and deployment automation. These skills are essential for any SRE who wants to be successful in their role.
Cloud Architect
Cloud Architects design and implement cloud-based solutions. This course can help you build a strong foundation in the principles and practices of cloud architecture, including compute infrastructure, networking, storage and data management, reliability and resiliency, and deployment automation. These skills are essential for any Cloud Architect who wants to be successful in their role.
DevOps Engineer
DevOps Engineers bridge the gap between development and operations teams. This course can help you build a strong foundation in the principles and practices of DevOps, including compute infrastructure, networking, storage and data management, reliability and resiliency, and deployment automation. These skills are essential for any DevOps Engineer who wants to be successful in their role.
Software Engineer
Software Engineers design, develop, and maintain software applications. This course can help you build a strong foundation in the principles and practices of software engineering, including compute infrastructure, networking, storage and data management, reliability and resiliency, and deployment automation. These skills are essential for any Software Engineer who wants to be successful in their role.
Data Engineer
Data Engineers design, build, and maintain data pipelines. This course can help you build a strong foundation in the principles and practices of data engineering, including compute infrastructure, networking, storage and data management, reliability and resiliency, and deployment automation. These skills are essential for any Data Engineer who wants to be successful in their role.
Network Engineer
Network Engineers design, build, and maintain computer networks. This course can help you build a strong foundation in the principles and practices of network engineering, including compute infrastructure, networking, storage and data management, reliability and resiliency, and deployment automation. These skills are essential for any Network Engineer who wants to be successful in their role.
Storage Engineer
Storage Engineers design, build, and maintain storage systems. This course can help you build a strong foundation in the principles and practices of storage engineering, including compute infrastructure, networking, storage and data management, reliability and resiliency, and deployment automation. These skills are essential for any Storage Engineer who wants to be successful in their role.
Reliability Engineer
Reliability Engineers design and implement systems that are reliable and resilient. This course can help you build a strong foundation in the principles and practices of reliability engineering, including compute infrastructure, networking, storage and data management, reliability and resiliency, and deployment automation. These skills are essential for any Reliability Engineer who wants to be successful in their role.
Resiliency Engineer
Resiliency Engineers design and implement systems that are resilient to failures. This course can help you build a strong foundation in the principles and practices of resiliency engineering, including compute infrastructure, networking, storage and data management, reliability and resiliency, and deployment automation. These skills are essential for any Resiliency Engineer who wants to be successful in their role.
Deployment Engineer
Deployment Engineers design and implement systems that are easy to deploy and maintain. This course can help you build a strong foundation in the principles and practices of deployment engineering, including compute infrastructure, networking, storage and data management, reliability and resiliency, and deployment automation. These skills are essential for any Deployment Engineer who wants to be successful in their role.

Reading list

We've selected seven books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in SRE Infrastructure, Resiliency and Deployment Automation.
This authoritative book describes how Google engineers have developed particular expertise in running production systems, and provides advice and best practices that can benefit companies implementing SRE.
Provides a comprehensive collection of coding interview questions and solutions, and it covers a variety of topics that are relevant to SRE, including data structures, algorithms, and object-oriented programming.
Provides a comprehensive guide to writing effective Java code, and it covers a variety of topics that are relevant to SRE, including memory management, concurrency, and error handling.
Provides a practical guide to design patterns, and it covers a variety of topics that are relevant to SRE, including object-oriented programming, software architecture, and code reusability.
This classic book provides a timeless perspective on software development, and it covers a variety of topics that are relevant to SRE, including project management, team dynamics, and software architecture.
Provides a comprehensive guide to writing clean and maintainable code, and it covers a variety of topics that are relevant to SRE, including code style, code organization, and error handling.

© 2016 - 2024 OpenCourser