Site Reliability Engineers must have the right tools and strategies to perform in a technical, fast-paced environment. IBM Cloud SRE is guided by nine competency areas that lead to the successful practice of the discipline:
● Applying Site Reliability Engineering principles
● Operations
● Monitoring and incident management
● Security and compliance
● Compute infrastructure
● Networking
● Storage and data management
● Reliability and resiliency
● Deployment automation
In this first course of the three-part Professional Certificate in Site Reliability Engineering (SRE), you will focus on the first four SRE competencies:
● Applying Site Reliability Engineering principles
● Operations
● Monitoring and incident management
● Security and compliance
NOTE: The remaining five SRE competencies are covered in Course 2: SRE Infrastructure, Resiliency and Deployment Automation.
This course covers approximately 50% of the content required to help you prepare for the “IBM Certified Professional SRE - Cloud V2” certification exam.
If you are interested in pursuing the “IBM Certified Professional SRE - Cloud V2” certification, we recommend that you complete all three offerings of the Professional Certificate in Site Reliability Engineering (SRE) to ensure a successful certification exam experience.
Applying Site Reliability Engineering principles
● Manage the trade-off between change, velocity, and reliability of services
● Negotiate service level objectives, service level indicators, and error budgets
● Design and deploy automation strategies
● Leverage IBM Cloud tools and technology across the software development life cycle
● Understand the roles and responsibilities for SRE effectiveness
Operations
● Monitor resource utilization
● Perform operational readiness reviews (ORRs)
● Employ cost-optimization strategies
● Identify key metrics for service health
Monitoring and incident management
● Create and maintain metrics, traces, and alerts
● Collect, analyze, and manage logs on IBM Cloud
● Manage incidents
● Perform post-incident reviews
● Recognize and differentiate performance and availability metrics
● Perform statistical analysis and create actionable outcomes
Security and compliance
● Monitor security threats
● Implement and manage security policies
● Implement encryption models
● Manage role-based access control (RBAC) on IBM Cloud
● Define the shared responsibility model