Travis Scotto, Emmanuel Apau, Sonny Sevin, and Nathan Anderson, MBA

Improve your skills in monitoring, availability, disaster recovery, infrastructure, and more with Udacity’s Cloud Incident Response & Recovery Training Course.

What's inside

Syllabus

Introduction to the course. We will look at how the topics all tie into being an SRE and what skills we'll learn and apply.
In this lesson, we will learn how SREs monitor systems using SLOs and SLIs. We will create queries in Prometheus and dashboards in Grafana.
In this lesson, we will identify all IT assets, make those assets highly available, and put together a disaster recovery plan for those assets.
In this lesson, we will deploy our HA/DR infrastructure using Terraform to AWS.
In this lesson, we'll learn about database reliability and availability and how to make databases more available. We will then deploy a replicated database cluster to AWS and observe a failover.
In this project, you will apply the skills you've learned in this course by defining and implementing a resilient infrastructure on a cloud platform.
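
As a rough illustration of the SLI and SLO ideas named in the monitoring lesson, the sketch below computes an availability SLI from request counts and the share of the error budget that remains unspent. It is not taken from the course material: the function names and the sample numbers are assumptions chosen only for illustration.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Return the fraction of requests that succeeded (the SLI)."""
    if total_requests < 0 or good_requests < 0 or good_requests > total_requests:
        raise ValueError("request counts are inconsistent")
    if total_requests == 0:
        # No traffic means nothing failed; treat the window as fully available.
        return 1.0
    return good_requests / total_requests


def error_budget_remaining(sli: float, slo_target: float) -> float:
    """Return the unspent share of the error budget.

    1.0 means the budget is untouched, 0.0 means it is exactly used up,
    and a negative value means the SLO has been missed.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_failure = 1.0 - slo_target  # failure rate the SLO tolerates
    actual_failure = 1.0 - sli          # failure rate actually observed
    return 1.0 - actual_failure / allowed_failure


if __name__ == "__main__":
    # Hypothetical numbers: 10,000 requests, 10 of which failed,
    # measured against a hypothetical 99.5% availability SLO.
    sli = availability_sli(good_requests=9_990, total_requests=10_000)
    remaining = error_budget_remaining(sli, slo_target=0.995)
    print(f"SLI: {sli:.4f}, error budget remaining: {remaining:.0%}")
```

In practice the request counts would come from a monitoring backend such as Prometheus rather than from hard-coded values.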

Good to know

Know what's good, what to watch for, and possible dealbreakers
Strengthens an existing foundation for intermediate learners by building on introductory topics in monitoring, availability, and disaster recovery
Develops professional skills in cloud incident response and recovery by equipping learners with practical knowledge and hands-on experience
Builds a strong foundation for beginners in cloud incident response and recovery by introducing core concepts and essential practices
Taught by seasoned professionals with extensive experience in cloud operations and incident management, ensuring high-quality instruction
Offers hands-on labs and interactive materials, providing learners with practical experience and reinforcement of concepts
Caveat: Relies on software versions that may not be the most up-to-date. Learners should be aware of this potential limitation

Activities

Be better prepared before your course. Deepen your understanding during and after it. Supplement your coursework and achieve mastery of the topics covered in Planning for High Availability and Incident Response with these activities:
Gather and Review Course Materials
Organize and review all course materials, including notes, readings, assignments, and quizzes, to prepare for the course.
  • Gather course materials
  • Review course syllabus
  • Review assigned readings
Follow Grafana and Prometheus Tutorial Series
Follow a series of tutorials to gain hands-on experience with monitoring tools Prometheus and Grafana, which will be used throughout the course.
  • Find a Grafana and Prometheus tutorial series
  • Follow the tutorials
  • Experiment with the tools
Join a Study Group for Cloud Incident Response
Collaborate with peers in a study group to discuss course concepts, work on assignments together, and reinforce learning through peer-to-peer discussions.
  • Find or create a study group
  • Set regular meeting times
  • Discuss course materials and assignments
Practice AWS Terraform Deployment
Complete practice exercises to gain proficiency in using Terraform to deploy infrastructure on AWS, which will be used in a later course module.
  • Find practice exercises or tutorials on AWS Terraform deployment
  • Complete the exercises
  • Review the results
Develop a Cloud Incident Response Plan
Create a comprehensive cloud incident response plan that outlines procedures, roles, and responsibilities for managing and resolving cloud-related incidents.
  • Gather information on cloud incident response best practices
  • Identify potential cloud incident scenarios
  • Develop a response plan for each scenario
Implement Highly Available Infrastructure Design
Design and implement a highly available infrastructure, including hardware, networking, and software components, to ensure service uptime and data integrity.
  • Design a highly available architecture
  • Implement the design using appropriate tools and technologies
  • Test the HA infrastructure
Attend a Cloud Industry Meetup
Attend a cloud industry meetup to connect with professionals in the field, learn about the latest trends, and expand knowledge beyond the classroom.
  • Find a cloud industry meetup in your area
  • Attend the meetup
  • Network with attendees
Build a Disaster Recovery Plan for Cloud Databases
Develop a detailed disaster recovery plan for cloud databases, including strategies for data backup, replication, and failover to ensure data integrity and service availability.
  • Gather information on cloud database disaster recovery best practices
  • Identify potential database disaster scenarios
  • Develop a recovery plan for each scenario

Career center

Learners who complete Planning for High Availability and Incident Response will develop knowledge and skills that may be useful to these careers:
Site Reliability Engineer (SRE)
Site Reliability Engineers (SREs) are responsible for ensuring that systems are reliable and scalable. This course teaches students the skills they need to be successful in this role, including monitoring, availability, disaster recovery, and infrastructure management.
DevOps Engineer
DevOps Engineers work to bridge the gap between development and operations teams. This course teaches students the skills they need to be successful in this role, including monitoring, availability, disaster recovery, and infrastructure management.
Cloud Engineer
Cloud Engineers design, build, and manage cloud-based systems. This course teaches students the skills they need to be successful in this role, including monitoring, availability, disaster recovery, and infrastructure management.
Database Administrator (DBA)
Database Administrators (DBAs) are responsible for managing and maintaining databases. This course teaches students the skills they need to be successful in this role, including monitoring, availability, disaster recovery, and database reliability.
IT Manager
IT Managers are responsible for planning, implementing, and managing IT systems. This course teaches students the skills they need to be successful in this role, including monitoring, availability, disaster recovery, and infrastructure management.
Data Architect
Data Architects design and build data management systems, ensuring that data is available, reliable, and secure. This course helps build a foundation for this role by teaching students how to monitor systems, design for high availability, and implement disaster recovery plans.
IT Security Analyst
IT Security Analysts protect IT systems from security threats. This course teaches students the skills they need to be successful in this role, including monitoring, availability, disaster recovery, and infrastructure management.
IT Project Manager
IT Project Managers plan, execute, and manage IT projects. This course teaches students the skills they need to be successful in this role, including monitoring, availability, disaster recovery, and infrastructure management.
IT Consultant
IT Consultants provide advice and guidance to organizations on IT matters. This course teaches students the skills they need to be successful in this role, including monitoring, availability, disaster recovery, and infrastructure management.
IT Auditor
IT Auditors assess the security and compliance of IT systems. This course teaches students the skills they need to be successful in this role, including monitoring, availability, disaster recovery, and infrastructure management.
Network Engineer
Network Engineers design, build, and manage computer networks. This course teaches students the skills they need to be successful in this role, including monitoring, availability, disaster recovery, and infrastructure management.
Systems Analyst
Systems Analysts analyze and design business systems. This course may be useful for Systems Analysts who want to specialize in system reliability and availability.
Software Engineer
Software Engineers design, develop, and maintain software systems. This course may be useful for Software Engineers who want to specialize in system reliability and availability.
Business Analyst
Business Analysts analyze business needs and develop solutions. This course may be useful for Business Analysts who want to specialize in IT system reliability and availability.
Project Manager
Project Managers plan, execute, and manage projects. This course may be useful for Project Managers who want to specialize in IT project management.

Reading list

We've selected eight books that we think will supplement your learning. Use these to develop background knowledge, enrich your coursework, and gain a deeper understanding of the topics covered in Planning for High Availability and Incident Response.
Provides a comprehensive overview of the practices and principles of site reliability engineering (SRE), as implemented at Google. It covers topics such as monitoring, alerting, incident response, and capacity planning.
Provides a practical guide to the day-to-day work of a system administrator. It covers topics such as system configuration, performance tuning, security, and disaster recovery.
Provides a comprehensive overview of the design and implementation of data-intensive applications. It covers topics such as data modeling, data storage, and data processing.
Provides a practical guide to using Kubernetes to build and manage cloud-native applications. It covers topics such as container orchestration, service discovery, and continuous delivery.
Provides a comprehensive guide to using Kubernetes to build and manage containerized applications.
Provides a practical guide to implementing DevOps practices in your organization. It covers topics such as continuous integration, continuous delivery, and cultural change.
Provides a comprehensive guide to using Kubernetes to build and manage containerized applications.
Provides a comprehensive overview of microservices patterns. It covers topics such as service discovery, load balancing, and fault tolerance.

Similar courses

Here are nine courses similar to Planning for High Availability and Incident Response.
Understanding Placement Groups
Implementing Windows Server DR in an Azure Environment
AWS Storage Data Protection Services Getting Started
Implementing Nutanix High Availability and Disaster...
Implementing Terraform with AWS
Managing Azure SQL Database for the SQL Server DBA
Oracle Cloud Infrastructure Architect Professional
Designing and Implementing High Availability and Disaster...
Citrix ADC High Availability and Disaster Recovery

© 2016 - 2024 OpenCourser