Become a certified site reliability engineer with this fully accredited SRE Foundation (SREF)℠ & SRE Practitioner (SREP)℠ suite from Good e-Learning!
We cover the DevOps Institute’s SRE syllabus in its entirety, teaching candidates everything they need to know about site reliability engineering and how it enables businesses to provide and scale market-leading services. Following an introduction to the principles and practices of SRE, the suite covers how to implement them and fully optimize your pipeline. Kickstart your SRE training today!
This module introduces the SRE Foundation training course. The subject matter and rationale are explained and students are given an overview of the Foundation syllabus.
Candidates also receive a toolkit containing:
Table of contents
This module introduces students to site reliability engineering as a discipline, including how it compares to DevOps. The principles and practices of SRE are also explained.
This module examines service level objectives (SLOs), service levels, error budgets, and error budget policies.
This module introduces ‘toil’, why it represents a problem, and how it can be effectively managed.
This module focuses on service level indicators (SLIs), along with observability and monitoring.
This module looks at ‘automation’ as defined by both SRE and DevOps. It focuses on several distinct types of automation and their hierarchy, as well as popular automation tools.
This module looks at the SRE principle of learning from failure and how it relates to anti-fragility and chaos engineering.
This module examines how SRE is managed at an organizational level. It also covers how SRE is initially implemented, why so many businesses are embracing SRE, patterns for adopting SRE, sustainable incident responses, and blameless post-mortems. Finally, it covers how to utilize SRE at scale.
This module covers how SRE relates to and incorporates other popular frameworks, including IT4IT, Agile, and ITIL 4. It also considers how SRE is evolving and what kind of shape it will take in the future.
This module features two practice exams that can help candidates get used to the conditions of the Site Reliability Engineering (SRE) Foundation exam.
This module introduces students to the main features of the course, including its objectives, aims, learning plan, and structure.
Candidates are taken through the course’s syllabus and provided with a glossary, a further reading and links document, a diagram pack, and links to download copies of essential SRE publications. The module then answers some of the most frequently asked questions regarding SRE Practitioner.
The module concludes with a brief assessment to gauge how much candidates remember from the SRE Foundation syllabus.
This module focuses on antipatterns in SRE and how such unproductive behaviors can negatively impact a pipeline.
This module examines system boundaries and demonstrates how to define system capabilities and appropriate service level indicators (SLIs) and service level objectives (SLOs). It also looks at how to measure the baseline.
The module goes on to examine multi-service architecture, along with how to calculate and utilize error budgets.
This module defines the role of a site reliability engineer in systems design, along with important considerations regarding the changing landscape and security requirements. The module then examines contemporary approaches, technology, and tools for system design, along with design patterns that help SRE practitioners build secure, resilient, reliable, and scalable systems.
This module focuses on the key elements of full-stack observability, as well as how instrumentation makes SRE systems more observable.
This is a reflective module designed to help students test their knowledge of the concepts and terms covered in modules one to four. It features a memory game as well as a concept checker.
This module examines the benefits of taking a platform-centric view when building and operating platforms as products. It goes on to look at how artificial intelligence can benefit IT operations and how to implement AI.
This module looks at the key elements of incident management in relation to the incident command framework. It also examines how the Observe, Orient, Decide, Act (OODA) loop is used to integrate technology, processes, and resources for incident responses.
This module examines ‘chaos engineering’, the discipline of experimenting with a distributed system to build confidence in its ability to survive and thrive even in turbulent conditions. It also explains how to set up game day exercises for practicing chaos engineering and dispels common myths on the subject.
This module looks at the role SRE plays in optimizing operations and fully realizing DevOps cultures. It then looks at the steps and models used to implement and execute SRE.
This is a reflective module designed to help students affirm their understanding of the concepts and terms covered in modules six to nine. It features a memory game and concept checker.
This module features two practice exams that can help candidates get used to the conditions of the Site Reliability Engineering (SRE) Practitioner exam.
This SRE course suite is designed to fully prepare students to sit the official SRE Foundation and SRE Practitioner examinations. This includes providing official practice exams to help students test themselves and get used to examination conditions.
This course comes with mock exams to help students prepare for the real thing, as well as FREE exam vouchers. (T&Cs apply)
Before booking your exam, it is a good idea to make sure that your device meets the technical requirements. Please visit the DevOps Institute website for more information and guidance.
When you are ready to use your free exam voucher, simply contact [email protected]. Exam voucher requests are typically processed within 2 working days, but please allow up to 5. Students must request their exam voucher within the course access period, which starts from the date of purchase. For more information, please visit our Support & FAQs page.
‘Site Reliability Engineering (SRE)’ is the process of continuously testing the ‘reliability’ of a new product in development. This enables developers to better understand and adapt to the needs of operations teams.
There are several elements to SRE, including:
A ‘Service Level Agreement (SLA)’ is outlined to define how reliable a product has to be for end-users
An ‘Error Budget’ is established to show how much unreliability can be tolerated before new releases must be paused
Site reliability engineers make themselves available to help with development team workloads and vice versa
Site reliability engineers actively find and repair problems during the development stage
Developers take on Operations tasks if necessary
Site reliability engineers create automation wherever possible for the sake of efficiency and reliability
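One common way to express an error budget is as the amount of downtime that an availability target permits over a fixed window. The short Python sketch below illustrates that arithmetic. The function names, the 30-day window, and the 99.9% target are illustrative assumptions, not figures taken from the course or the DevOps Institute syllabus.

```python
def error_budget_minutes(target: float, window_days: int = 30) -> float:
    """Return the downtime (in minutes) allowed by an availability target.

    `target` is a fraction such as 0.999 (assumed example value);
    `window_days` is the measurement window (assumed to be 30 days).
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - target) * total_minutes


def budget_remaining(target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Return the fraction of the error budget still unspent.

    A negative result means the budget has been overspent.
    """
    budget = error_budget_minutes(target, window_days)
    return 1.0 - downtime_minutes / budget


if __name__ == "__main__":
    # Hypothetical service: 99.9% target, 21.6 minutes of downtime so far.
    budget = error_budget_minutes(0.999)
    left = budget_remaining(0.999, 21.6)
    print(f"Budget: {budget:.1f} minutes, remaining: {left:.0%}")
```

Under these assumptions, a 99.9% target over 30 days allows roughly 43 minutes of downtime, so 21.6 minutes of downtime would consume about half of the budget.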
A ‘site reliability engineer’ is an automation/coding specialist whose job is to find and solve problems within Development and Operations.
An SRE team can not only make a DevOps pipeline more reliable, but also far more efficient and scalable. It can also free Development and Operations team members to focus on improving services elsewhere, boosting the quality of releases. Incorporating SRE will also further improve existing DevOps cultures by encouraging greater communication, clarity, and understanding between teams.
Finally, site reliability engineers are specialists in considering and conveying concerns in relation to the wider organization and can extract metrics that can prove extremely valuable for other departments.
DevOps and SRE work extremely well together. This is largely because both are designed with automation, inter-team collaboration, and communication in mind, as well as boosting efficiency and reliability within IT pipelines. The SRE Practitioner qualification even comes from the DevOps Institute.
There are no prerequisites for taking this course. However, it can be helpful to have pre-existing knowledge of SRE, as well as DevOps.
SRE was originally developed by Google. Its purpose is to quantify the relationship between Development and Operations teams, ensuring that code is created efficiently, reliably, and with operational factors in mind. This is particularly valuable in organizations where IT departments and teams have become siloed from one another.
SRE is ideal for organizations that rely on developing and releasing code. It works particularly well in DevOps environments and is a popular choice with DevOps engineers and DevOps Leaders. Given the growing popularity of SRE, a qualified and experienced practitioner will often find it easier to take the next step in their career.