Service Reliability Hierarchy

Site reliability engineering describes the stability and quality of service that an application offers after being made available to end users. It is crucial to understand the methods and principles SRE borrows to achieve that goal. This is why it is so important to see and understand the service reliability hierarchy. A Site Reliability Engineering (SRE) pyramid, also … Read more

What is reliability engineering?

Site reliability engineering (SRE) empowers software developers to own the ongoing daily operation of their applications in production. The goal is to bridge the gap between the development team that needs to ship continuously and the operations team that’s responsible for the reliability of the production environment. Site reliability engineering shifts the responsibility of production … Read more

System Reliability: implementing ‘golden metrics’

Before we start, let's first consider what system reliability means. In simple terms, it is the probability that a product performs its intended function under stated conditions without failure for a given period of time. Among other things, this requires continuous monitoring of the state of the system. Why this is so important … Read more
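
To make the definition of reliability as a probability concrete, here is a minimal Python sketch. It is an illustration, not something taken from the post itself. The function names are hypothetical, the sample failure times are made up, and the constant-failure-rate (exponential) model is an assumption that does not hold for every system.

```python
import math


def exponential_reliability(failure_rate_per_hour: float, hours: float) -> float:
    """Probability of no failure during `hours` of operation.

    Assumes a constant failure rate, so R(t) = exp(-lambda * t).
    """
    if failure_rate_per_hour < 0 or hours < 0:
        raise ValueError("failure rate and duration must be non-negative")
    return math.exp(-failure_rate_per_hour * hours)


def empirical_reliability(times_to_failure, hours: float) -> float:
    """Fraction of observed units that ran longer than `hours` before failing."""
    if not times_to_failure:
        raise ValueError("need at least one observation")
    survivors = sum(1 for t in times_to_failure if t > hours)
    return survivors / len(times_to_failure)


if __name__ == "__main__":
    # Hypothetical mean time between failures of 1000 hours.
    mtbf_hours = 1000.0
    print(exponential_reliability(1 / mtbf_hours, 100))

    # Hypothetical observed times to failure, in hours.
    print(empirical_reliability([50, 120, 300, 800, 1500], 100))
```

The first function answers "how likely is the system to run for this long without failing?" under the model assumption. The second estimates the same quantity directly from observed data, which is where continuous monitoring comes in.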