Designing Reliable Systems

We have already touched on designing reliable systems. Today, we’ll go over how to design services to meet requirements for availability, durability, and scalability. We will also discuss how to implement fault-tolerant systems by avoiding single points of failure, correlated failures, and cascading failures. We will see how to avoid overload failures by using …

Understanding of product reliability

Why do we need to establish good product observability? How does monitoring affect product reliability? We will explore the significance of the four golden signals in measuring a system’s performance and reliability. If you’ve ever worked with on-premises environments, you know that you can physically touch the servers. If an application becomes unresponsive, someone can …

Service Reliability Hierarchy

Site reliability engineering describes the stability and quality of service that an application offers after being made available to end users. It is crucial to understand the methods and principles SRE draws on to achieve that goal. This is why it is so important to understand the service reliability hierarchy. A Site Reliability Engineering (SRE) pyramid, also …

What is reliability engineering?

Site reliability engineering (SRE) empowers software developers to own the ongoing daily operation of their applications in production. The goal is to bridge the gap between the development team that needs to ship continuously and the operations team that’s responsible for the reliability of the production environment. Site reliability engineering shifts the responsibility of production …