How to reduce mean time to resolution (MTTR)

To consistently address issues raised in your ITSM, you must focus on monitoring, reporting, and reviewing response speed. Mean Time to Identify (MTTI) and Mean Time to Resolution (MTTR) are key indicators that provide visibility into performance and point to improvements. What is MTTR? MTTI is defined as the average time it takes … Read more

Kubernetes distributed alert management with Prometheus Operator and Flux Notification Controller

Distributed alert management (DAM) makes it possible to automatically identify violations of service level objectives and risky activity inside a cluster and GitOps infrastructure. In my previous post, I presented a redundant monitoring infrastructure based on a variety of tools such as Grafana, Prometheus, Loki, and Thanos. This article focuses on a way to integrate continuous monitoring … Read more