SLO p95 breach

A solo developer is unlikely to build a burn-rate dashboard, but computing p95 over the last N requests is easy. That's almost always enough.

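import os
import statistics
from collections import deque

import requests

SLO_MS = 1500  # p95 latency target, in milliseconds
WINDOW = 200   # number of most recent requests in the window
W = deque(maxlen=WINDOW)  # sliding window; the oldest sample is dropped automatically

def observe(latency_ms: float):
    W.append(latency_ms)
    if len(W) < WINDOW:
        return  # not enough samples for a stable p95 yet
    # n=20 yields 19 cut points; index 18 is the 95th percentile
    p95 = statistics.quantiles(W, n=20)[18]
    if p95 > SLO_MS:
        push("📉 SLO breach: p95",
             f"p95 = {int(p95)} ms (SLO: {SLO_MS})\n"
             f"max = {int(max(W))} ms  median = {int(statistics.median(W))} ms",
             priority=8)

def push(title, message, priority):
    # The parameter is named `priority` so the keyword call in observe() works
    requests.post(f"{os.environ['NOTIFLY_URL']}/message",
                  params={"token": os.environ["NOTIFLY_TOKEN"]},
                  json={"title": title, "message": message, "priority": priority},
                  timeout=5)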
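
The snippet above leaves open how latencies reach observe(). One possible way, sketched here as an assumption rather than the author's method, is a small decorator that times each call with time.perf_counter and reports the result in milliseconds. The name `timed` is hypothetical, and the callback is passed in so the block stands on its own.

```python
import time
from functools import wraps

def timed(observe):
    """Decorator factory: report each call's wall-clock latency (ms) to `observe`."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                # Runs on success and on exceptions, so slow failures are counted too
                observe((time.perf_counter() - start) * 1000.0)
        return wrapper
    return decorator
```

A handler would then be wrapped as `@timed(observe)`, so every request contributes one sample to the window.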

In production it’s better to keep the window in Redis (LPUSH/LTRIM), so that all instances see a shared sliding window.
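
A minimal sketch of that shared window, assuming a client object that exposes redis-py style lpush, ltrim and lrange methods. The key name `slo:latency` and the function name `record_and_p95` are hypothetical, chosen only for illustration.

```python
import statistics

WINDOW = 200          # same window size as the in-process version
KEY = "slo:latency"   # hypothetical Redis key name

def record_and_p95(client, latency_ms: float):
    """Add one sample to the shared list; return p95, or None until the window is full.

    `client` is assumed to offer redis-py style lpush/ltrim/lrange.
    """
    client.lpush(KEY, latency_ms)        # newest sample goes to the head of the list
    client.ltrim(KEY, 0, WINDOW - 1)     # keep only the newest WINDOW samples
    samples = [float(x) for x in client.lrange(KEY, 0, -1)]
    if len(samples) < WINDOW:
        return None
    # n=20 yields 19 cut points; index 18 is the 95th percentile
    return statistics.quantiles(samples, n=20)[18]
```

The push and the trim are two separate commands here. With several instances writing at once, the list can briefly hold more than WINDOW items between them. If that matters, the two calls could be sent together in a pipeline.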

Additionally, track a separate SLO for the error rate:

# `total and ...` guards against division by zero before any request is counted
if total and errors / total > 0.05:
    push("📉 SLO breach: errors", f"{errors}/{total} errors", 9)
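
The check above does not say where `errors` and `total` come from. One option, sketched here as an assumption, is to count failures over the same kind of sliding window used for latency. The names `outcomes` and `observe_outcome` are hypothetical, and `notify` stands in for a push-style function taking title, message and priority.

```python
from collections import deque

ERROR_SLO = 0.05  # the 5% threshold from the check above
WINDOW = 200      # assumed: same window size as for latency

outcomes = deque(maxlen=WINDOW)  # True means the request failed

def observe_outcome(failed: bool, notify=None) -> bool:
    """Record one request outcome; return True if the error-rate SLO is breached."""
    outcomes.append(failed)
    total = len(outcomes)
    errors = sum(outcomes)  # True counts as 1
    if total == WINDOW and errors / total > ERROR_SLO:
        if notify is not None:
            notify("📉 SLO breach: errors", f"{errors}/{total} errors", 9)
        return True
    return False
```

Waiting for a full window avoids alerting on the first failed request after a restart, at the cost of staying silent for the first WINDOW requests.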