
Input data drift

A model can “degrade” without the provider changing anything: users may simply start sending different requests, such as longer ones, ones in another language, or ones on different topics. This isn’t a bug, but teams often miss it. Scores on the eval set don’t drop, because the set is fixed, while answers to real traffic get worse.

Minimal detector:

```python
import os, json, time, statistics
from collections import Counter

import requests
from langdetect import detect, DetectorFactory
from langdetect.lang_detect_exception import LangDetectException

DetectorFactory.seed = 0  # langdetect is non-deterministic without a fixed seed

STATS = "/tmp/input-drift.json"
WINDOW = 1000  # sliding window: last N requests
BASELINE = {"avg_len": 240, "ru_share": 0.85}

def observe(user_input: str):
    if os.path.exists(STATS):
        with open(STATS) as f:
            s = json.load(f)
    else:
        s = {"lens": [], "langs": [], "ts": time.time()}
    try:
        lng = detect(user_input)
    except LangDetectException:  # empty or featureless text
        lng = "unk"
    # lengths and languages share one window, so both reflect recent traffic
    s["lens"] = (s["lens"] + [len(user_input)])[-WINDOW:]
    s["langs"] = (s["langs"] + [lng])[-WINDOW:]
    # at most once per hour, compare the window against the baseline
    if time.time() - s["ts"] > 3600:
        check(s)
        s["ts"] = time.time()
    with open(STATS, "w") as f:
        json.dump(s, f)

def check(s):
    avg = statistics.mean(s["lens"]) if s["lens"] else 0
    ru_share = Counter(s["langs"])["ru"] / (len(s["langs"]) or 1)
    if avg > BASELINE["avg_len"] * 2:
        push("📈 Input drift: longer inputs", f"avg_len={int(avg)} (baseline {BASELINE['avg_len']})", 6)
    if abs(ru_share - BASELINE["ru_share"]) > 0.3:
        push("🌍 Input drift: language", f"ru_share={ru_share:.2f} (baseline {BASELINE['ru_share']})", 6)

def push(title, message, priority):
    requests.post(f"{os.environ['NOTIFLY_URL']}/message",
                  params={"token": os.environ["NOTIFLY_TOKEN"]},
                  json={"title": title, "message": message, "priority": priority},
                  timeout=5)
```
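The language check above only watches the Russian share, so a shift from English to, say, German with the Russian share unchanged goes unnoticed. One possible generalization, which is an assumption and not part of the original detector, compares the whole language mix to a baseline distribution using the total variation distance (TVD): half the sum of absolute differences between the two distributions. BASELINE_LANGS and its 85/15 split are hypothetical.

```python
from collections import Counter

def language_tvd(langs: list[str], baseline: dict[str, float]) -> float:
    """Total variation distance between the observed language mix and a
    baseline distribution: 0.0 = identical, 1.0 = completely disjoint."""
    if not langs:
        return 0.0  # no traffic yet, nothing to compare
    counts = Counter(langs)
    total = len(langs)
    observed = {lang: n / total for lang, n in counts.items()}
    keys = set(observed) | set(baseline)
    return 0.5 * sum(abs(observed.get(k, 0.0) - baseline.get(k, 0.0)) for k in keys)

# Hypothetical baseline: 85% Russian, 15% English.
BASELINE_LANGS = {"ru": 0.85, "en": 0.15}
```

TVD is the largest probability gap over any set of languages, so it is never smaller than the change in the Russian share alone. Reusing the 0.3 threshold therefore alerts at least whenever the original check does.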

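In production, it’s better to keep the distribution in Redis or YDB. A file in /tmp is local to one host and not safe under concurrent writes, so each instance sees only its own slice of traffic. A shared store lets drift be detected across all instances at once.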
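A minimal sketch of the Redis variant, assuming the redis-py client: two lists hold the shared window, LTRIM keeps them bounded, and SET with NX and EX acts as an hourly lock so that only one instance runs the check. The key names and both helper functions are hypothetical. The client r is passed in (for example redis.Redis(host=..., port=6379)), and a YDB version is not shown.

```python
# Hypothetical key names; any prefix works.
LENS_KEY = "input-drift:lens"
LANGS_KEY = "input-drift:langs"
LOCK_KEY = "input-drift:check-lock"
WINDOW = 1000

def observe_shared(r, length: int, lang: str) -> bool:
    """Append one request to the shared window.

    r is a redis-py client (e.g. redis.Redis()). Returns True if this
    instance won the hourly lock and should run the baseline check.
    """
    r.rpush(LENS_KEY, length)
    r.ltrim(LENS_KEY, -WINDOW, -1)  # keep only the last WINDOW entries
    r.rpush(LANGS_KEY, lang)
    r.ltrim(LANGS_KEY, -WINDOW, -1)
    # SET NX EX succeeds only if the lock key is absent; the key expires
    # after an hour, so across all instances at most one check runs per hour.
    return bool(r.set(LOCK_KEY, "1", nx=True, ex=3600))

def load_window(r) -> tuple[list[int], list[str]]:
    """Read the shared window back (redis-py returns bytes by default)."""
    lens = [int(x) for x in r.lrange(LENS_KEY, 0, -1)]
    langs = [x.decode() if isinstance(x, bytes) else x
             for x in r.lrange(LANGS_KEY, 0, -1)]
    return lens, langs
```

In the request handler, call observe_shared(r, len(user_input), lang). When it returns True, run the baseline comparison on the lists returned by load_window(r). RPUSH and LTRIM are issued separately here, so under concurrent writes a list can briefly hold a few more than WINDOW entries. Wrapping them in a redis-py pipeline (MULTI/EXEC) would make each update atomic.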