Input data drift (input drift)
A model can appear to degrade not because the provider changed anything, but because users started sending different requests: longer, in another language, or on different topics. This isn't a bug, but teams often miss it: scores on the eval set don't drop (the set is fixed), while responses to real traffic get worse.
Minimal detector:
import json
import os
import statistics
import time

import langdetect
import requests

STATS = "/tmp/input-drift.json"
# Baseline to compare against: average input length in characters
# and the share of requests detected as Russian.
BASELINE = {"avg_len": 240, "ru_share": 0.85}


def observe(user_input: str):
    # Load the accumulated stats, or start fresh on the first call.
    if os.path.exists(STATS):
        with open(STATS) as f:
            s = json.load(f)
    else:
        s = {"lens": [], "langs": {}, "ts": time.time()}
    # Keep only the last 1000 input lengths.
    s["lens"] = (s["lens"] + [len(user_input)])[-1000:]
    try:
        lng = langdetect.detect(user_input)
    except Exception:
        lng = "unk"  # detection can fail, e.g. on empty input
    # Note: language counts accumulate over the whole run, not a window.
    s["langs"][lng] = s["langs"].get(lng, 0) + 1
    # Once per hour, check against the baseline.
    if time.time() - s["ts"] > 3600:
        check(s)
        s["ts"] = time.time()
    with open(STATS, "w") as f:
        json.dump(s, f)


def check(s):
    avg = statistics.mean(s["lens"]) if s["lens"] else 0
    total = sum(s["langs"].values()) or 1
    ru_share = s["langs"].get("ru", 0) / total
    # Alert when the average length has more than doubled.
    if avg > BASELINE["avg_len"] * 2:
        push("📈 Input drift: longer", f"avg_len={int(avg)} (baseline {BASELINE['avg_len']})", 6)
    # Alert when the Russian share moved by more than 0.3 in either direction.
    if abs(ru_share - BASELINE["ru_share"]) > 0.3:
        push("🌍 Input drift: language", f"ru-share={ru_share:.2f} (baseline {BASELINE['ru_share']})", 6)


def push(t, m, p):
    # Send a notification; NOTIFLY_URL and NOTIFLY_TOKEN come from the environment.
    requests.post(
        f"{os.environ['NOTIFLY_URL']}/message",
        params={"token": os.environ["NOTIFLY_TOKEN"]},
        json={"title": t, "message": m, "priority": p},
        timeout=5,
    )

In production, it's better to store the distribution in Redis/YDB so that drift is detected across all instances at once; with a local file, each instance sees only its own traffic.
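One way the shared-store idea might look, as a rough sketch rather than a finished implementation. The key names, `observe_shared`, `snapshot`, and `InMemoryStore` are all hypothetical. The functions only call `lpush`, `ltrim`, `lrange`, `hincrby`, and `hgetall`, which redis-py clients provide; the in-memory stand-in is there only so the sketch runs without a server.

```python
import statistics

LENS_KEY = "input-drift:lens"    # hypothetical key names
LANGS_KEY = "input-drift:langs"
WINDOW = 1000                    # same window size as the file-based detector


class InMemoryStore:
    """Stand-in exposing the five commands used below (not a real Redis client)."""

    def __init__(self):
        self.lists, self.hashes = {}, {}

    @staticmethod
    def _slice(items, start, end):
        # Redis ranges are inclusive; -1 means "through the last element".
        return items[start:] if end == -1 else items[start:end + 1]

    def lpush(self, name, *values):
        for v in values:
            self.lists.setdefault(name, []).insert(0, str(v))

    def ltrim(self, name, start, end):
        self.lists[name] = self._slice(self.lists.get(name, []), start, end)

    def lrange(self, name, start, end):
        return self._slice(self.lists.get(name, []), start, end)

    def hincrby(self, name, key, amount=1):
        h = self.hashes.setdefault(name, {})
        h[key] = h.get(key, 0) + amount
        return h[key]

    def hgetall(self, name):
        return {k: str(v) for k, v in self.hashes.get(name, {}).items()}


def observe_shared(r, length: int, lang: str) -> None:
    """Record one request in the shared store; every instance calls this."""
    r.lpush(LENS_KEY, length)
    r.ltrim(LENS_KEY, 0, WINDOW - 1)   # keep only the newest WINDOW lengths
    r.hincrby(LANGS_KEY, lang, 1)


def snapshot(r) -> dict:
    """Compute the same two statistics that check() compares to BASELINE."""
    lens = [int(x) for x in r.lrange(LENS_KEY, 0, -1)]
    langs = {k: int(v) for k, v in r.hgetall(LANGS_KEY).items()}
    total = sum(langs.values()) or 1
    return {
        "avg_len": statistics.mean(lens) if lens else 0,
        "ru_share": langs.get("ru", 0) / total,
    }


store = InMemoryStore()
observe_shared(store, 100, "ru")
observe_shared(store, 300, "en")
print(snapshot(store))
```

With a real server, the assumption is that a client created as `redis.Redis(..., decode_responses=True)` is passed in place of `InMemoryStore()`, so that hash keys come back as `str`. The hourly check would then run in a single place (for example, a scheduled job) so that several instances don't send the same alert.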
Related recipes
- Eval / quality drop: what to do when drift has already affected quality.
- Safety / prompt injection triggered: sometimes what looks like drift is actually an attack.