Input data drift (input drift)

A model can “degrade” not because the provider changed anything, but because users started sending different requests: longer ones, in another language, on other topics. This is not a bug in the model, and teams often miss it: the score on the eval set does not drop (the set is fixed), while answers to real traffic get worse.

Minimal detector:

import os, json, time, statistics
import requests
from langdetect import detect, DetectorFactory
from langdetect.lang_detect_exception import LangDetectException

DetectorFactory.seed = 0  # langdetect is non-deterministic unless the seed is fixed

STATS = "/tmp/input-drift.json"
WINDOW = 1000  # both metrics are computed over the last WINDOW requests

def observe(user_input: str):
    if os.path.exists(STATS):
        with open(STATS) as f:
            s = json.load(f)
    else:
        s = {"lens": [], "langs": [], "ts": time.time()}
    s["lens"] = (s["lens"] + [len(user_input)])[-WINDOW:]
    try:
        lng = detect(user_input)
    except LangDetectException:  # e.g. empty input or only digits/punctuation
        lng = "unk"
    # keep languages in the same sliding window as lengths; all-time counts would
    # make the language check less sensitive the longer the service runs
    s["langs"] = (s["langs"] + [lng])[-WINDOW:]

    # once per hour, compare the window against the baseline
    due = time.time() - s["ts"] > 3600
    if due:
        s["ts"] = time.time()  # reset before alerting so a failed push doesn't retry on every request
    with open(STATS, "w") as f:
        json.dump(s, f)
    if due:
        check(s)

BASELINE = {"avg_len": 240, "ru_share": 0.85}

def check(s):
    avg = statistics.mean(s["lens"]) if s["lens"] else 0
    ru_share = s["langs"].count("ru") / (len(s["langs"]) or 1)
    if avg > BASELINE["avg_len"] * 2:
        push("📈 Input drift: longer inputs", f"avg_len={int(avg)} (baseline {BASELINE['avg_len']})", 6)
    if abs(ru_share - BASELINE["ru_share"]) > 0.3:
        push("🌍 Input drift: language", f"ru_share={ru_share:.2f} (baseline {BASELINE['ru_share']})", 6)

def push(title, message, priority):
    requests.post(f"{os.environ['NOTIFLY_URL']}/message",
                  params={"token": os.environ["NOTIFLY_TOKEN"]},
                  json={"title": title, "message": message, "priority": priority},
                  timeout=5)
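
The threshold rules in check() are easier to test when they are kept apart from sending notifications. Below is a minimal sketch, assuming the same baseline values and thresholds as the detector; the function name `drift_alerts` is hypothetical. It returns alert texts for one window of observations instead of calling push().

```python
import statistics

# Same values as BASELINE in the detector above (assumed; tune per service)
BASELINE = {"avg_len": 240, "ru_share": 0.85}


def drift_alerts(lens: list[int], langs: list[str], baseline: dict = BASELINE) -> list[str]:
    """Return drift alert texts for one window of observations, without sending anything."""
    if not lens or not langs:
        return []  # nothing observed yet, so nothing to compare
    avg = statistics.mean(lens)
    ru_share = langs.count("ru") / len(langs)
    alerts = []
    # inputs more than twice as long as the baseline average
    if avg > baseline["avg_len"] * 2:
        alerts.append(f"longer inputs: avg_len={int(avg)} (baseline {baseline['avg_len']})")
    # share of Russian-language requests moved by more than 0.3 in either direction
    if abs(ru_share - baseline["ru_share"]) > 0.3:
        alerts.append(f"language: ru_share={ru_share:.2f} (baseline {baseline['ru_share']})")
    return alerts


# A window in which inputs tripled and half the traffic switched to English
for alert in drift_alerts([720] * 10, ["ru"] * 5 + ["en"] * 5):
    print(alert)
# longer inputs: avg_len=720 (baseline 240)
# language: ru_share=0.50 (baseline 0.85)
```

With the rules in a pure function, check() only has to call push() once per returned alert, and a logged window can be replayed against a new baseline without touching the notification service.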

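In production, keep these statistics in shared storage such as Redis or YDB rather than a local file: a file only sees the traffic of one instance, while shared storage lets you detect drift across all instances at once. It also avoids the race in which concurrent workers overwrite each other's updates to the JSON file.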
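
A minimal sketch of the shared-storage variant, assuming Redis and the redis-py client: each instance pushes its observations onto two capped lists (LPUSH followed by LTRIM), and the process that runs the hourly check reads the whole window with LRANGE. The key prefix `input-drift`, the helpers `record` and `snapshot`, and the `InMemoryLists` stand-in (which only lets the sketch run without a server) are all hypothetical; YDB would need its own schema.

```python
from statistics import mean

WINDOW = 1000  # same window size as the file-based detector (assumed)


def _text(v) -> str:
    # redis-py returns bytes unless the client is created with decode_responses=True
    return v.decode() if isinstance(v, bytes) else str(v)


def record(r, length: int, lang: str, prefix: str = "input-drift") -> None:
    """Append one observation to two capped lists shared by every instance."""
    for key, value in ((f"{prefix}:lens", length), (f"{prefix}:langs", lang)):
        r.lpush(key, value)          # newest item goes to the head
        r.ltrim(key, 0, WINDOW - 1)  # drop everything past the last WINDOW items


def snapshot(r, prefix: str = "input-drift") -> dict:
    """Read the shared window and compute the same metrics that check() uses."""
    lens = [int(_text(v)) for v in r.lrange(f"{prefix}:lens", 0, -1)]
    langs = [_text(v) for v in r.lrange(f"{prefix}:langs", 0, -1)]
    return {
        "n": len(langs),
        "avg_len": mean(lens) if lens else 0,
        "ru_share": langs.count("ru") / (len(langs) or 1),
    }


class InMemoryLists:
    """Hypothetical stand-in for redis.Redis with only LPUSH/LTRIM/LRANGE, for local runs."""

    def __init__(self):
        self.data = {}

    @staticmethod
    def _slice(items, start, end):
        # Redis ranges are inclusive; end == -1 means "up to the last element"
        return items[start:] if end == -1 else items[start:end + 1]

    def lpush(self, key, value):
        self.data.setdefault(key, []).insert(0, str(value).encode())

    def ltrim(self, key, start, end):
        self.data[key] = self._slice(self.data.get(key, []), start, end)

    def lrange(self, key, start, end):
        return self._slice(self.data.get(key, []), start, end)


r = InMemoryLists()  # in production, e.g. redis.Redis(host="localhost", port=6379)
for _ in range(3):
    record(r, 240, "ru")
record(r, 900, "en")
print(snapshot(r))
```

With a real client, the LPUSH/LTRIM pair can be sent through `r.pipeline()` to save a round trip; the lists stay bounded either way. To keep every instance from sending the same hourly alert, the check can be gated on a key set with `r.set(key, 1, nx=True, ex=3600)`, which succeeds for only one caller until the key expires.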

Related recipes

Eval / quality drop: what to do when drift has already affected quality.
Safety / prompt injection triggered: sometimes “drift” is actually an attack.