Hallucination spike

“Hallucinations” cannot be caught by a single rule, but you can count surrogate signals:

the share of answers that fail to parse against the expected JSON schema;
mean logprobs or a confidence_score below a threshold;
an LLM judge returning a score of 0 more often than usual;
a regex sanity check (e.g. “the model named a non-existent DB field”);
links in the response that return 404.
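The schema check and the regex sanity check from the list above can be computed locally, without a model call. The following is a minimal sketch, not a definitive implementation. It assumes that answers arrive as raw JSON text and that the model cites fields as `table.column`. The names `REQUIRED_KEYS`, `KNOWN_FIELDS`, and `extract_signals` are hypothetical and exist only for illustration.

```python
import json
import re

# Hypothetical schema: keys every answer JSON is expected to contain.
REQUIRED_KEYS = {"answer", "sources"}
# Hypothetical catalog of real columns, e.g. loaded from the DB at startup.
KNOWN_FIELDS = {"users.id", "users.email", "orders.total"}
KNOWN_TABLES = {f.split(".")[0] for f in KNOWN_FIELDS}
# Assumed convention: fields are written as lowercase table.column.
FIELD_RE = re.compile(r"\b([a-z_]+\.[a-z_]+)\b")

def extract_signals(raw_text):
    """Return cheap local signals for one raw model answer."""
    try:
        data = json.loads(raw_text)
        parsed = isinstance(data, dict) and REQUIRED_KEYS.issubset(data)
    except json.JSONDecodeError:
        data, parsed = None, False
    # Scan the answer text if the JSON was valid, else the raw output.
    text = str(data.get("answer", "")) if parsed else raw_text
    # Flag table.column mentions where the table is real but the column is not.
    unknown = sorted({
        m for m in FIELD_RE.findall(text)
        if m.split(".")[0] in KNOWN_TABLES and m not in KNOWN_FIELDS
    })
    return {"parsed": parsed, "unknown_fields": unknown}
```

Restricting matches to known table names keeps the regex from flagging ordinary dotted text such as domain names. The returned `parsed` flag has the same meaning as the `parsed` key read by the counter.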

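import os
import statistics
from collections import deque

import requests

# Sliding window of the last 200 per-answer scores (in-process memory only).
WIN = deque(maxlen=200)

def observe(answer_dict):
    # One point per surrogate signal that fired for this answer.
    score = 0
    if not answer_dict.get("parsed"):           score += 1
    if answer_dict.get("logprob_mean", 0) < -2: score += 1
    if answer_dict.get("judge", 1) == 0:        score += 1
    WIN.append(score)
    if len(WIN) >= 50:
        # Share of answers in the window with at least one signal.
        rate = statistics.mean([1 if x else 0 for x in WIN])
        # Note: this fires on every call while the rate stays above 20%.
        if rate > 0.20:
            push("👻 Hallucination spike",
                 f"Suspicious answers: {int(rate*100)}% over a window of {len(WIN)}",
                 priority=8)

def push(title, message, priority):
    # Send the alert to the notification endpoint.
    requests.post(f"{os.environ['NOTIFLY_URL']}/message",
                  params={"token": os.environ["NOTIFLY_TOKEN"]},
                  json={"title": title, "message": message, "priority": priority},
                  timeout=5)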

Store this counter in Redis so that it survives restarts and covers all instances.
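One possible shape for a Redis-backed window is a capped list shared by all instances. This is a sketch under assumptions: the key name `halluc:window` and the function `record` are hypothetical, and `client` is expected to be a redis-py client (for example `redis.Redis.from_url(...)`), of which only `lpush`, `ltrim`, and `lrange` are used.

```python
WINDOW_KEY = "halluc:window"  # hypothetical key name
WINDOW_SIZE = 200             # same window length as the in-process counter
MIN_SAMPLES = 50              # do not report a rate on too little data

def record(client, suspicious):
    """Store one 0/1 flag in Redis and return the current suspicious rate.

    Returns None until the shared window holds MIN_SAMPLES entries.
    """
    # Newest flag goes to the head of the list.
    client.lpush(WINDOW_KEY, 1 if suspicious else 0)
    # Keep only the most recent WINDOW_SIZE flags.
    client.ltrim(WINDOW_KEY, 0, WINDOW_SIZE - 1)
    flags = client.lrange(WINDOW_KEY, 0, -1)
    if len(flags) < MIN_SAMPLES:
        return None
    # redis-py returns bytes by default; int() accepts b"0" and b"1".
    return sum(int(f) for f in flags) / len(flags)
```

The three calls are not atomic, so under concurrent writers the rate is approximate. For an alerting threshold that is usually acceptable; a pipeline or Lua script would tighten it if needed.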

Related recipes

Eval / quality regression: formal evaluation.
Vector DB / RAG: a common cause of hallucinations (context wasn’t found).