Hallucination spike

“Hallucinations” cannot be caught by a single rule, but you can count surrogate signals:

  • the share of answers that do not parse as the expected JSON schema;
  • mean token logprobs or a confidence_score below a threshold;
  • a judge LLM returning 0 more often than usual;
  • failed regex sanity checks (for example, the model named a non-existent DB field);
  • links in the response that return 404.
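
Two of these signals can be computed locally, before any scoring. The following is only a sketch: REQUIRED_KEYS, KNOWN_FIELDS and the `users.<field>` naming pattern are hypothetical placeholders, not part of the original setup, and should be replaced with your own schema and column names.

```python
import json
import re

REQUIRED_KEYS = {"answer", "sources"}          # hypothetical expected schema
KNOWN_FIELDS = {"id", "email", "created_at"}   # hypothetical DB columns
FIELD_RE = re.compile(r"\busers\.(\w+)")       # hypothetical: fields appear as users.<name>


def parses_as_schema(raw):
    """Return True if raw is a JSON object containing every required key."""
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):  # JSONDecodeError is a ValueError
        return False
    return isinstance(data, dict) and REQUIRED_KEYS.issubset(data)


def unknown_fields(text):
    """Return the users.<field> names in text that are not known columns."""
    return sorted(set(FIELD_RE.findall(text)) - KNOWN_FIELDS)
```

The result of parses_as_schema can be passed as the "parsed" flag of the dictionary given to observe, and a non-empty unknown_fields list can be counted as one more suspicious signal.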
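
import os
import statistics

import requests

WIN = []  # sliding window of per-answer suspicion scores (last 200 answers)


def observe(answer_dict):
    # One point for each surrogate signal that fired on this answer.
    score = 0
    if not answer_dict.get("parsed"):
        score += 1
    if answer_dict.get("logprob_mean", 0) < -2:
        score += 1
    if answer_dict.get("judge", 1) == 0:
        score += 1
    WIN.append(score)
    if len(WIN) > 200:
        WIN.pop(0)
    # Do not alert until the window holds at least 50 answers.
    if len(WIN) >= 50:
        # Share of answers in the window with at least one signal.
        rate = statistics.mean([1 if x else 0 for x in WIN])
        if rate > 0.20:
            push("👻 Hallucination spike",
                 f"Suspicious answers: {int(rate*100)}% over a window of {len(WIN)}",
                 priority=8)


def push(title, message, priority):
    # Parameter names match the keyword call above (priority=8).
    requests.post(f"{os.environ['NOTIFLY_URL']}/message",
                  params={"token": os.environ["NOTIFLY_TOKEN"]},
                  json={"title": title, "message": message, "priority": priority},
                  timeout=5)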

Store this window in Redis rather than in process memory: it will then survive restarts and cover all instances.
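
One possible shape for a Redis-backed window is sketched below. It assumes a redis-py style client that exposes lpush, ltrim and lrange; the key name and the record function are hypothetical, and the thresholds simply mirror the in-memory version.

```python
KEY = "hallucination:window"  # hypothetical key name
WINDOW = 200                  # same window size as the in-memory version
MIN_SAMPLES = 50              # do not report a rate before this many answers


def record(client, score):
    """Push one score into the shared window and return the suspicious rate.

    client is assumed to behave like redis.Redis (lpush / ltrim / lrange).
    Returns None until MIN_SAMPLES scores have been collected.
    """
    client.lpush(KEY, score)            # newest score goes to the head
    client.ltrim(KEY, 0, WINDOW - 1)    # keep only the latest WINDOW entries
    scores = client.lrange(KEY, 0, -1)  # values come back as str or bytes
    if len(scores) < MIN_SAMPLES:
        return None
    suspicious = sum(1 for s in scores if int(s) > 0)
    return suspicious / len(scores)
```

With redis-py this could be called as record(redis.Redis(decode_responses=True), score), and the alert sent when the returned rate exceeds 0.20. The push and trim are two separate commands here; if several instances write at once and exact window size matters, they can be grouped in a pipeline.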