Cost-per-request increase
Scenario: you added 200 lines to the system prompt, or an agent started attaching the entire repository diff to every request. The total budget has not been exceeded yet, but cost per request has tripled, and it will show up on the bill within a day.
import json, os, statistics
import requests

W = "/tmp/cpr.json"  # rolling window of per-request costs, persisted between calls

def observe(usd_per_request):
    if os.path.exists(W):
        with open(W) as f:
            s = json.load(f)
    else:
        s = {"vals": []}
    s["vals"] = (s["vals"] + [usd_per_request])[-500:]  # keep the last 500 requests
    if len(s["vals"]) > 100:
        med = statistics.median(s["vals"][:-50])     # baseline: everything except the last 50
        recent = statistics.median(s["vals"][-50:])  # the last 50 requests
        # Alert when the recent median is more than double the baseline and not negligible.
        if recent > med * 2 and recent > 0.001:
            ratio = f"{recent / med:.1f}×" if med > 0 else "baseline was $0"
            push("💵 cost/req is rising",
                 f"Baseline median: ${med:.4f}\n"
                 f"Median of the last 50: ${recent:.4f} ({ratio})",
                 8)
    with open(W, "w") as f:
        json.dump(s, f)

def push(t, m, p):
    # t = title, m = message, p = priority
    requests.post(f"{os.environ['NOTIFLY_URL']}/message",
                  params={"token": os.environ["NOTIFLY_TOKEN"]},
                  json={"title": t, "message": m, "priority": p},
                  timeout=5)

In the push notification, include the top 3 endpoints by cost; the regression is usually in one of them.
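One way to build that top-3 list is sketched below. It assumes the caller also records which endpoint each request hit; the `records` input and the helper names `top_endpoints` and `format_top_endpoints` are hypothetical and not part of the recipe above.

```python
from collections import defaultdict


def top_endpoints(records, n=3):
    """Return the n endpoints with the highest total cost as (endpoint, usd) pairs.

    `records` is an assumed input: an iterable of (endpoint, usd_per_request)
    pairs for the recent window, collected by the caller.
    """
    totals = defaultdict(float)
    for endpoint, usd in records:
        totals[endpoint] += usd
    # Highest total first; ties are broken by endpoint name for stable output.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def format_top_endpoints(records, n=3):
    """Render the top endpoints as lines that can be appended to the message body."""
    return "\n".join(f"{ep}: ${usd:.4f}" for ep, usd in top_endpoints(records, n))
```

The formatted string could be appended to the message passed to `push`, so the alert points straight at the likely culprit. Ranking by total cost rather than by request count is a design choice here: a rarely called endpoint with a huge prompt still floats to the top.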
Related recipes
- LLM API spend: the overall budget.
- Drop in prompt-cache hit-rate: a related symptom.