LLM API spending
LLM providers bill per token, and on a bad night (an agent stuck in a loop, a buggy prompt, a failed indexing run) the daily bill can grow 10–100x. Standard email alerts arrive once a day, often after the fact. A push notification at the moment of overspend lets you react within minutes.
Option 1: calculate spend on your side (most reliable)
All major SDKs return token usage in the response: usage.input_tokens / usage.output_tokens in the Anthropic SDK, usage.prompt_tokens / usage.completion_tokens in OpenAI's Chat Completions API. The easiest approach is to accumulate these counts in Redis or SQLite and send an alert when a threshold is crossed:
import os, time, sqlite3, requests

PRICES = {  # $ per 1M tokens
    "claude-sonnet": {"in": 3.0, "out": 15.0},
    "claude-haiku": {"in": 0.8, "out": 4.0},
    "gpt-4o": {"in": 2.5, "out": 10.0},
    "gpt-4o-mini": {"in": 0.15, "out": 0.6},
}

DB = sqlite3.connect(os.path.expanduser("~/.llm-spend.db"))
DB.execute("""CREATE TABLE IF NOT EXISTS spend(
    day TEXT, model TEXT, in_tok INT, out_tok INT, usd REAL,
    PRIMARY KEY(day, model))""")

def record(model, in_tok, out_tok):
    # Convert token counts to dollars and add them to today's row for this model.
    p = PRICES[model]
    usd = in_tok / 1e6 * p["in"] + out_tok / 1e6 * p["out"]
    day = time.strftime("%Y-%m-%d")
    DB.execute("""INSERT INTO spend VALUES (?, ?, ?, ?, ?)
        ON CONFLICT(day, model) DO UPDATE SET
            in_tok = in_tok + excluded.in_tok,
            out_tok = out_tok + excluded.out_tok,
            usd = usd + excluded.usd""",
        (day, model, in_tok, out_tok, usd))
    DB.commit()
    check_thresholds(day)

def check_thresholds(day):
    total = DB.execute("SELECT SUM(usd) FROM spend WHERE day=?", (day,)).fetchone()[0] or 0
    for limit, prio in [(5, 5), (20, 7), (100, 10)]:
        # The flag file is created in the current working directory.
        flag = f".llm-spend-{day}-{limit}"
        if total >= limit and not os.path.exists(flag):
            requests.post(
                f"{os.environ['NOTIFLY_URL']}/message",
                params={"token": os.environ["NOTIFLY_TOKEN"]},
                json={
                    "title": f"💸 LLM bill for the day: ${total:.2f}",
                    "message": f"Crossed ${limit}. Check whether an agent is stuck in a loop.",
                    "priority": prio,
                },
                timeout=5,
            )
            open(flag, "w").close()

# usage (client is an already-constructed SDK client)
resp = client.messages.create(model="claude-sonnet", max_tokens=1024, messages=[...])
record("claude-sonnet", resp.usage.input_tokens, resp.usage.output_tokens)

The three thresholds ($5, $20, $100) map to three alert levels; the flag file prevents duplicate alerts within the same day.
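Field names for token usage differ between SDKs, so a thin normalizing layer can keep record() provider-agnostic. The following is a minimal sketch, assuming the usage object exposes either input_tokens/output_tokens or prompt_tokens/completion_tokens; the helper name token_counts is hypothetical and the example objects are stand-ins, not real SDK responses:

```python
from types import SimpleNamespace


def token_counts(usage):
    """Return (input_tokens, output_tokens) from a usage object or dict.

    Assumption: the provider reports either input_tokens/output_tokens
    or prompt_tokens/completion_tokens; anything else raises ValueError.
    """
    def get(name):
        if isinstance(usage, dict):
            return usage.get(name)
        return getattr(usage, name, None)

    for in_name, out_name in (("input_tokens", "output_tokens"),
                              ("prompt_tokens", "completion_tokens")):
        in_tok, out_tok = get(in_name), get(out_name)
        if in_tok is not None and out_tok is not None:
            return int(in_tok), int(out_tok)
    raise ValueError("usage object has no recognised token fields")


# Stand-in objects shaped like the two naming conventions.
anthropic_like = SimpleNamespace(input_tokens=1200, output_tokens=300)
openai_like = {"prompt_tokens": 800, "completion_tokens": 150}
print(token_counts(anthropic_like))
print(token_counts(openai_like))
```

The result can be passed straight through, for example record(model, *token_counts(resp.usage)).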
Option 2: billing email alert → Notifly
OpenAI, Anthropic, and most other providers can send an email when a soft or hard spending limit is exceeded. Create an Email Inbox on the “LLM-spend” channel and enter the address it gives you in each provider's notification settings; every such email then arrives as a push message.
The advantage: it works even if your service is down or you have no time to build token counting yourself. The downside: a delay of minutes to hours.
Option 3: scheduled cloud function on YC
If you use multiple providers and each has its own dashboard, create
a small function on Yandex Cloud Functions with a timer trigger that fires every hour,
polls the Usage API of each provider, and sends an alert when a threshold is exceeded.
See the recipe Custom cloud function for integrity checks:
it uses the same skeleton, only calling /v1/usage instead of /health.
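The shape of such a function might look like the sketch below. It assumes the handler(event, context) entry point used by the Python runtime and reuses the /message call from Option 1. The names check_spend, send_push, FETCHERS, and DAILY_LIMIT_USD are hypothetical, and the per-provider usage calls are deliberately left as placeholders because each provider's endpoint and response format differ:

```python
import json
import os
import urllib.parse
import urllib.request

DAILY_LIMIT_USD = 20.0  # assumption: one shared daily limit across providers


def send_push(title, message, priority):
    """POST an alert to Notifly via the /message endpoint from Option 1."""
    query = urllib.parse.urlencode({"token": os.environ["NOTIFLY_TOKEN"]})
    url = f"{os.environ['NOTIFLY_URL']}/message?{query}"
    body = json.dumps(
        {"title": title, "message": message, "priority": priority}
    ).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req, timeout=5).close()


def check_spend(fetchers, limit=DAILY_LIMIT_USD, notify=send_push):
    """Sum today's spend over all providers; alert if it reaches the limit.

    fetchers maps a provider name to a zero-argument callable returning
    today's spend in USD. Each callable is a placeholder for that
    provider's own usage endpoint, which is not shown here.
    """
    spend = {name: float(fetch()) for name, fetch in fetchers.items()}
    total = sum(spend.values())
    if spend and total >= limit:
        top = max(spend, key=spend.get)
        notify(
            f"LLM spend today: ${total:.2f}",
            f"Over the ${limit:.0f} limit. Highest: {top} (${spend[top]:.2f}).",
            7,
        )
    return total


# Hypothetical registry: fill in with real per-provider fetch functions.
FETCHERS = {}


def handler(event, context):
    """Entry point invoked by the timer trigger."""
    total = check_spend(FETCHERS)
    return {"statusCode": 200, "body": json.dumps({"total_usd": total})}
```

As written, this sketch alerts on every run while spend stays above the limit. Suppressing repeats would need persistent state outside the function, playing the role of the flag files in Option 1.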
What to include in the alert text
The fewer times you need to open your laptop, the better:
- total amount for the day and month;
- the model/agent that burned the most;
- the last 3–5 requests with token counts (if available in logs);
- a direct link to the provider’s dashboard.
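The checklist above could be assembled into a message body roughly as follows. This is a sketch under assumptions: the function name build_alert, the shape of its inputs, and the example dashboard URL are all hypothetical, and the figures in the usage call are made up for illustration:

```python
from collections import defaultdict


def build_alert(day_rows, month_total, recent, dashboard_url):
    """Compose the alert text.

    day_rows: list of (model, usd) tuples for today.
    month_total: month-to-date spend in USD.
    recent: list of (model, in_tok, out_tok) tuples, newest last.
    dashboard_url: link to the provider's billing page.
    """
    per_model = defaultdict(float)
    for model, usd in day_rows:
        per_model[model] += usd
    day_total = sum(per_model.values())

    lines = [f"Today: ${day_total:.2f} | Month: ${month_total:.2f}"]
    if per_model:
        top = max(per_model, key=per_model.get)
        lines.append(f"Top spender: {top} (${per_model[top]:.2f})")
    # Keep only the five most recent requests to stay readable on a phone.
    for model, in_tok, out_tok in recent[-5:]:
        lines.append(f"- {model}: {in_tok} in / {out_tok} out")
    lines.append(dashboard_url)
    return "\n".join(lines)


print(build_alert(
    day_rows=[("claude-sonnet", 12.5), ("gpt-4o-mini", 0.75)],
    month_total=141.2,
    recent=[("claude-sonnet", 52000, 1800), ("claude-sonnet", 61000, 2100)],
    dashboard_url="https://example.com/billing",  # placeholder link
))
```

The returned string can be used as the "message" field of the push payload.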