LLM API spending

LLM providers bill per token, so on a bad night (an agent stuck in a loop, a buggy prompt, a failed indexing job) the daily bill can grow 10–100x. Standard email alerts typically arrive once a day, often after the money is already spent. A push notification at the moment spend crosses a threshold lets you react within minutes.

Option 1: calculate spend on your side (most reliable)

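Every major SDK returns token counts in the response's usage object: usage.input_tokens / usage.output_tokens in Anthropic's SDK, usage.prompt_tokens / usage.completion_tokens in OpenAI's Chat Completions. The simplest approach is to accumulate them in Redis or SQLite and send an alert when a threshold is crossed: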

import os, time, sqlite3, requests

PRICES = {  # $ per 1M tokens; check the providers' current price lists
    "claude-sonnet":  {"in": 3.0,  "out": 15.0},
    "claude-haiku":   {"in": 0.8,  "out": 4.0},
    "gpt-4o":         {"in": 2.5,  "out": 10.0},
    "gpt-4o-mini":    {"in": 0.15, "out": 0.6},
}

DB = sqlite3.connect(os.path.expanduser("~/.llm-spend.db"))
DB.execute("""CREATE TABLE IF NOT EXISTS spend(
  day TEXT, model TEXT, in_tok INT, out_tok INT, usd REAL,
  PRIMARY KEY(day, model))""")

def record(model, in_tok, out_tok):
    """Add one call's usage to today's per-model totals, then check the thresholds."""
    p = PRICES[model]
    usd = in_tok / 1e6 * p["in"] + out_tok / 1e6 * p["out"]
    day = time.strftime("%Y-%m-%d")  # local date; totals restart every day
    # Upsert (SQLite 3.24+): insert a new (day, model) row or add to the existing one
    DB.execute("""INSERT INTO spend VALUES (?, ?, ?, ?, ?)
                  ON CONFLICT(day, model) DO UPDATE SET
                    in_tok = in_tok + excluded.in_tok,
                    out_tok = out_tok + excluded.out_tok,
                    usd     = usd + excluded.usd""",
               (day, model, in_tok, out_tok, usd))
    DB.commit()
    check_thresholds(day)

def check_thresholds(day):
    total = DB.execute("SELECT SUM(usd) FROM spend WHERE day=?", (day,)).fetchone()[0] or 0
    for limit, prio in [(5, 5), (20, 7), (100, 10)]:
        # one flag file per (day, threshold), kept in ~ rather than the current directory
        flag = os.path.expanduser(f"~/.llm-spend-{day}-{limit}")
        if total >= limit and not os.path.exists(flag):
            r = requests.post(
                f"{os.environ['NOTIFLY_URL']}/message",
                params={"token": os.environ["NOTIFLY_TOKEN"]},
                json={
                    "title": f"💸 LLM bill for today: ${total:.2f}",
                    "message": f"Crossed ${limit}. Check whether an agent is stuck in a loop.",
                    "priority": prio,
                },
                timeout=5,
            )
            r.raise_for_status()  # don't set the flag if Notifly rejected the message
            open(flag, "w").close()

# usage with the Anthropic SDK
import anthropic
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
# "claude-sonnet" stands in for the exact model ID you call; record() takes the PRICES key
resp = client.messages.create(model="claude-sonnet", max_tokens=1024, messages=[...])
record("claude-sonnet", resp.usage.input_tokens, resp.usage.output_tokens)

Three thresholds ($5, $20, $100) give three alert levels with rising priority (5, 7, 10); the flag file prevents a repeat alert for the same threshold on the same day.
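Because the SDKs name the usage fields differently, a small normalizing helper lets the same record() serve every provider. A minimal sketch; token_counts and cost_usd are hypothetical helper names, and the SimpleNamespace objects only stand in for real SDK responses:

```python
from types import SimpleNamespace


def token_counts(usage):
    """Return (input_tokens, output_tokens) from an Anthropic- or OpenAI-style usage object."""
    # Anthropic Messages API and OpenAI Responses API naming
    if hasattr(usage, "input_tokens"):
        return usage.input_tokens, usage.output_tokens
    # OpenAI Chat Completions naming
    if hasattr(usage, "prompt_tokens"):
        return usage.prompt_tokens, usage.completion_tokens
    raise ValueError(f"unrecognized usage object: {usage!r}")


def cost_usd(in_tok, out_tok, price_in, price_out):
    """Cost in dollars with prices in $ per 1M tokens (the same formula record() uses)."""
    return in_tok / 1e6 * price_in + out_tok / 1e6 * price_out


# Stand-ins for SDK responses; real usage objects carry more fields
anthropic_usage = SimpleNamespace(input_tokens=12_000, output_tokens=800)
openai_usage = SimpleNamespace(prompt_tokens=12_000, completion_tokens=800)

print(token_counts(anthropic_usage))  # (12000, 800)
print(token_counts(openai_usage))     # (12000, 800)
# 12,000 input + 800 output tokens at Sonnet prices: 0.036 + 0.012
print(round(cost_usd(12_000, 800, 3.0, 15.0), 4))  # 0.048
```

The call site then becomes record("claude-sonnet", *token_counts(resp.usage)) regardless of which SDK produced resp.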

Option 2: billing email alert → Notifly

OpenAI, Anthropic, and most other providers can email you when a soft or hard spending limit is exceeded. Create an Email Inbox on the “LLM-spend” channel and enter its address in the provider's notification settings; every such email then immediately becomes a push notification.

The upside: it works even if your own service is down, and you don't have to count tokens yourself. The downside: the provider's email can lag the actual spend by minutes to hours.

Option 3: scheduled cloud function on Yandex Cloud

If you use several providers, each with its own dashboard, create a small function in Yandex Cloud Functions with a timer trigger that fires every hour, polls each provider's usage API, and sends an alert when a limit is exceeded. See the recipe Custom cloud function for integrity checks: the skeleton is the same, except that instead of /health it calls the provider's usage endpoint (/v1/usage or similar; the exact path differs between providers).

What to include in the alert text

The fewer times you have to open a laptop to triage an alert, the better, so include:

total amount for the day and month;
the model/agent that burned the most;
the last 3–5 requests with token counts (if available in logs);
a direct link to the provider’s dashboard.
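With the SQLite table from Option 1, most of these items can be assembled by a couple of queries. A minimal sketch; build_alert is a hypothetical helper, and the dashboard URL is a placeholder you pass in. The per-request items are not covered, because that table keeps only daily per-model totals and would need a separate request log:

```python
import sqlite3


def build_alert(db, day, dashboard_url):
    """Compose alert text from the spend(day, model, in_tok, out_tok, usd) table."""
    day_total = db.execute(
        "SELECT COALESCE(SUM(usd), 0) FROM spend WHERE day = ?", (day,)
    ).fetchone()[0]
    # day is 'YYYY-MM-DD', so its first 7 characters identify the month
    month_total = db.execute(
        "SELECT COALESCE(SUM(usd), 0) FROM spend WHERE substr(day, 1, 7) = ?", (day[:7],)
    ).fetchone()[0]
    top = db.execute(
        "SELECT model, usd, in_tok, out_tok FROM spend WHERE day = ? ORDER BY usd DESC LIMIT 1",
        (day,),
    ).fetchone()
    lines = [f"Today: ${day_total:.2f}", f"This month: ${month_total:.2f}"]
    if top:
        model, usd, in_tok, out_tok = top
        lines.append(f"Top model: {model} ${usd:.2f} ({in_tok:,} in / {out_tok:,} out tokens)")
    lines.append(f"Dashboard: {dashboard_url}")
    return "\n".join(lines)


# Example with an in-memory copy of the Option 1 schema and made-up rows
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE spend(day TEXT, model TEXT, in_tok INT, out_tok INT, usd REAL,
              PRIMARY KEY(day, model))""")
db.executemany("INSERT INTO spend VALUES (?, ?, ?, ?, ?)", [
    ("2025-06-01", "gpt-4o-mini", 1_000_000, 200_000, 0.27),
    ("2025-06-02", "claude-sonnet", 4_000_000, 500_000, 19.5),
    ("2025-06-02", "gpt-4o-mini", 2_000_000, 100_000, 0.36),
])
print(build_alert(db, "2025-06-02", "https://example.com/usage"))
```

For the sample rows this prints today's $19.86, the month's $20.13, and claude-sonnet as the top model; the resulting string can go straight into the "message" field of the Notifly request.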