LLM API spending

LLM providers charge per token, and on a bad night (an agent loop, a buggy prompt, a failed indexing job) the daily bill can grow 10–100x. Standard email alerts arrive once a day, often after the fact. A push notification at the moment of overspend lets you react within minutes.

Option 1: calculate spend on your side (most reliable)

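All major SDKs return token usage in the response: Anthropic as usage.input_tokens / usage.output_tokens, OpenAI's Chat Completions API as usage.prompt_tokens / usage.completion_tokens. The easiest approach is to accumulate these counts in Redis or SQLite and send an alert when a threshold is crossed: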

import os, time, sqlite3, requests

# USD per 1M tokens; check these against the providers' current price lists
PRICES = {
    "claude-sonnet": {"in": 3.0, "out": 15.0},
    "claude-haiku": {"in": 0.8, "out": 4.0},
    "gpt-4o": {"in": 2.5, "out": 10.0},
    "gpt-4o-mini": {"in": 0.15, "out": 0.6},
}

DB = sqlite3.connect(os.path.expanduser("~/.llm-spend.db"))
DB.execute("""CREATE TABLE IF NOT EXISTS spend(
    day TEXT, model TEXT, in_tok INT, out_tok INT, usd REAL,
    PRIMARY KEY(day, model))""")

def record(model, in_tok, out_tok):
    # Add one request's usage to today's per-model totals
    p = PRICES[model]
    usd = in_tok / 1e6 * p["in"] + out_tok / 1e6 * p["out"]
    day = time.strftime("%Y-%m-%d")
    DB.execute("""INSERT INTO spend VALUES (?, ?, ?, ?, ?)
        ON CONFLICT(day, model) DO UPDATE SET
            in_tok = in_tok + excluded.in_tok,
            out_tok = out_tok + excluded.out_tok,
            usd = usd + excluded.usd""",
        (day, model, in_tok, out_tok, usd))
    DB.commit()
    check_thresholds(day)

def check_thresholds(day):
    total = DB.execute("SELECT SUM(usd) FROM spend WHERE day=?", (day,)).fetchone()[0] or 0
    # (limit in USD, notification priority)
    for limit, prio in [(5, 5), (20, 7), (100, 10)]:
        # Flag file (created in the current working directory) marks this alert as sent
        flag = f".llm-spend-{day}-{limit}"
        if total >= limit and not os.path.exists(flag):
            requests.post(
                f"{os.environ['NOTIFLY_URL']}/message",
                params={"token": os.environ["NOTIFLY_TOKEN"]},
                json={
                    "title": f"💸 LLM bill for the day: ${total:.2f}",
                    "message": f"Crossed ${limit}. Check whether an agent is stuck in a loop.",
                    "priority": prio,
                },
                timeout=5,
            )
            open(flag, "w").close()

# usage (client is your Anthropic SDK client; the name passed to record() must be a key of PRICES)
resp = client.messages.create(model="claude-sonnet", max_tokens=1024, messages=[...])
record("claude-sonnet", resp.usage.input_tokens, resp.usage.output_tokens)

The three thresholds ($5, $20, $100) map to three alert priorities (5, 7, 10); the flag file, one per day and threshold, prevents duplicate alerts within the same day.
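
Since providers name the usage fields differently, a small adapter keeps the call to record() uniform. The following is a minimal sketch, not part of the original recipe: the helper names usage_tokens and cost_usd are hypothetical, and the price table is a shortened copy of the one above.

```python
from types import SimpleNamespace

# USD per 1M tokens (shortened copy of the table above)
PRICES = {
    "claude-sonnet": {"in": 3.0, "out": 15.0},
    "gpt-4o": {"in": 2.5, "out": 10.0},
}

def _first(usage, *names):
    """Return the first attribute from `names` that is present and not None."""
    for name in names:
        value = getattr(usage, name, None)
        if value is not None:
            return value
    return 0

def usage_tokens(usage):
    """Hypothetical helper: (input, output) token counts from an SDK usage object.

    Handles the Anthropic field names (input_tokens / output_tokens) and the
    OpenAI Chat Completions names (prompt_tokens / completion_tokens).
    """
    return (
        _first(usage, "input_tokens", "prompt_tokens"),
        _first(usage, "output_tokens", "completion_tokens"),
    )

def cost_usd(model, in_tok, out_tok):
    """Same formula as in record(): token counts times the price per 1M tokens."""
    p = PRICES[model]
    return in_tok / 1e6 * p["in"] + out_tok / 1e6 * p["out"]

# Stand-ins for real SDK responses, used only for illustration
anthropic_like = SimpleNamespace(input_tokens=200_000, output_tokens=50_000)
openai_like = SimpleNamespace(prompt_tokens=200_000, completion_tokens=50_000)

in_tok, out_tok = usage_tokens(anthropic_like)
print(f"${cost_usd('claude-sonnet', in_tok, out_tok):.2f}")  # 0.2 * 3.0 + 0.05 * 15.0 -> $1.35
```

With such an adapter the call site becomes record(model, *usage_tokens(resp.usage)) regardless of which SDK produced the response.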

OpenAI, Anthropic, and most other providers can send an email when a hard or soft spending limit is exceeded. Create an Email Inbox on the “LLM-spend” channel and put the address you receive into the provider's notification settings; any such email then immediately becomes a push message.

The advantage: this works even if your service is down or you are not counting tokens yourself. The downside: a delay of minutes to hours.

If you use multiple providers and each has its own dashboard, create a small function on Yandex Cloud Functions with a timer trigger that fires every hour, polls the Usage API of every provider, and sends an alert when a threshold is exceeded. See the recipe Custom cloud function for integrity checks: it is the same skeleton, only instead of /health it calls /v1/usage.
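
A rough sketch of such a function is shown below. It is an illustration under several assumptions: fetch_provider_spend is a hypothetical stub (each provider's Usage API has its own endpoint and authentication, which are not reproduced here), DAILY_LIMIT_USD and the priority value are arbitrary, and the alert is posted to the same NOTIFLY_URL / NOTIFLY_TOKEN endpoint as in Option 1. It uses only the standard library so that the function needs no extra dependencies.

```python
import json
import os
import urllib.parse
import urllib.request

DAILY_LIMIT_USD = 20.0  # assumed threshold, adjust to your budget

def fetch_provider_spend():
    """Hypothetical stub: return today's spend in USD per provider.

    Replace the zeros with real calls to each provider's Usage API.
    """
    return {"anthropic": 0.0, "openai": 0.0}

def check_spend(spend_by_provider, limit_usd):
    """Return an alert payload if the combined spend reached the limit, else None."""
    total = sum(spend_by_provider.values())
    if total < limit_usd:
        return None
    top = max(spend_by_provider, key=spend_by_provider.get)
    return {
        "title": f"LLM spend today: ${total:.2f}",
        "message": f"Crossed ${limit_usd:.0f}. Biggest spender: {top} "
                   f"(${spend_by_provider[top]:.2f}).",
        "priority": 7,
    }

def send_alert(alert):
    """POST the alert as JSON, with the token passed as a query parameter."""
    query = urllib.parse.urlencode({"token": os.environ["NOTIFLY_TOKEN"]})
    req = urllib.request.Request(
        f"{os.environ['NOTIFLY_URL']}/message?{query}",
        data=json.dumps(alert).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5):
        pass

def handler(event, context):
    """Entry point invoked by the timer trigger."""
    alert = check_spend(fetch_provider_spend(), DAILY_LIMIT_USD)
    if alert:
        send_alert(alert)
    return {"statusCode": 200, "body": "alerted" if alert else "ok"}
```

Note that this sketch has no deduplication: once the limit is crossed it would alert on every hourly run, so in practice you would persist a per-day flag, as the flag files do in Option 1.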

The less often you need to open your laptop, the better, so put the following into the alert itself:

  • total amount for the day and month;
  • the model/agent that burned the most;
  • the last 3–5 requests with token counts (if available in logs);
  • a direct link to the provider’s dashboard.
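
The list above could be assembled from the spend table of Option 1 roughly as follows. This is a sketch under assumptions: build_alert_body is a hypothetical helper, the recent requests are passed in by the caller (the table stores only per-day aggregates), and the dashboard URL is whatever link your provider uses.

```python
import sqlite3

def build_alert_body(db, day, dashboard_url, recent=()):
    """Compose the alert text for `day` ("YYYY-MM-DD") from the spend table.

    `recent` is an optional sequence of (model, in_tok, out_tok) tuples taken
    from your own request logs; only the last five are included.
    """
    month = day[:7]  # "YYYY-MM"
    day_total = db.execute(
        "SELECT COALESCE(SUM(usd), 0) FROM spend WHERE day = ?", (day,)
    ).fetchone()[0]
    month_total = db.execute(
        "SELECT COALESCE(SUM(usd), 0) FROM spend WHERE day LIKE ?",
        (month + "-%",),
    ).fetchone()[0]
    # The model that burned the most today
    top = db.execute(
        "SELECT model, usd FROM spend WHERE day = ? ORDER BY usd DESC LIMIT 1",
        (day,),
    ).fetchone()

    lines = [f"Today: ${day_total:.2f}, this month: ${month_total:.2f}"]
    if top:
        lines.append(f"Top model: {top[0]} (${top[1]:.2f})")
    for model, in_tok, out_tok in list(recent)[-5:]:
        lines.append(f"- {model}: {in_tok} in / {out_tok} out")
    lines.append(f"Dashboard: {dashboard_url}")
    return "\n".join(lines)
```

The returned string can be used as the "message" field of the notification, so the first screen of the push already answers how much, which model, and where to look.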