Approaching provider monthly quota

LLM providers enforce rate tiers (Tier 1, Tier 2, …) and monthly quotas; once you exceed them, your requests start failing with HTTP 429, often with no prior warning. Most provider APIs return x-ratelimit-* or anthropic-ratelimit-* headers (for example anthropic-ratelimit-tokens-remaining) with every response, and that is enough to alert in advance.

import os
import time

import requests

LIMITS_FILE = "/tmp/llm-quota.json"  # not used in this snippet
THRESHOLDS = (0.70, 0.90, 0.99)      # alert at 70%, 90% and 99% usage

def track_quota(headers):
    # Both values come from the provider's response headers; a missing
    # header counts as 0, so no alert fires in that case.
    limit = int(headers.get("anthropic-ratelimit-tokens-limit", 0))
    remaining = int(headers.get("anthropic-ratelimit-tokens-remaining", 0))
    used = limit - remaining
    ratio = used / (limit or 1)  # "or 1" avoids division by zero
    for thr in THRESHOLDS:
        # One flag file per threshold per month, so each alert fires only once.
        flag = f"/tmp/llm-quota-{int(thr*100)}-{time.strftime('%Y%m')}.flag"
        if ratio >= thr and not os.path.exists(flag):
            push(f"📊 LLM quota {int(thr*100)}%",
                 f"Used {used:,} of {limit:,} tokens in the tier window.",
                 9 if thr >= 0.9 else 6)  # higher priority from 90% upward
            open(flag, "w").close()

def push(t, m, p):
    # Send the notification: t = title, m = message, p = priority.
    requests.post(f"{os.environ['NOTIFLY_URL']}/message",
                  params={"token": os.environ["NOTIFLY_TOKEN"]},
                  json={"title": t, "message": m, "priority": p}, timeout=5)
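
One way to call track_quota on every response is to wrap whatever function sends the request. The sketch below is only an illustration: with_quota_tracking is a hypothetical helper, not part of any SDK, and it assumes the response object exposes a .headers mapping (as requests responses do). Errors in the monitoring callback are logged rather than raised, so a failed alert does not break the actual LLM call.

```python
import logging

def with_quota_tracking(send, on_headers):
    """Wrap a request function so each response's headers reach on_headers.

    `send` is any callable returning an object with a `.headers` mapping;
    `on_headers` is the monitoring callback (for example track_quota).
    Both names are illustrative assumptions.
    """
    def wrapped(*args, **kwargs):
        resp = send(*args, **kwargs)
        try:
            on_headers(resp.headers)
        except Exception:
            # Monitoring must never break the real request.
            logging.exception("quota tracking failed")
        return resp
    return wrapped

# Possible usage, assuming `requests` and track_quota from the snippet above:
#   post = with_quota_tracking(requests.post, track_quota)
#   resp = post(url, headers=auth_headers, json=payload, timeout=60)
```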

The same principle applies to OpenAI (x-ratelimit-remaining-tokens), OpenRouter (X-RateLimit-Remaining-Credits), and Together (x-ratelimit-tokens-remaining-day).
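
As a rough sketch of how the same ratio could be computed for more than one provider, the helper below maps a provider name to its limit/remaining header pair. HEADER_NAMES and usage_ratio are hypothetical names. Only the Anthropic and OpenAI pairs are filled in; for other providers the matching limit header should be taken from their documentation rather than guessed.

```python
# Limit / remaining header pairs per provider (lowercase).
# Only these two pairs are filled in; add others after checking provider docs.
HEADER_NAMES = {
    "anthropic": ("anthropic-ratelimit-tokens-limit",
                  "anthropic-ratelimit-tokens-remaining"),
    "openai": ("x-ratelimit-limit-tokens",
               "x-ratelimit-remaining-tokens"),
}

def usage_ratio(headers, provider):
    """Return the used fraction of the token window, or None if unknown."""
    limit_key, remaining_key = HEADER_NAMES[provider]
    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    try:
        limit = int(lowered[limit_key])
        remaining = int(lowered[remaining_key])
    except (KeyError, ValueError):
        return None  # header missing or not an integer
    if limit <= 0:
        return None
    return (limit - remaining) / limit
```

Returning None instead of 0 when headers are missing keeps "no data" distinguishable from "nothing used", so a caller can skip the threshold check rather than silently treat the quota as untouched.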