Approaching provider monthly quota

LLM providers assign rate tiers (Tier 1, Tier 2, …) and monthly quotas. Once a limit is exhausted, requests start failing with HTTP 429, usually without any warning beforehand. Most provider APIs return x-ratelimit-* or anthropic-ratelimit-tokens-remaining headers with every response, and that is enough to alert in advance.

import os, time, requests

LIMITS_FILE = "/tmp/llm-quota.json"
THRESHOLDS  = (0.70, 0.90, 0.99)

def track_quota(headers):
    # Missing headers must not trigger an alert: default "remaining" to the
    # limit (nothing used) and treat a missing limit as a ratio of 0.
    limit = int(headers.get("anthropic-ratelimit-tokens-limit", 0))
    used  = limit - int(headers.get("anthropic-ratelimit-tokens-remaining", limit))
    ratio = used / limit if limit else 0.0
    for thr in THRESHOLDS:
        flag = f"/tmp/llm-quota-{int(thr*100)}-{time.strftime('%Y%m')}.flag"
        if ratio >= thr and not os.path.exists(flag):
            push(f"📊 LLM quota {int(thr*100)}%",
                 f"Used {used:,} of {limit:,} tokens in the tier window.",
                 9 if thr >= 0.9 else 6)
            open(flag, "w").close()

def push(t, m, p):
    requests.post(f"{os.environ['NOTIFLY_URL']}/message",
                  params={"token": os.environ["NOTIFLY_TOKEN"]},
                  json={"title": t, "message": m, "priority": p}, timeout=5)
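
One thing to watch with the loop above: if the first response it sees is already past 99%, all three thresholds fire in the same call. A minimal sketch of one way to send only the highest newly crossed alert follows. The helper name highest_new_threshold and the in-memory fired set are illustrative assumptions; in the recipe the set would be replaced by the monthly flag files.

```python
from typing import Iterable, Optional, Set

THRESHOLDS = (0.70, 0.90, 0.99)

def highest_new_threshold(ratio: float,
                          fired: Set[float],
                          thresholds: Iterable[float] = THRESHOLDS) -> Optional[float]:
    """Return the highest threshold that `ratio` has crossed and not yet fired.

    Every crossed threshold is recorded in `fired`, so lower ones
    never produce a late alert. Returns None when nothing new was crossed.
    """
    crossed = [t for t in thresholds if ratio >= t]
    if not crossed:
        return None
    top = max(crossed)
    if top in fired:
        return None
    fired.update(crossed)
    return top

# Hypothetical sequence of usage ratios observed over a month.
fired: Set[float] = set()
for r in (0.50, 0.75, 0.995, 0.999):
    print(r, highest_new_threshold(r, fired))
```

With this shape, a jump from 75% straight to 99.5% produces a single priority-9 alert instead of two back-to-back notifications.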

The same principle applies to OpenAI (x-ratelimit-remaining-tokens), OpenRouter (X-RateLimit-Remaining-Credits), and Together (x-ratelimit-tokens-remaining-day).
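
To reuse the same tracking across providers, the header lookup can be pulled into a small table. The sketch below is an illustration under assumptions: usage_ratio and HEADER_PAIRS are hypothetical names, and the OpenAI limit header (x-ratelimit-limit-tokens) is assumed to follow the same naming as the remaining-tokens header mentioned above. Check each provider's documentation before relying on a header name.

```python
from typing import Mapping, Optional, Tuple

# (limit header, remaining header) per provider, all lower-case.
# The Anthropic pair comes from the recipe; the OpenAI pair is an assumption.
HEADER_PAIRS = {
    "anthropic": ("anthropic-ratelimit-tokens-limit",
                  "anthropic-ratelimit-tokens-remaining"),
    "openai":    ("x-ratelimit-limit-tokens",
                  "x-ratelimit-remaining-tokens"),
}

def usage_ratio(provider: str,
                headers: Mapping[str, str]) -> Optional[Tuple[int, int, float]]:
    """Return (used, limit, ratio), or None if the headers are missing or invalid."""
    limit_key, remaining_key = HEADER_PAIRS[provider]
    # Plain dicts are case-sensitive, so normalise the header names first.
    lowered = {k.lower(): v for k, v in headers.items()}
    try:
        limit = int(lowered[limit_key])
        remaining = int(lowered[remaining_key])
    except (KeyError, ValueError):
        return None
    if limit <= 0:
        return None
    used = limit - remaining
    return used, limit, used / limit

print(usage_ratio("openai", {"X-RateLimit-Limit-Tokens": "1000",
                             "x-ratelimit-remaining-tokens": "250"}))
# (750, 1000, 0.75)
```

Returning None for missing headers keeps a response without rate-limit information from being mistaken for an exhausted quota.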

Related recipes

LLM API costs: money, not tokens.
LLM provider availability: what to do when the quota is exhausted.