Availability of LLM providers
LLM providers go down more often than you’d like: over the past year Anthropic, OpenAI and OpenRouter have each had prolonged incidents several times. Most of them publish an RSS/JSON status page, but none sends a push notification at the moment an incident starts.
We use Notifly active monitors: a Yandex Cloud feature that polls the target URL once a minute on its own and sends an alert through your channel.
1. Provider status page (HTTP)
Most SaaS vendors expose JSON at `https://status.<vendor>.com/api/v2/status.json` (Statuspage.io-compatible). Under normal operation, `status.indicator == "none"`.
The simplest check is whether a lightweight endpoint returns 200 OK:
```json
{
  "kind": "http",
  "target": "https://status.anthropic.com/api/v2/status.json",
  "intervalSec": 60,
  "timeoutSec": 5,
  "expectedStatus": 200,
  "consecutiveFails": 3,
  "alertMessage": "status.anthropic.com is not responding",
  "recoveryMessage": "Anthropic is responding again"
}
```

This catches the status page itself going down, but not a page that is up and reporting a status other than green.
For that case it’s easier to write a small cloud function that reads the JSON once a minute and calls `POST /message` whenever `indicator != "none"`.
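Here is a minimal sketch of such a function, using only the standard library. It makes several assumptions. The `handler(event, context)` entry point follows the Yandex Cloud Functions Python convention and is meant to be fired by a timer trigger. `NOTIFLY_URL` and `NOTIFLY_TOKEN` come from the environment, as in the health script in section 2. The mapping from indicator to priority is hypothetical. The Statuspage fields `status.indicator` (`none`, `minor`, `major`, `critical`) and `status.description` are part of the standard `status.json` payload.

```python
import json
import os
import urllib.parse
import urllib.request

# Assumed target; any Statuspage.io-compatible page has the same payload shape.
STATUS_URL = "https://status.anthropic.com/api/v2/status.json"
VENDOR = "Anthropic"

# Hypothetical mapping from Statuspage indicator to Notifly priority.
PRIORITY = {"minor": 6, "major": 8, "critical": 9}


def status_alert(vendor, payload):
    """Return (title, message, priority) when the page is not green, else None."""
    status = payload.get("status", {})
    indicator = status.get("indicator", "unknown")
    if indicator == "none":
        return None
    description = status.get("description", "no description")
    return (f"⚠️ {vendor} status: {indicator}", description,
            PRIORITY.get(indicator, 7))


def send_notifly(title, msg, prio):
    """POST /message to Notifly with the token in the query string."""
    query = urllib.parse.urlencode({"token": os.environ["NOTIFLY_TOKEN"]})
    body = json.dumps({"title": title, "message": msg, "priority": prio})
    req = urllib.request.Request(
        f"{os.environ['NOTIFLY_URL']}/message?{query}",
        data=body.encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req, timeout=5).close()


def handler(event, context):
    """Timer-trigger entry point (assumed Yandex Cloud Functions signature)."""
    with urllib.request.urlopen(STATUS_URL, timeout=5) as resp:
        payload = json.load(resp)
    alert = status_alert(VENDOR, payload)
    if alert is not None:
        send_notifly(*alert)
    return {"statusCode": 200}
```

The function keeps no state, so it alerts on every run for as long as the page is not green. If you only want alerts on changes, persist the last indicator somewhere (for example in Object Storage) and compare against it before sending.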
2. Direct ping of the real API endpoint
Status pages lag behind real incidents, so the most honest monitoring is a direct ping of the production endpoint with a cheap model and `max_tokens: 1`:
```python
# Tiny health script; convenient to wrap in a YC Function with a timer trigger.
import os
import time

import anthropic
import requests


def notify(title, msg, prio):
    # Defined before use: the original called notify() before defining it.
    requests.post(
        f"{os.environ['NOTIFLY_URL']}/message",
        params={"token": os.environ["NOTIFLY_TOKEN"]},
        json={"title": title, "message": msg, "priority": prio},
        timeout=5,
    )


client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
t0 = time.time()
try:
    client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=1,
        messages=[{"role": "user", "content": "ping"}],
    )
except Exception as e:
    notify("❌ Anthropic API", f"{type(e).__name__}: {e}", 9)
else:
    # Only measure latency when the call succeeded.
    latency = time.time() - t0
    if latency > 5:
        notify("⏱️ Anthropic latency", f"{latency:.1f}s on a cheap request", 7)
```

On Haiku this costs next to nothing (well under a dollar a month at one request per minute). The coverage is genuinely end-to-end: DNS, TLS, queueing and the model itself.
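You can check the cost figure against your own interval and the provider’s current prices with a little arithmetic. The sketch below takes the token counts and per-million-token prices as inputs. It assumes a "ping" prompt costs a handful of input tokens plus one output token, but you should read the real prices off the pricing page rather than rely on any number here.

```python
def daily_cost_usd(interval_sec, input_tokens, output_tokens,
                   usd_per_m_input, usd_per_m_output):
    """Estimate the daily cost of a health ping sent every `interval_sec` seconds."""
    checks_per_day = 86_400 // interval_sec  # 1440 at a 60-second interval
    per_check = (input_tokens * usd_per_m_input
                 + output_tokens * usd_per_m_output) / 1_000_000
    return checks_per_day * per_check
```

For example, `daily_cost_usd(60, 10, 1, in_price, out_price)` gives the daily bill for a one-minute ping, where `in_price` and `out_price` are the USD prices per million input and output tokens.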
3. Rate-limit / 429
A rate limit is usually not an incident in itself. It is a signal that you have a loop in your code or have rolled out an update with aggressive fan-out, so it’s worth alerting on separately from 5xx errors:
```python
try:
    resp = client.messages.create(...)
except anthropic.RateLimitError as e:
    # RateLimitError carries the HTTP response, so its headers are available.
    retry = e.response.headers.get("retry-after", "?")
    # `model` is assumed to hold the model name passed to create().
    notify(
        "🚦 Anthropic rate-limit",
        f"Got a 429. Retry-After={retry}s\nCurrent model: {model}\n"
        f"Endpoint: {e.request.url if e.request else '?'}",
        8,  # positional: notify() takes (title, msg, prio)
    )
    raise
```

Repeat this block for every provider you use (OpenAI: `openai.RateLimitError`; OpenRouter: HTTP 429 with `X-RateLimit-*` headers).
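For providers you call over raw HTTP (such as OpenRouter), there is no SDK exception to catch. A minimal sketch is to build the same alert from the status code and response headers. The function name is hypothetical, and it assumes the provider’s limit headers share the `X-RateLimit-` prefix mentioned above. It accepts any header mapping with `.items()`, such as `requests`’ `resp.headers` or urllib’s `HTTPError.headers`.

```python
def rate_limit_alert(provider, status, headers):
    """Build a (title, message, priority) alert for an HTTP 429, else None."""
    if status != 429:
        return None
    # Header names are case-insensitive; normalise them to lower case.
    lowered = {k.lower(): v for k, v in headers.items()}
    retry = lowered.get("retry-after", "?")
    # Collect every X-RateLimit-* header, sorted for a stable message.
    limits = sorted((k, v) for k, v in lowered.items()
                    if k.startswith("x-ratelimit-"))
    details = ", ".join(f"{k}={v}" for k, v in limits) or "no X-RateLimit-* headers"
    return (f"🚦 {provider} rate-limit",
            f"Got a 429. Retry-After={retry}s; {details}", 8)
```

The returned tuple can be passed straight to `notify(*alert)`, which keeps 429 alerts on the same priority as the Anthropic block.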
4. Provider RSS/Atom
Status pages also publish RSS feeds, so every incident announcement can become an email. Create an Email Inbox, route the RSS feed through any RSS→Email service (or Yandex Forms / Zapier), and set the resulting address as the recipient. Every incident update from the providers you follow will then reach you as a push, with the incident title as the subject.
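If you would rather not depend on a third-party RSS→Email service, the core of one is small: fetch the feed and report items you have not seen yet. The sketch below handles RSS only (not Atom) with the standard library. The feed URL is an assumption; check the status page’s subscribe options for the real one. The `seen` set must be persisted between runs yourself, since a timer-triggered function keeps no memory.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumed feed path; confirm it on the provider's status page.
FEED_URL = "https://status.anthropic.com/history.rss"


def new_incidents(rss_xml, seen):
    """Return (guid, title) pairs for RSS <item>s whose guid is not in `seen`."""
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        # Fall back to <link> when a feed omits <guid>.
        guid = item.findtext("guid") or item.findtext("link") or ""
        title = (item.findtext("title") or "").strip()
        if guid and guid not in seen:
            fresh.append((guid, title))
    return fresh


def fetch_feed(url=FEED_URL):
    """Download the raw feed as bytes; ElementTree parses bytes directly."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()
```

Each `(guid, title)` pair can then be sent with `notify(title, ...)`, after which its guid is added to the persisted `seen` set.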
Which providers to monitor
A minimal set for active solo AI development:
| Service | What to check |
|---|---|
| OpenAI / Anthropic / Mistral / Google | a one-token chat request (`max_tokens=1` or the provider’s equivalent) |
| OpenRouter / Together / Groq | the same; they often fail before the primary providers do |
| Cohere / Voyage embeddings | the embeddings endpoint (`/v1/embed` for Cohere, `/v1/embeddings` for Voyage) with a single character |
| Your fine-tune endpoint | a direct ping with a fixed prompt |
| Self-hosted vLLM / Ollama | `kind: "http"` on `/health` (no token cost) |
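OpenAI and the aggregators in the table expose an OpenAI-compatible `/chat/completions` endpoint, so one small sketch can ping all of them. The base URLs below are assumptions to verify against each provider’s docs. The model name is whatever cheap model you pick, and `ping()` reports `(ok, latency_seconds, detail)` for you to feed into `notify()`.

```python
import json
import time
import urllib.error
import urllib.request

# Assumed OpenAI-compatible base URLs; verify against each provider's docs.
BASE_URLS = {
    "openai": "https://api.openai.com/v1",
    "openrouter": "https://openrouter.ai/api/v1",
    "groq": "https://api.groq.com/openai/v1",
    "together": "https://api.together.xyz/v1",
}


def build_ping(base_url, api_key, model):
    """Build a one-token chat.completions request."""
    body = {"model": model, "max_tokens": 1,
            "messages": [{"role": "user", "content": "ping"}]}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def ping(base_url, api_key, model, timeout=10):
    """Send the request and return (ok, latency_seconds, detail)."""
    req = build_ping(base_url, api_key, model)
    t0 = time.monotonic()
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            resp.read()
            return True, time.monotonic() - t0, str(resp.status)
    except urllib.error.HTTPError as e:  # 4xx/5xx, including 429
        return False, time.monotonic() - t0, f"HTTP {e.code}"
    except (urllib.error.URLError, TimeoutError) as e:  # DNS, TLS, timeouts
        return False, time.monotonic() - t0, type(e).__name__
```

Looping over `BASE_URLS` with one key and model per provider gives a single function covering those rows of the table. Anthropic and the embeddings providers use different request shapes and still need their own checks.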