Availability of LLM providers

LLM providers go down more often than you’d like: over the past year Anthropic, OpenAI and OpenRouter have each had several prolonged incidents. Most publish a status page with RSS and JSON feeds, but none of them pushes a notification to you the moment an incident starts.

We use Notifly active monitors: a feature in Yandex Cloud that polls the required URL once a minute on its own and sends an alert through your channel.

1. Provider status page (HTTP)

Most SaaS vendors expose Statuspage.io-compatible JSON at https://status.<vendor>.com/api/v2/status.json. Under normal operation, status.indicator == "none".

The simplest check is to watch for a plain 200 OK on this short endpoint:

{
  "kind": "http",
  "target": "https://status.anthropic.com/api/v2/status.json",
  "intervalSec": 60,
  "timeoutSec": 5,
  "expectedStatus": 200,
  "consecutiveFails": 3,
  "alertMessage": "status.anthropic.com is not responding",
  "recoveryMessage": "Anthropic is responding again"
}

This covers the case where the status page itself is down, but it does not catch a status that is not green. For that, it is easier to write a small cloud function that reads the JSON once a minute and sends POST /message when indicator != "none".
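A minimal sketch of such a function, assuming the Statuspage.io-style payload described above and the same Notifly POST /message call that the health script in this article uses. The names STATUS_URL, evaluate_status and run_check, the alert title and the priority value are illustrative, and urllib is used only to keep the sketch free of third-party dependencies.

```python
import json
import os
import urllib.parse
import urllib.request

# Assumption: the vendor serves a Statuspage.io-compatible document at this URL.
STATUS_URL = "https://status.anthropic.com/api/v2/status.json"


def evaluate_status(payload):
    """Return an alert text if the indicator is not "none", else None."""
    status = payload.get("status", {})
    # A missing indicator is treated as "unknown" and therefore alerts.
    indicator = status.get("indicator", "unknown")
    if indicator == "none":
        return None
    description = status.get("description", "no description")
    return f"indicator={indicator}: {description}"


def run_check():
    """Fetch the status JSON and POST to Notifly if it is not green."""
    with urllib.request.urlopen(STATUS_URL, timeout=5) as resp:
        payload = json.load(resp)
    alert = evaluate_status(payload)
    if alert is None:
        return
    query = urllib.parse.urlencode({"token": os.environ["NOTIFLY_TOKEN"]})
    body = json.dumps(
        {"title": "Anthropic status", "message": alert, "priority": 8}
    ).encode()
    req = urllib.request.Request(
        f"{os.environ['NOTIFLY_URL']}/message?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req, timeout=5).close()
```

Call run_check() from the handler of a timer-triggered function. Keeping evaluate_status free of network access makes the decision logic easy to test against saved payloads.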

2. Direct ping of the real API endpoint

The status page lags behind reality. The most honest check is to ping the production endpoint with a cheap model and max_tokens=1:

# tiny health script, convenient to wrap in a YC Function with a timer trigger
import os
import time

import anthropic
import requests


def notify(title, msg, prio):
    # Send an alert to Notifly; defined before first use so the script runs top to bottom
    requests.post(
        f"{os.environ['NOTIFLY_URL']}/message",
        params={"token": os.environ["NOTIFLY_TOKEN"]},
        json={"title": title, "message": msg, "priority": prio},
        timeout=5,
    )


c = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
t0 = time.monotonic()  # monotonic clock is not affected by system clock changes
try:
    c.messages.create(
        model="claude-haiku-4-5",
        max_tokens=1,
        messages=[{"role": "user", "content": "ping"}],
    )
    latency = time.monotonic() - t0
    if latency > 5:
        notify("⏱️ Anthropic latency", f"{latency:.1f}s on a cheap request", 7)
except Exception as e:
    # Any failure (DNS, TLS, 5xx, auth) becomes a high-priority alert
    notify("❌ Anthropic API", f"{type(e).__name__}: {e}", 9)

Cost on Haiku is tenths of a cent per day, and coverage is real end-to-end (DNS, TLS, queue, model).
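The cost claim can be checked with simple arithmetic. The sketch below is a hypothetical helper (daily_cost_usd is not part of any SDK), and the token counts and per-million-token prices are placeholder inputs rather than a quote of any real price list; substitute the current numbers for the model you ping.

```python
def daily_cost_usd(calls_per_day, input_tokens, output_tokens,
                   input_usd_per_mtok, output_usd_per_mtok):
    """Estimated daily spend for a fixed-size health ping."""
    per_call = (input_tokens * input_usd_per_mtok
                + output_tokens * output_usd_per_mtok) / 1_000_000
    return calls_per_day * per_call


# Placeholder inputs, not real prices:
# one ping a minute, about 10 input tokens, 1 output token.
estimate = daily_cost_usd(
    calls_per_day=24 * 60,
    input_tokens=10,
    output_tokens=1,
    input_usd_per_mtok=0.25,
    output_usd_per_mtok=1.25,
)
print(f"{estimate * 100:.2f} cents per day")
```

With these placeholder values the estimate comes to roughly half a cent per day; the result scales linearly with both the price and the ping frequency.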

3. Rate-limit / 429

A rate limit is usually not an incident in itself but a signal that you have a loop in your code or that you shipped an update with aggressive fan-out. It is worth alerting on it separately from 5xx errors:

try:
    resp = client.messages.create(...)
except anthropic.RateLimitError as e:
    # The exception carries the HTTP response; fall back to "?" if the header is absent
    retry = e.response.headers.get("retry-after", "?")
    notify(
        "🚦 Anthropic rate-limit",
        # model is assumed to be defined by the calling code
        f"Got a 429. Retry-After={retry}s\nCurrent model: {model}\n"
        f"Endpoint: {e.request.url}",
        8,  # passed positionally: notify() names this parameter prio, not priority
    )
    raise

Extend this block for all providers you use (OpenAI: openai.RateLimitError, OpenRouter: HTTP 429 with X-RateLimit-*).
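For providers you call over plain HTTP, the same alert text can be built from the status code and headers. The sketch below is an assumption-laden illustration: summarize_rate_limit is a hypothetical helper, and it simply collects whatever X-RateLimit-* headers are present rather than relying on specific header names.

```python
def summarize_rate_limit(status_code, headers):
    """Return alert text for an HTTP 429 response, or None for any other status."""
    if status_code != 429:
        return None
    # HTTP header names are case-insensitive, so normalize before matching.
    lowered = {k.lower(): v for k, v in headers.items()}
    parts = [f"Retry-After={lowered.get('retry-after', '?')}"]
    for name in sorted(lowered):
        if name.startswith("x-ratelimit-"):
            parts.append(f"{name}={lowered[name]}")
    return "429 received. " + ", ".join(parts)
```

With the requests library this would be called as summarize_rate_limit(resp.status_code, resp.headers), and a non-None result passed to notify().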

4. Provider RSS/Atom

Status pages also publish RSS, so each incident announcement can be turned into an email right away. Create an Email Inbox, route the RSS feed through any RSS→Email service (or Yandex Forms / Zapier), and set the address you obtained as the “recipient”. Every incident message from your chosen provider will then arrive as a push with the incident subject.
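If you would rather not depend on a third-party RSS→Email service, the feed can also be polled directly. This is a swapped-in alternative to the email route described above, not part of it. The sketch assumes a plain RSS 2.0 feed with item elements (Atom feeds use different element names), and new_incidents is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def new_incidents(rss_xml, seen_ids):
    """Return (id, title) pairs for feed items whose id is not in seen_ids."""
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        title = item.findtext("title", default="(no title)")
        # Prefer <guid>; fall back to <link>, then the title, when it is missing.
        item_id = item.findtext("guid") or item.findtext("link") or title
        if item_id not in seen_ids:
            fresh.append((item_id, title))
    return fresh
```

Store the returned ids between runs and send one notification per new pair; the caller decides where that state lives.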

Which providers to monitor

Minimal set for active solo AI development:

Service	What to check
OpenAI / Anthropic / Mistral / Google	`chat.completions` (or the provider's equivalent, such as `messages`) with `max_tokens=1`
OpenRouter / Together / Groq	the same; they often fail before the primary providers
Cohere / Voyage embeddings	`/v1/embed` with a single character
Your fine-tune endpoint	direct ping with a selected prompt
Self-hosted vLLM / Ollama	`kind: "http"` on `/health` (no token cost)
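For the self-hosted row, the monitor definitions can be generated rather than written by hand. The sketch below reuses the field names from the HTTP monitor example earlier in this article; http_monitor is a hypothetical helper and both host names are placeholders for your own.

```python
import json


def http_monitor(name, target, interval_sec=60, timeout_sec=5, fails=3):
    """Build an HTTP monitor definition with the fields used in the example config."""
    return {
        "kind": "http",
        "target": target,
        "intervalSec": interval_sec,
        "timeoutSec": timeout_sec,
        "expectedStatus": 200,
        "consecutiveFails": fails,
        "alertMessage": f"{name} is not responding",
        "recoveryMessage": f"{name} is responding again",
    }


# Hypothetical internal hosts; replace with your own.
monitors = [
    http_monitor("vLLM", "http://vllm.internal:8000/health"),
    http_monitor("Ollama", "http://ollama.internal:11434/health"),
]
print(json.dumps(monitors, indent=2, ensure_ascii=False))
```

How the resulting JSON is submitted to the monitoring service is left out here, since the article does not describe that step.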