
Availability of LLM providers

LLM providers go down more often than you’d like: over the past year Anthropic, OpenAI and OpenRouter have each had several prolonged incidents. Most have a status page with RSS/JSON, but none of them push a notification at the moment an incident starts.

We use Notifly active monitors, a Yandex Cloud feature that requests the given URL once a minute on its own and sends an alert to your channel.

Most SaaS vendors expose JSON at https://status.<vendor>.com/api/v2/status.json (the Statuspage.io format). Under normal operation, status.indicator == "none".

The simplest check is a plain 200 OK on a short endpoint:

{
  "kind": "http",
  "target": "https://status.anthropic.com/api/v2/status.json",
  "intervalSec": 60,
  "timeoutSec": 5,
  "expectedStatus": 200,
  "consecutiveFails": 3,
  "alertMessage": "status.anthropic.com is not responding",
  "recoveryMessage": "Anthropic is responding again"
}

This covers the case where the status page itself is down, but it doesn’t catch “status is not green”. For that, it’s easier to write a small cloud function that reads the JSON once a minute and sends POST /message when indicator != "none".
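A minimal sketch of such a function, using only the standard library. It assumes the Statuspage.io payload shape (`status.indicator`, `status.description`) and the same Notifly `POST /message` call used elsewhere in this article; the `handler(event, context)` signature and the priority value 8 are assumptions, not something the platform dictates.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Optional

STATUS_URL = "https://status.anthropic.com/api/v2/status.json"


def indicator_alert(payload: dict) -> Optional[dict]:
    """Return a message body if the status is not green, else None."""
    status = payload.get("status", {})
    # A missing indicator is treated as "unknown", which also triggers an alert.
    indicator = status.get("indicator", "unknown")
    if indicator == "none":
        return None
    return {
        "title": f"Anthropic status: {indicator}",
        "message": status.get("description", "no description"),
        "priority": 8,  # assumed priority for any non-green status
    }


def handler(event=None, context=None):
    """Entry point for a timer-triggered function (signature is an assumption)."""
    with urllib.request.urlopen(STATUS_URL, timeout=5) as resp:
        payload = json.load(resp)
    body = indicator_alert(payload)
    if body is None:
        return {"alerted": False}
    query = urllib.parse.urlencode({"token": os.environ["NOTIFLY_TOKEN"]})
    req = urllib.request.Request(
        f"{os.environ['NOTIFLY_URL']}/message?{query}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req, timeout=5).close()
    return {"alerted": True}
```

Keeping the decision in `indicator_alert` separate from the network calls makes it easy to check locally against a saved status.json before deploying.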

The status page lags behind reality, so the most honest monitoring is to ping the production endpoint with a cheap model and max_tokens: 1:

# Tiny health script; convenient to wrap in a YC Function with a timer trigger.
import os
import time

import anthropic
import requests


# Defined before the check below, because the script runs top to bottom.
def notify(title, msg, prio):
    """Send an alert to Notifly via POST /message."""
    requests.post(
        f"{os.environ['NOTIFLY_URL']}/message",
        params={"token": os.environ["NOTIFLY_TOKEN"]},
        json={"title": title, "message": msg, "priority": prio},
        timeout=5,
    )


c = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
t0 = time.monotonic()
try:
    c.messages.create(
        model="claude-haiku-4-5",
        max_tokens=1,
        messages=[{"role": "user", "content": "ping"}],
    )
except Exception as e:
    notify("❌ Anthropic API", f"{type(e).__name__}: {e}", 9)
else:
    # Only measure latency when the request succeeded.
    latency = time.monotonic() - t0
    if latency > 5:
        notify("⏱️ Anthropic latency", f"{latency:.1f}s on a cheap request", 7)

Cost on Haiku is tenths of a cent per day, and coverage is real end-to-end (DNS, TLS, queue, model).
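The cost estimate can be reproduced with a few lines of arithmetic. This is a sketch only: the token counts and the prices per million tokens below are placeholders rather than a quote from any price list, so substitute the current numbers for the model you actually ping.

```python
def daily_probe_cost_usd(
    input_tokens: int,
    output_tokens: int,
    usd_per_mtok_in: float,
    usd_per_mtok_out: float,
    interval_sec: int = 60,
) -> float:
    """Estimated daily cost of one probe sent every interval_sec seconds."""
    probes_per_day = 86_400 // interval_sec
    per_probe = (
        input_tokens * usd_per_mtok_in + output_tokens * usd_per_mtok_out
    ) / 1_000_000
    return probes_per_day * per_probe


# Placeholder inputs: about 10 input tokens for "ping", 1 output token,
# and illustrative prices of 0.25 / 1.25 USD per million tokens.
cost = daily_probe_cost_usd(10, 1, 0.25, 1.25)
print(f"{cost * 100:.2f} cents per day")
```

The result scales linearly with both the price and the probe frequency, so a model four times more expensive costs four times as much per day at the same interval.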

A rate limit is usually not an “incident” in itself but a signal that you have a loop in your code or shipped an update with aggressive fan-out. It’s worth alerting on it separately from 5xx errors:

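try:
    resp = client.messages.create(...)
except anthropic.RateLimitError as e:
    # The error carries the HTTP response, so Retry-After can be read from it.
    response = getattr(e, "response", None)
    headers = response.headers if response is not None else {}
    retry = headers.get("retry-after", "?")
    request = getattr(e, "request", None)
    notify(
        "🚦 Anthropic rate-limit",
        f"Got a 429. Retry-After={retry}s\nCurrent model: {model}\n"
        f"Endpoint: {request.url if request is not None else '?'}",
        8,  # notify() takes the priority as its third positional argument
    )
    raise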

Extend this block to every provider you use (OpenAI: openai.RateLimitError; OpenRouter: HTTP 429 with X-RateLimit-* headers).
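For providers you call over plain HTTP, the same idea can be sketched as a provider-agnostic helper that takes a status code and a header mapping. The function names `rate_limit_details` and `format_rate_limit` are hypothetical, and the helper does not assume any specific X-RateLimit-* header names: it simply collects whatever headers start with that prefix.

```python
from typing import Mapping, Optional


def rate_limit_details(status_code: int, headers: Mapping[str, str]) -> Optional[dict]:
    """Return Retry-After and X-RateLimit-* values for a 429, else None."""
    if status_code != 429:
        return None
    # HTTP header names are case-insensitive, so normalize before matching.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "retry_after": lowered.get("retry-after", "?"),
        "limits": {
            name: value
            for name, value in lowered.items()
            if name.startswith("x-ratelimit-")
        },
    }


def format_rate_limit(provider: str, details: dict) -> str:
    """Build the alert text to pass to notify()."""
    lines = [f"{provider}: 429, Retry-After={details['retry_after']}"]
    lines += [f"{name}: {value}" for name, value in sorted(details["limits"].items())]
    return "\n".join(lines)
```

A caller would check `rate_limit_details(resp.status_code, resp.headers)` after each request and, when it is not None, pass `format_rate_limit(...)` to `notify()` with the same priority as the Anthropic alert.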

Status pages have RSS feeds, so each incident announcement can be turned into an email as soon as it is published. Create an Email Inbox, route the RSS feed through any RSS→Email service (or Yandex Forms / Zapier), and set the resulting address as the “recipient”. Every incident message from your chosen provider will then arrive as a push with the incident subject.
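If you would rather not depend on a third-party RSS→Email service, the feed can also be polled directly. The sketch below only covers the parsing step and assumes a plain RSS 2.0 feed with `<item>` elements; the feed URL (commonly something like https://status.<vendor>.com/history.rss on Statuspage-hosted pages) is an assumption to verify per vendor, and `new_incidents` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def new_incidents(rss_xml: str, seen_ids: set) -> list:
    """Return (guid, title) pairs for feed items not yet in seen_ids."""
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        # Fall back to the link, then the title, if the feed has no <guid>.
        guid = item.findtext("guid") or item.findtext("link") or title
        if guid not in seen_ids:
            seen_ids.add(guid)
            fresh.append((guid, title))
    return fresh
```

Each returned title can be passed to `notify()`. In a stateless cloud function, `seen_ids` has to be stored somewhere between runs (a bucket object or a small table), otherwise every run re-sends the whole feed.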

Minimal set for active solo AI development:

| Service | What to check |
| --- | --- |
| OpenAI / Anthropic / Mistral / Google | chat.completions with max_tokens=1 |
| OpenRouter / Together / Groq | the same; they often fail before the primary providers |
| Cohere / Voyage embeddings | /v1/embed with a single character |
| Your fine-tune endpoint | direct ping with a selected prompt |
| Self-hosted vLLM / Ollama | kind: "http" on /health (no token cost) |
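For providers that expose an OpenAI-compatible chat completions API, the one-token ping can be shared across rows of the table. This is a sketch under assumptions: `build_probe` and `probe` are hypothetical helper names, the base URL and model are placeholders you supply per provider, and some providers or models may expect a different parameter name than `max_tokens`.

```python
import json
import urllib.error
import urllib.request


def build_probe(base_url: str, model: str) -> tuple:
    """Return (url, json_body) for a one-token chat completion probe."""
    url = base_url.rstrip("/") + "/chat/completions"
    body = {
        "model": model,
        "max_tokens": 1,
        "messages": [{"role": "user", "content": "ping"}],
    }
    return url, body


def probe(base_url: str, model: str, api_key: str, timeout: float = 10.0) -> int:
    """Send the probe and return the HTTP status code."""
    url, body = build_probe(base_url, model)
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        # urllib raises on 4xx/5xx; the status code is what the monitor needs.
        return e.code
```

A timer-triggered function could loop over a list of (base_url, model, key) entries, call `probe` for each, and send a `notify()` alert for any status other than 200, keeping 429 separate from 5xx as described above.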