Vector DB and RAG infrastructure

In a RAG application, the vector DB is a separate single point of failure that almost nobody monitors. A search can return an empty array without raising any error, and all the user sees is that “the model is acting dumb today”.

Notifly covers this with four layers:

1. Active TCP/HTTP monitoring of the vector DB

Most popular vector DBs expose a health endpoint such as /health, or at least an open port. Set up an active monitor:

// Qdrant Cloud
{
  "kind": "http",
  "target": "https://your-cluster.qdrant.tech:6333/healthz",
  "intervalSec": 60,
  "alertMessage": "Qdrant is not responding: RAG will return empty answers"
}

// Self-hosted Weaviate
{
  "kind": "http",
  "target": "https://weaviate.example.com/v1/.well-known/ready",
  "intervalSec": 60,
  "consecutiveFails": 3
}

// Pinecone (public status page API; reports platform-wide incidents, not the state of your own index)
{
  "kind": "http",
  "target": "https://status.pinecone.io/api/v2/status.json",
  "intervalSec": 120
}

// Self-hosted with Postgres+pgvector
{
  "kind": "postgres",
  "target": "vec.example.com:5432",
  "intervalSec": 60
}

See the full list of supported kinds in the monitors section.
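Before registering a monitor, it can help to confirm that the endpoint answers the way an HTTP check expects. The sketch below is a local stand-in for such a check, not Notifly's own implementation: the probe function and its return fields are hypothetical names, and it only uses the Python standard library.

```python
import time
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> dict:
    """GET the URL once and report whether it answered with a 2xx status."""
    t0 = time.monotonic()
    status = None
    error = None
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as e:
        # The server answered, but with a 4xx/5xx status.
        status = e.code
    except (urllib.error.URLError, OSError) as e:
        # DNS failure, refused connection, TLS error or timeout.
        error = f"{type(e).__name__}: {e}"
    ms = int((time.monotonic() - t0) * 1000)
    ok = status is not None and 200 <= status < 300
    return {"ok": ok, "status": status, "ms": ms, "error": error}

# Example call (placeholder host from the config above):
# print(probe("https://weaviate.example.com/v1/.well-known/ready"))
```

If the probe reports ok as False from your own machine, the monitor will most likely fail as well, so it is worth checking the URL, port and firewall rules first.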

2. End-to-end check of a real query

/health is green, but the collection is corrupted or the request quota is exhausted? A liveness check will not catch that. Use a small cloud function on a timer trigger that performs a real search and checks the number of results against an expected minimum:

import os, requests, time
from qdrant_client import QdrantClient

q = QdrantClient(url=os.environ["QDRANT_URL"], api_key=os.environ["QDRANT_KEY"])

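# Any stable non-zero vector of the collection's dimension; an all-zero
# vector has no direction, so cosine similarity is undefined for it.
CANARY_VEC   = [0.1] * 1536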
EXPECTED_MIN = 5                     # we know the collection has > 5 chunks

def handler(event, context):
    t0 = time.time()
    try:
        hits = q.search(collection_name="docs", query_vector=CANARY_VEC, limit=10)
    except Exception as e:
        # The search itself failed: network, auth, missing collection, quota.
        notify("❌ Qdrant search", f"{type(e).__name__}: {e}", priority=9)
        return {"statusCode": 200}
    ms = int((time.time() - t0) * 1000)
    if len(hits) < EXPECTED_MIN:
        notify("⚠️ Qdrant empty results",
               f"Got {len(hits)} (expected ≥ {EXPECTED_MIN}) in {ms} ms",
               priority=8)
    elif ms > 1500:
        notify("⏱️ Qdrant is slow",
               f"Search took {ms} ms. Check the index size and shards.",
               priority=6)
    return {"statusCode": 200}

# Push an alert to Notifly.
def notify(title, message, priority):
    requests.post(f"{os.environ['NOTIFLY_URL']}/message",
                  params={"token": os.environ["NOTIFLY_TOKEN"]},
                  json={"title": title, "message": message, "priority": priority},
                  timeout=5)

The same approach works for Weaviate (client.query.get(...)), Pinecone (index.query(...)) and pgvector (SELECT ... <-> $1 LIMIT 10).
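As an illustration of how the same check carries over to another backend, here is a minimal sketch for pgvector. It separates the pass/fail decision (classify) from the query itself, so the decision can be reused for any vector DB. Everything named here is an assumption: the classify and pgvector_canary helpers, the chunks table with id and embedding columns, and the use of psycopg 3 as the driver.

```python
from typing import Optional, Tuple


def classify(hit_count: int, ms: int, expected_min: int = 5,
             slow_ms: int = 1500) -> Optional[Tuple[str, str, int]]:
    """Return (title, message, priority) for an alert, or None if the canary passed."""
    if hit_count < expected_min:
        return ("⚠️ Vector DB empty results",
                f"Got {hit_count} (expected ≥ {expected_min}) in {ms} ms", 8)
    if ms > slow_ms:
        return ("⏱️ Vector DB is slow", f"Search took {ms} ms", 6)
    return None


def pgvector_canary(dsn: str, vec: list) -> Tuple[int, int]:
    """Run one nearest-neighbour query and return (row_count, elapsed_ms).

    Assumes psycopg 3 and a hypothetical table chunks(id, embedding vector).
    The elapsed time includes opening the connection.
    """
    import time
    import psycopg  # imported lazily so classify() works without the driver

    literal = "[" + ",".join(str(x) for x in vec) + "]"  # pgvector text format
    t0 = time.monotonic()
    with psycopg.connect(dsn, connect_timeout=5) as conn:
        rows = conn.execute(
            "SELECT id FROM chunks ORDER BY embedding <-> %s::vector LIMIT 10",
            (literal,),
        ).fetchall()
    return len(rows), int((time.monotonic() - t0) * 1000)
```

A handler would call pgvector_canary, pass the two numbers to classify, and forward a non-None result to the notify function shown earlier.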

3. Heartbeat from the indexing pipeline

Indexing (crawling → embeddings → upsert) usually runs on a cron schedule. If it fails overnight, search quality is already degraded the next day, and nobody notices.

Create a heartbeat with intervalSec slightly larger than your cron-job period. Ping it at the very end of a successful run:

*/30 * * * * /usr/local/bin/reindex.sh && curl -fsS "$REINDEX_PING_URL" -o /dev/null

If a ping is missed, you get a push notification “indexing didn’t complete” within intervalSec + graceSec.
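If the pipeline is started from Python rather than from a shell one-liner, the same "ping only on success" rule can be sketched as follows. The function names are hypothetical; REINDEX_PING_URL and the script path come from the cron line above. The ping is passed in as a callable so the logic can be exercised without a network.

```python
import os
import subprocess
import sys
import urllib.request
from typing import Callable, Sequence


def http_ping(url: str) -> None:
    """Send the heartbeat ping; raises on network errors and HTTP error statuses."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        resp.read()


def run_then_ping(cmd: Sequence[str], ping: Callable[[], None]) -> int:
    """Run the indexing command and ping only if it exits with status 0."""
    result = subprocess.run(cmd)
    if result.returncode == 0:
        ping()
    return result.returncode

# Usage, mirroring the cron line:
# code = run_then_ping(["/usr/local/bin/reindex.sh"],
#                      lambda: http_ping(os.environ["REINDEX_PING_URL"]))
# sys.exit(code)
```

A failed run sends nothing at all, so the alert comes from the heartbeat timing out rather than from the job reporting its own failure. That also covers the case where the job never starts.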

4. Alert on chunk count drift

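A useful signal: the collection has suddenly “shrunk” by 10x, which almost certainly means someone reindexed into the wrong collection. The check below runs hourly and alerts when the point count changes by more than 50% in either direction: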

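COUNT_FILE = "/tmp/qdrant-count.txt"   # must persist between runs

# Read the previous count; a missing or empty file means no baseline yet.
prev = 0
if os.path.exists(COUNT_FILE):
    with open(COUNT_FILE) as f:
        prev = int(f.read().strip() or 0)
cur = q.get_collection("docs").points_count or 0

# Alert when the count changed by more than 50% relative to the previous run.
if prev and abs(cur - prev) / prev > 0.5:
    notify("⚠️ Qdrant: drift",
           f"Collection size: {prev} → {cur}", 8)

with open(COUNT_FILE, "w") as f:
    f.write(str(cur))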
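The comparison and the state handling can also be pulled into one function, which makes the threshold easy to tune and the logic testable without a live cluster. This is a sketch under assumptions: check_drift is a hypothetical helper, and the state file is assumed to live on storage that survives between runs (which /tmp in a cloud function may not). The count is written through a temporary file and renamed, so an interrupted run cannot leave a half-written baseline.

```python
import os
import tempfile
from typing import Optional


def check_drift(state_path: str, cur: int, threshold: float = 0.5) -> Optional[str]:
    """Compare cur with the stored count, store cur, and return an alert text or None."""
    prev = 0
    try:
        with open(state_path) as f:
            prev = int(f.read().strip() or 0)
    except (FileNotFoundError, ValueError):
        prev = 0  # no usable baseline yet

    # Write to a temp file in the same directory, then rename atomically.
    directory = os.path.dirname(os.path.abspath(state_path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(str(cur))
    os.replace(tmp, state_path)

    # Relative change against the previous run, in either direction.
    if prev and abs(cur - prev) / prev > threshold:
        return f"Collection size: {prev} → {cur}"
    return None
```

In the hourly job, the returned text (when not None) would be passed to notify as the message body.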

What to put in the alert text

collection/index name;
last successful indexing (timestamp);
size before/after, latency, error class;
link to your vector DB dashboard (Qdrant Web UI, Pinecone Console, etc.).
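
One way to keep these fields consistent across all four checks is a single formatting helper. The sketch below is illustrative only: format_alert and its parameter names are hypothetical, and the dashboard URL is whatever link your team actually uses.

```python
from datetime import datetime, timezone
from typing import Optional


def format_alert(collection: str, last_indexed: datetime, size_before: int,
                 size_after: int, latency_ms: int, dashboard_url: str,
                 error_class: Optional[str] = None) -> str:
    """Build a multi-line alert body with the fields listed above."""
    stamp = last_indexed.astimezone(timezone.utc)
    lines = [
        f"Collection: {collection}",
        f"Last successful indexing: {stamp:%Y-%m-%d %H:%M} UTC",
        f"Size: {size_before} → {size_after}",
        f"Latency: {latency_ms} ms",
    ]
    if error_class:
        lines.append(f"Error: {error_class}")
    lines.append(f"Dashboard: {dashboard_url}")
    return "\n".join(lines)
```

The result can be passed as the message argument of notify, so whoever receives the push has enough context to act without opening logs first.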