Vector DB and RAG infrastructure

In a RAG application, the vector DB is a separate single point of failure that almost nobody monitors: a search can return an empty array, and the user just sees “the model is being dumb today”, with no visible error.

Notifly addresses this with three layers:

1. Active TCP/HTTP monitoring of the vector DB

Most popular vector DBs expose a health endpoint such as /health, or at least an open port. Set up an active monitor:

// Qdrant Cloud
{
  "kind": "http",
  "target": "https://your-cluster.qdrant.tech:6333/healthz",
  "intervalSec": 60,
  "alertMessage": "Qdrant is not responding: RAG will return empty answers"
}

// Self-hosted Weaviate
{
  "kind": "http",
  "target": "https://weaviate.example.com/v1/.well-known/ready",
  "intervalSec": 60,
  "consecutiveFails": 3
}

// Pinecone (status API)
{
  "kind": "http",
  "target": "https://status.pinecone.io/api/v2/status.json",
  "intervalSec": 120
}

// Self-hosted with Postgres+pgvector
{
  "kind": "postgres",
  "target": "vec.example.com:5432",
  "intervalSec": 60
}

See the monitors documentation for the full list of supported kinds.
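
The consecutiveFails option in the Weaviate example suggests that an alert fires only after several failed checks in a row, which avoids noise from a single dropped request. How Notifly implements this is not described here, so the following is only a sketch of the general idea, using a hypothetical FailureStreak helper:

```python
class FailureStreak:
    """Hypothetical helper: alert only after N failed checks in a row."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0

    def record(self, ok: bool) -> bool:
        """Record one check result; return True when an alert should fire."""
        if ok:
            self.failures = 0  # any success resets the streak
            return False
        self.failures += 1
        # Fire exactly once, at the moment the threshold is reached.
        return self.failures == self.threshold


streak = FailureStreak(threshold=3)
results = [False, False, True, False, False, False, False]
alerts = [streak.record(ok) for ok in results]
```

In this run the two early failures are forgiven by the success that follows, and only the third failure of the later streak produces an alert.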

What if /health is green, but the collection is corrupted or the request quota is exhausted? A small cloud function on a timer trigger can perform a real search and check the number of results:

import os
import time

import requests
from qdrant_client import QdrantClient

q = QdrantClient(url=os.environ["QDRANT_URL"], api_key=os.environ["QDRANT_KEY"])
CANARY_VEC = [0.0] * 1536  # any stable vector
EXPECTED_MIN = 5           # we know the collection has at least 5 chunks


def notify(title, message, priority):
    # Push an alert to Notifly; the timeout keeps the function from hanging.
    requests.post(f"{os.environ['NOTIFLY_URL']}/message",
                  params={"token": os.environ["NOTIFLY_TOKEN"]},
                  json={"title": title, "message": message, "priority": priority},
                  timeout=5)


def handler(event, context):
    t0 = time.time()
    try:
        hits = q.search(collection_name="docs", query_vector=CANARY_VEC, limit=10)
    except Exception as e:
        notify("❌ Qdrant search", f"{type(e).__name__}: {e}", priority=9)
        return {"statusCode": 200}
    ms = int((time.time() - t0) * 1000)
    if len(hits) < EXPECTED_MIN:
        notify("⚠️ Qdrant: empty results",
               f"Got {len(hits)} (expected ≥ {EXPECTED_MIN}) in {ms} ms",
               priority=8)
    elif ms > 1500:
        notify("⏱️ Qdrant: slow",
               f"Search took {ms} ms. Check the index size and shards.",
               priority=6)
    return {"statusCode": 200}

Same approach for Weaviate (client.query.get(...)), Pinecone (index.query(...)) and pgvector (SELECT ... <-> $1 LIMIT 10).
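
Since the decision logic is the same for every backend, it can be kept separate from the client call. The sketch below is one way to do that; the function name evaluate_canary and its default thresholds are illustrative assumptions that mirror the Qdrant example, not part of any client library:

```python
from typing import Optional, Tuple

Alert = Tuple[str, str, int]  # (title, message, priority)


def evaluate_canary(hit_count: int, elapsed_ms: int,
                    expected_min: int = 5, slow_ms: int = 1500) -> Optional[Alert]:
    """Return an alert tuple for a bad canary result, or None if healthy."""
    if hit_count < expected_min:
        return ("⚠️ Vector DB: empty results",
                f"Got {hit_count} (expected ≥ {expected_min}) in {elapsed_ms} ms",
                8)
    if elapsed_ms > slow_ms:
        return ("⏱️ Vector DB: slow search",
                f"Search took {elapsed_ms} ms",
                6)
    return None
```

Each backend-specific function then only needs to run its own query, measure the time, and pass the number of results and the elapsed milliseconds to evaluate_canary; a non-None result goes to the notification call.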

Indexing (crawling → embeddings → upsert) usually runs on a cron. If it fails during the night, search quality will be poor during the day and nobody will notice.

Create a heartbeat with intervalSec slightly longer than the period of your cron job. Ping it at the very end of a successful run:

*/30 * * * * /usr/local/bin/reindex.sh && curl -fsS "$REINDEX_PING_URL" -o /dev/null

Any missed ping triggers a push notification “indexing didn’t complete” within intervalSec + graceSec.
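
If the indexing job is a Python script rather than a shell command, the same pattern can live inside the script. This is a minimal sketch that assumes the heartbeat URL accepts a plain GET, as the curl call above does; run_with_heartbeat and ping_heartbeat are hypothetical helper names:

```python
import urllib.request
from typing import Callable


def ping_heartbeat(url: str, timeout: float = 5.0) -> None:
    """Send a GET to the heartbeat URL; raises on network or HTTP errors."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()


def run_with_heartbeat(job: Callable[[], None], ping: Callable[[], None]) -> bool:
    """Run job and ping only if it finished without raising."""
    try:
        job()
    except Exception:
        return False  # no ping: the missed heartbeat is the alert
    ping()
    return True
```

A call might look like run_with_heartbeat(reindex, lambda: ping_heartbeat(os.environ["REINDEX_PING_URL"])), where reindex stands in for your own indexing function. Passing the ping as a callable keeps the success-only logic easy to test without a network.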

A useful signal: the collection has suddenly shrunk, for example by 10x, which almost certainly means someone reindexed into the wrong collection. Check once an hour:

STATE = "/tmp/qdrant-count.txt"
# Previous count; 0 if the state file is missing or empty (first run).
prev = int(open(STATE).read() or 0) if os.path.exists(STATE) else 0
cur = q.get_collection("docs").points_count
# Alert when the size changed by more than 50% in either direction.
if prev and abs(cur - prev) / prev > 0.5:
    notify("⚠️ Qdrant: drift",
           f"Collection size: {prev} → {cur}", 8)
with open(STATE, "w") as f:
    f.write(str(cur))

Details worth including in each alert:

  • collection/index name;
  • last successful indexing (timestamp);
  • size before/after, latency, error class;
  • link to your vector-DB dashboard (Qdrant Web UI, Pinecone Console, etc.).
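
The fields listed above can be assembled into a message body before it is sent. The helper below is an illustrative sketch: build_alert_message, its parameter names, and the example values are all assumptions, not a Notifly API:

```python
from datetime import datetime, timezone
from typing import Optional


def build_alert_message(collection: str, last_indexed: datetime,
                        size_before: int, size_after: int, latency_ms: int,
                        error_class: Optional[str], dashboard_url: str) -> str:
    """Format the context fields into a multi-line message body."""
    lines = [
        f"Collection: {collection}",
        f"Last successful indexing: {last_indexed.isoformat()}",
        f"Size: {size_before} -> {size_after}",
        f"Latency: {latency_ms} ms",
        f"Error: {error_class or 'none'}",
        f"Dashboard: {dashboard_url}",
    ]
    return "\n".join(lines)


# Example values are made up for illustration.
msg = build_alert_message(
    collection="docs",
    last_indexed=datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc),
    size_before=12000, size_after=1100, latency_ms=240,
    error_class=None, dashboard_url="https://example.com/dashboard",
)
```

The resulting string can be passed as the message field of the notification, so whoever receives the push has enough context to start investigating without opening logs first.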