MCP Server Health
MCP has become the primary transport for tool use in Claude Code, GitHub Copilot Chat, Cursor, and custom agents. When an MCP server crashes or stalls, the model receives no tool data, answers from memory, and hallucinates. To the user this looks like “the AI is bad” rather than a server error, so the investigation can take hours.
Notifly provides three levels of protection for MCP.
1. Active monitoring for HTTP/SSE MCP
If your MCP server is exposed over HTTP/SSE, the simplest active monitor is an HTTP check:

    {
      "kind": "http",
      "target": "https://mcp.example.com/healthz",
      "intervalSec": 60,
      "consecutiveFails": 3,
      "alertMessage": "MCP server is not responding: Claude is losing its tools",
      "recoveryMessage": "MCP server is back up"
    }

For self-hosted MCP without /healthz, monitor the TCP port itself:

    { "kind": "tcp", "target": "mcp.example.com:8443", "intervalSec": 60 }

See also the built-in Notifly MCP server, described on its own MCP page.
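
If your server does not expose /healthz yet, the sketch below shows one way to add it using only the Python standard library. It assumes the HTTP monitor treats a non-2xx status as a failure, which is the usual convention but is not confirmed here. The names health_status, current_checks, HealthHandler, and serve, as well as port 8080, are hypothetical choices for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def health_status(checks):
    """Return (http_status, json_body) for a dict of named boolean checks."""
    ok = all(checks.values())
    body = json.dumps({"status": "ok" if ok else "fail", "checks": checks})
    return (200 if ok else 503), body


def current_checks():
    # Hypothetical placeholder: replace with real checks for your server,
    # for example "tools are registered" or "upstream API is reachable".
    return {"tools_registered": True}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/healthz":
            self.send_error(404)
            return
        status, body = health_status(current_checks())
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Serve /healthz until interrupted (port is an arbitrary choice)."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling serve() in a background thread next to the MCP server is enough for the monitor to poll. Returning 503 when any check fails keeps the alert tied to real degradation rather than to the process merely being alive.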
2. Heartbeat from stdio-MCP
A stdio MCP server runs locally as a child process of the IDE, so there is no HTTP endpoint to poll. Instead, the MCP server itself pings the Notifly heartbeat every N minutes while it is active.

    # Call maybe_ping() from the handler of one of your tools.
    import os
    import time

    import requests

    PING = os.environ.get("MCP_HEARTBEAT_URL")
    _last = 0.0

    def maybe_ping():
        """Ping the heartbeat URL at most once every 60 seconds."""
        global _last
        if PING and time.time() - _last > 60:
            try:
                requests.get(PING, timeout=3)
            except Exception:
                pass  # a failed ping must never break the tool call
            _last = time.time()

    # `server` is your MCP server instance.
    @server.tool("search")
    def search(query: str):
        maybe_ping()
        return ...

Set the heartbeat interval to 5–10 minutes. If the IDE doesn’t use MCP for a week, that is normal silence, not an incident. An alert arrives only when the server should have been running (the IDE is open, a tool was invoked) and it suddenly stopped.
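
A blocking ping can delay a tool call by up to the request timeout. The sketch below is one possible variant that keeps the same throttling but sends the ping from a background thread. HeartbeatThrottle and its parameters are hypothetical names; the send callable is whatever performs the actual request.

```python
import threading
import time


class HeartbeatThrottle:
    """Rate-limit heartbeat pings and send them off the calling thread."""

    def __init__(self, send, min_interval=60.0, clock=time.monotonic):
        self._send = send                  # callable that performs the ping
        self._min_interval = min_interval  # seconds between pings
        self._clock = clock                # injectable for tests
        self._last = None                  # time of the last attempted ping
        self._lock = threading.Lock()

    def _safe_send(self):
        try:
            self._send()
        except Exception:
            pass  # heartbeat failures must not affect tool calls

    def maybe_ping(self):
        """Start a ping thread if the interval has elapsed.

        Returns the started thread, or None when the ping was throttled.
        """
        with self._lock:
            now = self._clock()
            if self._last is not None and now - self._last < self._min_interval:
                return None
            self._last = now
        thread = threading.Thread(target=self._safe_send, daemon=True)
        thread.start()
        return thread
```

In a tool handler this could be wired as hb = HeartbeatThrottle(lambda: requests.get(PING, timeout=3)) followed by hb.maybe_ping() at the start of each call. The lock keeps concurrent tool calls from sending duplicate pings.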
3. End-to-end verification of a real tool invocation
The most useful check is a function that calls a real tool through an MCP client every N minutes and verifies the response. A scheduled cloud function works well for this:
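
    import asyncio
    import os
    import time
    from pathlib import Path

    import requests
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    PROBE_FILE = Path("/tmp/mcp-probe.txt")

    def notify(title, message, priority):
        """Send an alert to the Notifly message endpoint.

        NOTIFLY_URL and NOTIFLY_TOKEN are read from the environment.
        """
        requests.post(
            f"{os.environ['NOTIFLY_URL']}/message?token={os.environ['NOTIFLY_TOKEN']}",
            json={"title": title, "message": message, "priority": priority},
            timeout=5,
        )

    async def probe():
        # The filesystem server only serves paths inside its allowed
        # directory (/tmp here), so the probe reads a file located there.
        PROBE_FILE.write_text("ok")
        params = StdioServerParameters(
            command="npx",
            args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
        )
        async with stdio_client(params) as (read, write):
            async with ClientSession(read, write) as s:
                await s.initialize()
                tools = await s.list_tools()
                assert any(t.name == "read_file" for t in tools.tools)
                result = await s.call_tool("read_file", {"path": str(PROBE_FILE)})
                # Tool failures come back as a result flag, not an exception.
                assert not result.isError, "read_file returned an error"

    def handler(event, context):
        t0 = time.time()
        try:
            asyncio.run(asyncio.wait_for(probe(), timeout=15))
        except Exception as e:
            notify("❌ MCP probe failed", f"{type(e).__name__}: {e}", 9)
            return {"statusCode": 200}
        ms = int((time.time() - t0) * 1000)
        if ms > 5000:
            notify("⏱️ MCP is slow", f"Probe took {ms} ms", 7)
        return {"statusCode": 200}

Similarly, for HTTP MCP use the SDK’s HTTP transports (mcp.client.streamable_http or mcp.client.sse) or a direct POST.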
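
For the direct POST route, the messages on the wire are JSON-RPC 2.0. The sketch below only builds a tools/call request body and validates a decoded reply; it deliberately leaves out the transport details (the initialize handshake, session headers, and response framing), which depend on the server. The helper names tools_call_request and check_tool_response are hypothetical.

```python
import json


def tools_call_request(request_id, tool_name, arguments):
    """Build the JSON-RPC 2.0 body for an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def check_tool_response(response, request_id):
    """Return the result of a successful reply, otherwise raise RuntimeError."""
    if response.get("jsonrpc") != "2.0" or response.get("id") != request_id:
        raise RuntimeError("not a reply to our request")
    if "error" in response:
        err = response["error"]
        raise RuntimeError(f"JSON-RPC error {err.get('code')}: {err.get('message')}")
    result = response.get("result") or {}
    if result.get("isError"):
        # The tool ran but reported a failure inside a normal result.
        raise RuntimeError("tool reported isError=true")
    return result


# Example body that a probe would serialize and POST to the server.
body = json.dumps(tools_call_request(1, "read_file", {"path": "/tmp/mcp-probe.txt"}))
```

Checking both the JSON-RPC error field and the isError flag matters because a tool can fail while the protocol exchange itself succeeds.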
4. Versions and breaking changes
Third-party MCP servers (@modelcontextprotocol/server-*, community implementations) also ship releases, and a new release can break your setup. Track versions with a simple script:

    #!/usr/bin/env bash
    # Alert when the published version differs from the last one seen.
    prev=$(cat /tmp/mcp-versions 2>/dev/null || true)
    cur=$(npm view @modelcontextprotocol/server-filesystem version)
    if [[ "$prev" != "$cur" ]]; then
      curl -fsS "$NOTIFLY_URL/message?token=$NOTIFLY_TOKEN" \
        -H "Content-Type: application/json" \
        -d "{\"title\":\"🆕 MCP server-filesystem $cur\",\"message\":\"Previous version: $prev\",\"priority\":4}"
    fi
    echo "$cur" > /tmp/mcp-versions

What to include in the alert text
- which MCP server specifically is not responding (the name from your mcp.json);
- which tools it exported last time and which it exports now;
- how many active IDE sessions rely on this server (if known);
- a link to logs / dashboard.
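
The checklist above can be assembled into a message mechanically. The following is a minimal sketch, assuming the probe keeps the tool list from its last successful run; build_alert_text and all of its parameters are hypothetical names, not part of any Notifly API.

```python
def build_alert_text(server_name, previous_tools, current_tools,
                     active_sessions=None, logs_url=None):
    """Compose alert text: server name, tool diff, session count, logs link."""
    prev, cur = set(previous_tools), set(current_tools)
    missing = sorted(prev - cur)  # exported last time, gone now
    added = sorted(cur - prev)    # not seen last time
    lines = [f"MCP server '{server_name}' is not responding"]
    lines.append(f"Tools last seen: {', '.join(sorted(prev)) or 'none'}")
    lines.append(f"Tools now: {', '.join(sorted(cur)) or 'none'}")
    if missing:
        lines.append(f"Missing tools: {', '.join(missing)}")
    if added:
        lines.append(f"New tools: {', '.join(added)}")
    if active_sessions is not None:
        lines.append(f"Active IDE sessions: {active_sessions}")
    if logs_url:
        lines.append(f"Logs: {logs_url}")
    return "\n".join(lines)
```

Optional fields are simply omitted when unknown, so the same function works for a bare probe and for a setup that tracks sessions.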
Related recipes
- Notifly MCP Server: our own MCP server, which can send notifications on behalf of the AI;
- Vector DB / RAG: another typical “invisible” component;
- Custom cloud function for integrity checks: where the MCP probe lives.