MCP Server Health

MCP has become the primary transport for tool use in Claude Code, GitHub Copilot Chat, Cursor, and custom agents. When an MCP server crashes or lags, the model does not receive the data it asked for, answers from memory, and hallucinates. From the user's perspective this is not a “server error” but “the AI is bad”, and investigations take hours.

Notifly provides three levels of protection for MCP, plus version tracking for third-party servers.

1. Active monitoring for HTTP/SSE MCP servers

If your MCP server is served over HTTP/SSE, the simplest option is an active HTTP monitor:

{
  "kind": "http",
  "target": "https://mcp.example.com/healthz",
  "intervalSec": 60,
  "consecutiveFails": 3,
  "alertMessage": "MCP server is not responding: Claude is losing its tools",
  "recoveryMessage": "MCP server is back up"
}
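To make consecutiveFails and recoveryMessage concrete, here is a minimal Python sketch of the alerting logic such a monitor typically follows. It assumes, without confirmation from Notifly's documentation, that the alert fires once when the failure streak reaches consecutiveFails and that the recovery message goes out on the first successful check after that; the MonitorState class is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MonitorState:
    """Hypothetical model of an active monitor with a consecutive-failure threshold."""
    consecutive_fails: int = 3
    streak: int = 0         # current run of failed checks
    alerting: bool = False  # True once the alert has been sent

    def record(self, ok: bool) -> Optional[str]:
        """Feed one check result; return "alert", "recovery" or None."""
        if ok:
            self.streak = 0
            if self.alerting:
                self.alerting = False
                return "recovery"  # would send recoveryMessage
            return None
        self.streak += 1
        if not self.alerting and self.streak >= self.consecutive_fails:
            self.alerting = True
            return "alert"  # would send alertMessage
        return None


m = MonitorState(consecutive_fails=3)
print([m.record(ok) for ok in [True, False, False, False, False, True]])
# → [None, None, None, 'alert', None, 'recovery']
```

Under this model, with intervalSec 60 a dead server is reported within about three minutes, while a single dropped check never produces an alert.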

For a self-hosted MCP server without a /healthz endpoint, monitor the TCP port itself:

{ "kind": "tcp", "target": "mcp.example.com:8443", "intervalSec": 60 }
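A TCP monitor only proves that something accepts connections on the port; it says nothing about whether the MCP server behind it can list or call tools. A minimal standard-library equivalent of such a check, for illustration (the function name tcp_alive is hypothetical, not part of Notifly):

```python
import socket


def tcp_alive(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, unreachable host, DNS failure or timeout
        return False


# Example: tcp_alive("mcp.example.com", 8443)
```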

See also the built-in Notifly MCP server; its page covers the same setup.

2. Heartbeat from stdio MCP servers

A stdio MCP server runs locally as a child process of the IDE, so there is no HTTP endpoint to poll. Instead, the server itself pings a Notifly heartbeat URL, at most once every N minutes while it is in use.

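# Ping the heartbeat from inside a tool-call handler (FastMCP, official Python MCP SDK)
import os, time, threading
import requests
from mcp.server.fastmcp import FastMCP

server = FastMCP("my-mcp-server")
PING = os.environ.get("MCP_HEARTBEAT_URL")  # Notifly heartbeat URL
_last = 0.0

def _ping():
    try:
        requests.get(PING, timeout=3)
    except requests.RequestException:
        pass  # monitoring must never break the tool call

def maybe_ping():
    """Send at most one ping per 60 s, in a background thread so the tool call is not delayed."""
    global _last
    if PING and time.time() - _last > 60:
        _last = time.time()
        threading.Thread(target=_ping, daemon=True).start()

@server.tool("search")
def search(query: str):
    maybe_ping()
    return ...  # replace with the real tool logic

if __name__ == "__main__":
    server.run()  # stdio transport by default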

Set the heartbeat interval to 5–10 minutes. If the IDE does not use MCP for a week, that is normal silence, not an incident; the alert should arrive only when the server was expected to be running (the IDE is open and the tool was being invoked) and then suddenly stopped.
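The throttling in maybe_ping is easier to test when the timing logic is separated from the network call. A sketch under that assumption: PingThrottle is a hypothetical helper with an injectable clock, so a test can advance time without sleeping. Its min_interval plays the role of the 60-second window in maybe_ping, which is deliberately much shorter than the 5–10 minute heartbeat interval so that a server in active use pings several times per heartbeat window.

```python
import time
from typing import Callable, Optional


class PingThrottle:
    """Decide whether a heartbeat ping is due: at most one per min_interval seconds."""

    def __init__(self, min_interval: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.min_interval = min_interval
        self.clock = clock  # injectable for tests; monotonic ignores wall-clock jumps
        self._last: Optional[float] = None

    def due(self) -> bool:
        """Return True (and remember the time) if a ping should be sent now."""
        now = self.clock()
        if self._last is None or now - self._last >= self.min_interval:
            self._last = now
            return True
        return False
```

In maybe_ping, a module-level throttle = PingThrottle(60) and the condition if PING and throttle.due(): would replace the manual _last bookkeeping.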

3. End-to-end check of a real tool call

The most useful check is a function that, every N minutes, calls a real tool through an MCP client and verifies the response. A scheduled cloud function works well for this:

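import asyncio, os, time
import requests
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Must live inside the directory the filesystem server is allowed to access (/tmp below)
PROBE_FILE = "/tmp/mcp-probe.txt"

def notify(title, message, priority):
    # Same Notifly endpoint as in the version-check script below
    requests.post(f"{os.environ['NOTIFLY_URL']}/message",
                  params={"token": os.environ["NOTIFLY_TOKEN"]},
                  json={"title": title, "message": message, "priority": priority},
                  timeout=5)

async def probe():
    params = StdioServerParameters(
        command="npx", args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
    )
    with open(PROBE_FILE, "w") as f:
        f.write("mcp-probe-ok")
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as s:
            await s.initialize()
            tools = await s.list_tools()
            if not any(t.name == "read_file" for t in tools.tools):
                raise RuntimeError("read_file tool is missing")
            res = await s.call_tool("read_file", {"path": PROBE_FILE})
            # Tool failures come back as isError=True, not as exceptions
            if res.isError or "mcp-probe-ok" not in res.content[0].text:
                raise RuntimeError(f"unexpected tool result: {res.content}")

def handler(event, context):
    t0 = time.time()
    try:
        asyncio.run(asyncio.wait_for(probe(), timeout=15))
    except Exception as e:
        notify("❌ MCP probe failed", f"{type(e).__name__}: {e}", 9)
        return {"statusCode": 200}
    ms = int((time.time() - t0) * 1000)
    if ms > 5000:
        notify("⏱️ MCP is slow", f"Probe took {ms} ms", 7)
    return {"statusCode": 200}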

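The same approach works for HTTP-based MCP servers: use the SDK's HTTP client transports (mcp.client.streamable_http or mcp.client.sse) or send the JSON-RPC requests with a direct POST.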

4. Versions and breaking changes

Third-party MCP servers (@modelcontextprotocol/server-*, community implementations) ship releases too, and those releases can break things. A simple polling script, run on a schedule, tracks versions:

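#!/usr/bin/env bash
set -euo pipefail
STATE=/tmp/mcp-versions   # last seen version; use a persistent path in production
prev=$(cat "$STATE" 2>/dev/null || true)
cur=$(npm view @modelcontextprotocol/server-filesystem version)
# Alert only when a previously recorded version changed (skip the very first run)
if [[ -n "$prev" && "$prev" != "$cur" ]]; then
  curl -fsS "$NOTIFLY_URL/message?token=$NOTIFLY_TOKEN" \
    -H "Content-Type: application/json" \
    -d "{\"title\":\"🆕 MCP server-filesystem $cur\",\"message\":\"Previous version: $prev\",\"priority\":4}"
fi
echo "$cur" > "$STATE"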
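When you depend on several third-party servers, the same idea scales by keeping a small JSON state file keyed by package name. A minimal Python sketch, assuming npm is on PATH and reusing the npm view <pkg> version command from the script above; the state path, package list and helper names are illustrative.

```python
import json
import pathlib
import subprocess

PACKAGES = ["@modelcontextprotocol/server-filesystem"]  # extend with the servers you use
STATE = pathlib.Path("/tmp/mcp-versions.json")          # use persistent storage in production


def changed_versions(prev: dict, cur: dict) -> dict:
    """Return {package: (old, new)} for packages whose recorded version changed.

    Packages seen for the first time are not reported, like the first run of the script.
    """
    return {p: (prev[p], v) for p, v in cur.items() if p in prev and prev[p] != v}


def current_versions(packages) -> dict:
    """Ask the npm registry for the latest published version of each package."""
    return {p: subprocess.run(["npm", "view", p, "version"], check=True,
                              capture_output=True, text=True).stdout.strip()
            for p in packages}


def run(packages=PACKAGES, state=STATE) -> dict:
    """Fetch versions, report changes, persist the new state; returns the changes."""
    prev = json.loads(state.read_text()) if state.exists() else {}
    cur = current_versions(packages)
    changes = changed_versions(prev, cur)
    for pkg, (old, new) in changes.items():
        # Hypothetical output: replace print with a Notifly call like the curl above
        print(f"🆕 {pkg} {new} (previous version: {old})")
    state.write_text(json.dumps(cur))
    return changes
```

Call run() from cron or a scheduled function; keeping the diff logic in changed_versions lets it be tested without touching npm.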

What to put in the alert text

which exact MCP server is not responding (its name from your mcp.json);
which tools it exported last time and which it exports now;
how many active IDE sessions rely on this server (if known);
a link to logs or a dashboard.
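The tool diff is the detail most often left out, and the most useful one when a server silently drops a tool after an upgrade. A minimal sketch of composing the alert text from the items above; the function name and parameters are hypothetical, and the tool lists would come from list_tools() results you store between probes.

```python
from typing import Optional, Sequence


def format_mcp_alert(server: str, prev_tools: Sequence[str], cur_tools: Sequence[str],
                     sessions: Optional[int] = None,
                     logs_url: Optional[str] = None) -> str:
    """Build a multi-line alert message: server name, tool diff, sessions, logs link."""
    removed = sorted(set(prev_tools) - set(cur_tools))
    added = sorted(set(cur_tools) - set(prev_tools))
    lines = [f"MCP server '{server}' is not responding or has changed",
             f"Tools last time: {', '.join(sorted(prev_tools)) or '(none)'}",
             f"Tools now: {', '.join(sorted(cur_tools)) or '(none)'}"]
    if removed:
        lines.append(f"Removed: {', '.join(removed)}")
    if added:
        lines.append(f"Added: {', '.join(added)}")
    if sessions is not None:
        lines.append(f"Active IDE sessions relying on it: {sessions}")
    if logs_url:
        lines.append(f"Logs: {logs_url}")
    return "\n".join(lines)
```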

Related pages

Notifly MCP Server: our own MCP server that can send notifications on behalf of the AI;
Vector DB / RAG: another typical “invisible” component;
Custom cloud function for integrity checks: where the MCP probe lives.