MCP Server Health

MCP has become the primary transport for tool use in Claude Code, GitHub Copilot Chat, Cursor, and custom agents. When an MCP server crashes or stalls, the model stops receiving data, answers from memory, and hallucinates. To the user this does not look like a “server error” but like “the AI is bad”, and the investigation can take hours.

Notifly provides three levels of protection for MCP.

If your MCP server runs over HTTP/SSE, the simplest protection is an active monitor:

```json
{
  "kind": "http",
  "target": "https://mcp.example.com/healthz",
  "intervalSec": 60,
  "consecutiveFails": 3,
  "alertMessage": "MCP server is not responding: Claude is losing its tools",
  "recoveryMessage": "MCP server is back online"
}
```
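To make the consecutiveFails and recovery semantics concrete, here is a minimal Python sketch of how such a monitor could decide when to send alertMessage and recoveryMessage. It rests on our own assumptions (one alert per outage, one recovery message on the first successful check afterwards, any 2xx counts as healthy) and is not Notifly's implementation; FailCounter and check_healthz are hypothetical names.

```python
import urllib.request


class FailCounter:
    """Hypothetical consecutive-failure tracker for a single monitor."""

    def __init__(self, consecutive_fails: int = 3):
        self.threshold = consecutive_fails
        self.fails = 0
        self.alerting = False

    def record(self, ok: bool):
        """Feed one check result; return "alert", "recovery" or None."""
        if ok:
            self.fails = 0
            if self.alerting:
                self.alerting = False
                return "recovery"
            return None
        self.fails += 1
        # Alert once when the threshold is reached, not on every later failure
        if self.fails >= self.threshold and not self.alerting:
            self.alerting = True
            return "alert"
        return None


def check_healthz(url: str, timeout: float = 5.0) -> bool:
    """One HTTP check: any 2xx response within the timeout counts as healthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except Exception:
        # Connection errors, timeouts and 4xx/5xx (HTTPError) all count as failures
        return False
```

With intervalSec set to 60 and consecutiveFails set to 3, an outage is reported within roughly three check intervals, and a single flaky check does not page anyone.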

For a self-hosted MCP server without a /healthz endpoint, monitor the TCP port itself:

```json
{ "kind": "tcp", "target": "mcp.example.com:8443", "intervalSec": 60 }
```
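A TCP monitor only proves that something accepts connections on the port, not that the MCP handshake works. A minimal Python sketch of such a check, assuming "healthy" simply means a successful TCP connect within a timeout (tcp_port_open is a hypothetical helper name):

```python
import socket


def tcp_port_open(target: str, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to "host:port" succeeds within timeout."""
    host, _, port = target.rpartition(":")
    host = host.strip("[]")  # allow "[::1]:8443" style IPv6 targets
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, DNS failure or timeout
        return False
```

Because a hung MCP process can still hold its port open, the HTTP /healthz monitor remains the better choice whenever the server offers one.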

See also the built-in Notifly MCP server, which has a similar MCP page.

A stdio MCP server runs locally as a child process of the IDE, so there is no HTTP endpoint to probe. Instead, have the MCP server ping a Notifly heartbeat URL itself, at most once every N minutes while it is in use.

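```python
# Call maybe_ping() from the handler of one (or every) tool invocation.
import os
import threading
import time

import requests

PING = os.environ.get("MCP_HEARTBEAT_URL")  # Notifly heartbeat URL
_last = 0.0

def _send_ping():
    try:
        requests.get(PING, timeout=3)
    except Exception:
        pass  # a failed heartbeat must never break the tool call

def maybe_ping():
    """Ping at most once per 60 s, in a background thread so the tool never waits."""
    global _last
    if PING and time.time() - _last > 60:
        _last = time.time()
        threading.Thread(target=_send_ping, daemon=True).start()

# `server` is your existing MCP server instance (e.g. FastMCP)
@server.tool("search")
def search(query: str):
    maybe_ping()
    return ...
```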

Set the heartbeat interval to 5–10 minutes. If the IDE does not use MCP for a week, that is not an incident but normal silence: the alert should arrive only when the server was supposed to be running (the IDE is open and a tool was invoked) and then suddenly stopped.

3. End-to-end verification of a real tool invocation

The most useful check is a function that, every N minutes, calls a real tool through an MCP client and verifies the response. A scheduled cloud function works well for this:

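```python
import asyncio
import os
import time

import requests
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# The filesystem server only allows paths inside the directory it was given (/tmp)
PROBE_FILE = "/tmp/mcp-probe.txt"

def notify(title, message, priority):
    # Notifly message API (same endpoint as the version-check script)
    requests.post(
        f"{os.environ['NOTIFLY_URL']}/message",
        params={"token": os.environ["NOTIFLY_TOKEN"]},
        json={"title": title, "message": message, "priority": priority},
        timeout=5,
    )

async def probe():
    params = StdioServerParameters(
        command="npx", args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as s:
            await s.initialize()
            tools = await s.list_tools()
            assert any(t.name == "read_file" for t in tools.tools), "read_file not exported"
            res = await s.call_tool("read_file", {"path": PROBE_FILE})
            # Tool errors come back as a result with isError=True, not as an exception
            assert not res.isError, f"read_file failed: {res.content}"

def handler(event, context):
    with open(PROBE_FILE, "w") as f:
        f.write("ok")
    t0 = time.time()
    try:
        asyncio.run(asyncio.wait_for(probe(), timeout=15))
    except Exception as e:
        notify("❌ MCP probe failed", f"{type(e).__name__}: {e}", 9)
        return {"statusCode": 200}
    ms = int((time.time() - t0) * 1000)
    if ms > 5000:
        notify("⏱️ MCP is slow", f"Probe took {ms} ms", 7)
    return {"statusCode": 200}
```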

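For an HTTP MCP server, do the same with the SDK's HTTP transport (mcp.client.streamable_http, or mcp.client.sse for older SSE servers), or send a direct JSON-RPC POST.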
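For the direct-POST route, here is a minimal standard-library sketch. It sends a JSON-RPC initialize request to a Streamable HTTP MCP endpoint and checks that the reply contains serverInfo. Assumptions: the endpoint URL is yours to supply, the server accepts protocol version 2025-03-26, and the reply may be plain JSON or a short text/event-stream body, so the sketch parses both. A full probe would continue with notifications/initialized and tools/list, sending back the Mcp-Session-Id header the server returned. The function names are hypothetical.

```python
import json
import urllib.request


def initialize_request(req_id: int = 1) -> dict:
    """JSON-RPC 2.0 initialize request for an MCP server."""
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": "notifly-probe", "version": "0.1"},
        },
    }


def parse_response(body: str, content_type: str) -> dict:
    """Extract the JSON-RPC response from a JSON or SSE response body."""
    if content_type.startswith("text/event-stream"):
        for line in body.splitlines():
            if line.startswith("data:"):
                msg = json.loads(line[5:].strip())
                if "id" in msg:  # skip server notifications, keep the response
                    return msg
        raise ValueError("no JSON-RPC response in SSE body")
    return json.loads(body)


def probe_http(url: str, timeout: float = 15.0) -> dict:
    """POST initialize and return the server's result; raise on any failure."""
    req = urllib.request.Request(
        url,
        data=json.dumps(initialize_request()).encode(),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        msg = parse_response(resp.read().decode(), resp.headers.get("Content-Type", ""))
    if "error" in msg or "serverInfo" not in msg.get("result", {}):
        raise RuntimeError(f"bad initialize reply: {msg}")
    return msg["result"]
```

In the cloud function, probe_http can take the place of asyncio.run(probe()) inside the same try/except and timing logic.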

Third-party MCP servers (@modelcontextprotocol/server-*, community implementations) also ship new releases, and an update can break them. Track their versions with a simple script:

```bash
#!/usr/bin/env bash
set -euo pipefail
# Alert when a new version of a third-party MCP server is published
prev=$(cat /tmp/mcp-versions 2>/dev/null || true)
cur=$(npm view @modelcontextprotocol/server-filesystem version)
if [[ -n "$prev" && "$prev" != "$cur" ]]; then
  curl -fsS "$NOTIFLY_URL/message?token=$NOTIFLY_TOKEN" \
    -H "Content-Type: application/json" \
    -d "{\"title\":\"🆕 MCP server-filesystem $cur\",\"message\":\"Previous version: $prev\",\"priority\":4}"
fi
echo "$cur" > /tmp/mcp-versions
```
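The script above tracks a single package. A minimal Python sketch that extends the same idea to several MCP packages, assuming the state is kept in a small JSON file and npm is on the PATH; PACKAGES, STATE_FILE, latest_version, diff_versions and check_versions are hypothetical names.

```python
import json
import subprocess
from pathlib import Path

PACKAGES = ["@modelcontextprotocol/server-filesystem"]  # add the servers you use
STATE_FILE = Path("/tmp/mcp-versions.json")


def latest_version(package: str) -> str:
    """Ask the npm registry for the latest published version."""
    out = subprocess.run(
        ["npm", "view", package, "version"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def diff_versions(prev: dict, cur: dict) -> list:
    """Return (package, old, new) for every package whose version changed.

    Packages seen for the first time are not reported.
    """
    return [
        (pkg, prev[pkg], ver)
        for pkg, ver in cur.items()
        if pkg in prev and prev[pkg] != ver
    ]


def check_versions() -> list:
    """Fetch current versions, persist them, and return what changed."""
    prev = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    cur = {pkg: latest_version(pkg) for pkg in PACKAGES}
    STATE_FILE.write_text(json.dumps(cur))
    return diff_versions(prev, cur)
```

Each returned tuple can then be sent as one Notifly message with the same title, message and priority fields as the curl call.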

An MCP alert is most useful when it states:
  • which MCP server is not responding (its name from your mcp.json);
  • which tools it exported last time and which it exports now;
  • how many active IDE sessions rely on this server (if known);
  • a link to the logs or dashboard.
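A minimal Python sketch of building an alert that carries these details, including the difference between the tools exported last time and now. The function name, its parameters and the priority value 8 are hypothetical; server_name would be the key from your mcp.json, and the returned dict uses the same title, message and priority fields as the Notifly /message call in the version-check script.

```python
def build_mcp_alert(server_name, prev_tools, cur_tools, sessions=None, logs_url=None):
    """Build a Notifly message payload describing a failing MCP server."""
    prev_set, cur_set = set(prev_tools), set(cur_tools)
    lines = [f"MCP server '{server_name}' is not responding."]
    if prev_set != cur_set:
        removed = sorted(prev_set - cur_set)
        added = sorted(cur_set - prev_set)
        if removed:
            lines.append("Missing tools: " + ", ".join(removed))
        if added:
            lines.append("New tools: " + ", ".join(added))
    else:
        lines.append(f"Tools unchanged ({len(cur_set)} exported).")
    if sessions is not None:  # only when the number of IDE sessions is known
        lines.append(f"Active IDE sessions relying on it: {sessions}")
    if logs_url:
        lines.append(f"Logs: {logs_url}")
    return {
        "title": f"❌ MCP {server_name} down",
        "message": "\n".join(lines),
        "priority": 8,
    }
```

Sorting the tool names keeps the message stable between runs, so two alerts for the same breakage read identically.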