
Heartbeat for cron, backups and daemons

A sysadmin's most common pain: cron “sort of works”, but the latest backup in /var/backups is three weeks old, and you find out exactly when you need to restore the database.

Instead of an “active” check with dashboards and Prometheus alerts, you can use a passive check with Heartbeat: cron sends a short ping to Notifly after every successful run, and if the next ping doesn’t arrive in time, Notifly sends a notification.

Step 1. Create a heartbeat in the admin panel


app.notifly.ru → Heartbeats section → Create.

Settings for a typical hourly backup:

| Field | Value |
|---|---|
| Name | PostgreSQL backup cron |
| Channel | infra (any channel you send alerts through) |
| Interval (sec) | 3600 (once an hour) |
| Tolerance (sec) | 300 (5-minute grace period) |
| Alert text | PostgreSQL backup did not run, check the server! |
| Alert priority | 9 (loud pop-up notification) |
| Recovery text | Backup is working again. |

Copy the Ping URL from the heartbeats table. It looks like this: https://your-notifly/heartbeat/ping/H...
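Before wiring the URL into cron, you can check it by hand. Below is a minimal POSIX sh sketch; check_ping_url is a hypothetical helper name, not a Notifly tool. Keep in mind that it hits the same endpoint cron will use, so Notifly will most likely record it as a regular ping.

```sh
# Sketch (POSIX sh): send one manual ping and print the HTTP status code.
# check_ping_url is a hypothetical helper, not part of Notifly.
check_ping_url() {
  # -sS: quiet but still report errors; -m 10: give up after 10 seconds;
  # -o /dev/null discards the body; -w prints only the HTTP status code.
  if ! cpu_code=$(curl -sS -m 10 -o /dev/null -w '%{http_code}' "$1"); then
    echo "ping failed: curl could not reach the server" >&2
    return 1
  fi
  echo "HTTP $cpu_code"
  # Only a 2xx status counts as a successful ping.
  case "$cpu_code" in
    2??) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: check_ping_url "https://your-notifly/heartbeat/ping/H..." prints the status code and exits non-zero on anything other than 2xx.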

Add the ping to the cron job in /etc/cron.d/pg-backup:

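# Environment variables can be set in cron.d files; the job sees $NOTIFLY_PING.
NOTIFLY_PING="https://your-notifly/heartbeat/ping/H..."
# Cron has no line continuation: the whole job must stay on one line.
0 * * * * postgres /usr/local/bin/pg-backup.sh && curl -fsS "$NOTIFLY_PING" -o /dev/null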

The key point: the ping is chained with &&, so it runs only if the script succeeds. If the backup exits with a non-zero code, no ping is sent, and after an hour plus 5 minutes (interval + tolerance) the alert “Backup did not run” arrives.
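If the job grows beyond one command, or you want the ping to survive a flaky network, the same && logic can live in a small wrapper. A minimal POSIX sh sketch, assuming a hypothetical helper named run_with_heartbeat; the curl timeout and retry values are arbitrary choices, not Notifly requirements.

```sh
# Sketch (POSIX sh): run a command and ping the heartbeat only if it succeeds.
# run_with_heartbeat is a hypothetical helper, not a Notifly tool.
# Usage: run_with_heartbeat PING_URL COMMAND [ARGS...]
run_with_heartbeat() {
  rwh_url="$1"
  shift
  # Run the job and keep its exit status, so cron or systemd still see failures.
  "$@"
  rwh_status=$?
  if [ "$rwh_status" -eq 0 ]; then
    # Same effect as "&& curl": the ping is sent only on success.
    # -m 10 and --retry 3 keep a flaky network from hanging or losing the ping.
    curl -fsS -m 10 --retry 3 -o /dev/null "$rwh_url" ||
      echo "heartbeat ping failed" >&2
  fi
  return "$rwh_status"
}
```

Saved to a file such as /usr/local/lib/heartbeat.sh (a hypothetical path), it can be used straight from the cron line: `. /usr/local/lib/heartbeat.sh && run_with_heartbeat "$NOTIFLY_PING" /usr/local/bin/pg-backup.sh`. Returning the job's own status rather than curl's keeps a failed backup visible to cron even when the ping itself is skipped.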

If you have moved from cron to a systemd timer, add the ping with a drop-in instead of editing the original unit:

sudo systemctl edit pg-backup.service

In the editor that opens, add:

[Service]
ExecStartPost=/usr/bin/curl -fsS https://your-notifly/heartbeat/ping/H... -o /dev/null

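With Type=oneshot (the usual choice for a backup job), ExecStartPost runs only after ExecStart has exited successfully. With Type=simple it runs as soon as the main process has started, so make sure the backup service is a oneshot.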
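If pg-backup has no systemd units yet, here is a minimal sketch of a oneshot service plus an hourly timer, written out by a POSIX sh function. The unit names, User=postgres and the script path mirror the cron example; write_pg_backup_units is a hypothetical helper. With the ping built into the service, the drop-in is not needed.

```sh
# Sketch (POSIX sh): write a oneshot service + hourly timer for pg-backup.
# write_pg_backup_units is a hypothetical helper; the target directory
# defaults to /etc/systemd/system (needs root) but can be overridden.
write_pg_backup_units() {
  wpu_dir="${1:-/etc/systemd/system}"
  # Type=oneshot: ExecStartPost waits until ExecStart has exited successfully.
  cat > "$wpu_dir/pg-backup.service" <<'EOF'
[Unit]
Description=Hourly PostgreSQL backup

[Service]
Type=oneshot
User=postgres
ExecStart=/usr/local/bin/pg-backup.sh
ExecStartPost=/usr/bin/curl -fsS https://your-notifly/heartbeat/ping/H... -o /dev/null
EOF
  # OnCalendar=hourly fires at minute 0, like "0 * * * *" in cron;
  # Persistent=true runs a missed job once the machine is back up.
  cat > "$wpu_dir/pg-backup.timer" <<'EOF'
[Unit]
Description=Run pg-backup.service every hour

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target
EOF
}
```

After writing the units (for example with sudo sh -c '. ./units.sh && write_pg_backup_units'), activate them with sudo systemctl daemon-reload && sudo systemctl enable --now pg-backup.timer.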

To simulate a failed backup, create a throwaway unit whose main command immediately exits with an error:

# Runtime units in /run/systemd/system disappear after a reboot.
sudo tee /run/systemd/system/fake-broken-backup.service >/dev/null <<'EOF'
[Service]
Type=oneshot
ExecStart=/bin/false
ExecStartPost=/usr/bin/curl -fsS https://your-notifly/heartbeat/ping/H... -o /dev/null
EOF
sudo systemctl daemon-reload

Start it with sudo systemctl start fake-broken-backup.service. ExecStart fails, so ExecStartPost never runs and no ping goes out; after interval + tolerance seconds a push notification arrives.
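To work out when that notification is due, add interval and tolerance to the time of the last successful ping. A POSIX sh sketch of the arithmetic; the helper names are hypothetical, and since the exact boundary behaviour (firing at the deadline or slightly after) is an assumption, treat the result as approximate.

```sh
# Sketch (POSIX sh): when is the alert due? All times are Unix epoch seconds.
# heartbeat_deadline and heartbeat_status are hypothetical helpers that
# mirror the rule "alert after interval + tolerance seconds without a ping".
heartbeat_deadline() {
  # $1 = last ping, $2 = interval, $3 = tolerance
  echo $(( $1 + $2 + $3 ))
}

# Prints "overdue" once the deadline has passed, otherwise "ok".
# Treating the deadline second itself as "ok" is an assumption.
heartbeat_status() {
  # $1 = last ping, $2 = interval, $3 = tolerance, $4 = now
  if [ "$4" -gt "$(heartbeat_deadline "$1" "$2" "$3")" ]; then
    echo overdue
  else
    echo ok
  fi
}
```

With the table's settings (3600 + 300), a last ping at 03:00:00 means the alert is due around 04:05:00. With GNU date, date -d "@$(heartbeat_deadline "$(date +%s)" 3600 300)" shows the deadline for a ping sent right now.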

Other jobs that suit a heartbeat:

  • Backups of any databases and filesystems: the check you will be most grateful for in a crisis.
  • Certificate renewals, such as a Let’s Encrypt cron job: heartbeat once a day, alert “certbot didn’t run”.
  • Log rotation: once a week.
  • Imports/exports between systems: heartbeat hourly, daily, or weekly.
  • IoT sensors: the device “calls home” via curl every 5 minutes, so a network outage triggers an alert as soon as interval + tolerance runs out.
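For the IoT case, the device side can be a plain loop around curl. A minimal POSIX sh sketch; call_home and its optional COUNT argument are hypothetical (COUNT exists mainly so the loop can be stopped in tests).

```sh
# Sketch (POSIX sh): a device "calls home" every INTERVAL seconds.
# call_home is a hypothetical helper; COUNT limits the number of pings
# (0 or omitted = run forever).
# Usage: call_home PING_URL [INTERVAL_SECONDS] [COUNT]
call_home() {
  ch_url="$1"
  ch_interval="${2:-300}"   # 300 s = every 5 minutes
  ch_count="${3:-0}"
  ch_sent=0
  while [ "$ch_count" -eq 0 ] || [ "$ch_sent" -lt "$ch_count" ]; do
    # A failed ping is not fatal: the loop keeps going, and Notifly raises
    # the alert itself if pings stop arriving for interval + tolerance.
    curl -fsS -m 10 -o /dev/null "$ch_url" || echo "ping failed" >&2
    ch_sent=$((ch_sent + 1))
    sleep "$ch_interval"
  done
}
```

On the device, call_home "https://your-notifly/heartbeat/ping/H..." runs forever; pair it with a heartbeat interval of 300 seconds and a tolerance that covers the device's normal jitter.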

If the server is going into planned maintenance and you don’t want false alerts, pause the heartbeat:

  • in the admin panel: the ⏸ icon in the heartbeat row;
  • via the API: POST /heartbeat/<id>/pause, then …/resume;
  • via MCP: ask the assistant to “pause backup heartbeat for an hour”.
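The API route lends itself to a wrapper that pauses the heartbeat, runs the maintenance command and resumes afterwards. A POSIX sh sketch under loud assumptions: NOTIFLY_API (API base URL) and NOTIFLY_TOKEN (bearer token in the Authorization header) are hypothetical names, the resume path is assumed to be /heartbeat/<id>/resume as the shorthand suggests, and hb_call and with_paused_heartbeat are hypothetical helpers.

```sh
# Sketch (POSIX sh): pause a heartbeat around a maintenance command.
# HYPOTHETICAL: NOTIFLY_API and NOTIFLY_TOKEN are assumed names; check how
# your Notifly instance authenticates API calls.
hb_call() {
  # $1 = heartbeat id, $2 = "pause" or "resume"
  curl -fsS -m 10 -o /dev/null -X POST \
    -H "Authorization: Bearer ${NOTIFLY_TOKEN:-}" \
    "${NOTIFLY_API:-https://your-notifly}/heartbeat/$1/$2"
}

# Usage: with_paused_heartbeat ID COMMAND [ARGS...]
with_paused_heartbeat() {
  wph_id="$1"
  shift
  # A failed pause is only a warning: the worst case is a false alert.
  hb_call "$wph_id" pause || echo "pause failed: expect a false alert" >&2
  "$@"
  wph_status=$?
  # Resume even if the maintenance command failed, so monitoring comes back.
  hb_call "$wph_id" resume || echo "resume failed: resume it manually" >&2
  return "$wph_status"
}
```

The wrapper returns the maintenance command's exit status. It does not handle being interrupted with Ctrl-C; if that matters, add a trap that calls hb_call "$wph_id" resume.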

When maintenance is over, resume the heartbeat; the countdown restarts from the moment you resume.

Why this is more reliable than “on-error” alerts


Active alerts (“something broke”) stay silent if cron didn’t run at all, if the crontab disappeared, or if the server was shut down. A heartbeat stays silent only when everything actually works: the server is up, cron is running, the network is reachable, and the script exited successfully. Break any link in that chain, and a notification arrives once interval + tolerance has passed since the last ping.