
Notifications about failing systemd services

On most modern Linux distributions, services run under systemd. If nginx, postgres, your backend, or the container runtime suddenly goes down, you need to know right away, not when users start calling.

The neatest solution is to integrate sending notifications directly into systemd via OnFailure= — without additional watchers or cron jobs.

Systemd can automatically start another unit when the main service enters the failed state. It’s enough to create a “universal” notifier unit once (a template, hence the @ in its name) and attach it to every service that needs it via a drop-in.

/etc/systemd/system/notifly@.service:

[Unit]
Description=Notify about failed unit %i
[Service]
Type=oneshot
EnvironmentFile=/etc/notifly.env
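# systemd expands %i itself; "$$" passes a literal "$" on to bash.
# $STATUS reaches notifly-send as one quoted argument, so quotes in the log need no escaping.
ExecStart=/bin/bash -c '\
STATUS=$$(systemctl status %i --no-pager -n 20 | tail -n 20); \
/usr/local/bin/notifly-send \
"🛑 Service %i failed on $$(hostname -s)" \
"$$STATUS" 9'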
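The inline command is parsed by systemd first, with its own quoting, specifier and `$` rules, and only then by bash. That double parsing makes escaping fragile. One alternative is to move the logic into a small script and reduce the unit to `ExecStart=/usr/local/bin/notifly-unit-failed %i`. Below is a minimal sketch. It assumes `notifly-send` takes a title, a message and a priority, as in the unit above. The script name `notifly-unit-failed` and the `NOTIFLY_SEND` override are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical /usr/local/bin/notifly-unit-failed, called as "notifly-unit-failed <unit>"
# from notifly@.service. Assumes notifly-send takes: title, message, priority.
set -euo pipefail

# Hypothetical override (handy for testing); defaults to the sender used in the unit.
NOTIFLY_SEND="${NOTIFLY_SEND:-/usr/local/bin/notifly-send}"

notify_unit_failed() {
    local unit="$1"
    local status
    # systemctl status exits non-zero (3) for failed units, so ignore its exit code.
    status="$(systemctl status "$unit" --no-pager -n 20 2>&1 | tail -n 20 || true)"
    # "$status" stays a single argument, so quotes inside the log need no escaping.
    "$NOTIFLY_SEND" "🛑 Service $unit failed on $(hostname -s)" "$status" 9
}

# Run only when executed directly with a unit name, not when sourced.
if [[ "${BASH_SOURCE[0]:-$0}" == "$0" && $# -gt 0 ]]; then
    notify_unit_failed "$1"
fi
```

Install it with `sudo install -m 0755 notifly-unit-failed /usr/local/bin/`. With this approach the unit no longer needs any `$$` or escaping.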

Reload systemd:

sudo systemctl daemon-reload

Test sending manually:

sudo systemctl start notifly@nginx.service

No need to edit the original unit files. Use a drop-in:

sudo systemctl edit nginx.service

In the editor that opens, add:

[Unit]
OnFailure=notifly@%n.service

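Save and close the editor. `systemctl edit` reloads the configuration by itself, so an explicit reload is optional but harmless: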

sudo systemctl daemon-reload
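%n expands to the full name of the failing unit. A failure of nginx.service therefore starts notifly@nginx.service.service, and %i inside the notifier becomes nginx.service. That doubled suffix is the name to look for in `journalctl -u` or `systemctl status` when a message does not arrive. Here is a tiny sketch of the mapping. `notifier_unit_for` is a hypothetical helper, and it treats a name without a known suffix as a service, as systemctl does.

```shell
# Hypothetical helper: the notifier instance that OnFailure=notifly@%n.service starts.
# %n is the full unit name, so the unit's suffix ends up inside the instance name.
notifier_unit_for() {
    local unit="$1"
    case "$unit" in
        *.service|*.socket|*.timer|*.mount|*.path|*.target) ;;
        *) unit="$unit.service" ;;   # bare name: assume a service, like systemctl
    esac
    printf 'notifly@%s.service\n' "$unit"
}

notifier_unit_for nginx.service   # prints notifly@nginx.service.service
notifier_unit_for nginx           # prints notifly@nginx.service.service
```

For example, `journalctl -u "$(notifier_unit_for nginx)" -n 20 --no-pager` shows the notifier's most recent run.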

To attach the notifier to several critical services at once:

for s in nginx postgresql redis docker your-backend; do
    sudo mkdir -p "/etc/systemd/system/${s}.service.d"
    # Quoted 'EOF': the drop-in is written verbatim, %n is left for systemd
    cat <<'EOF' | sudo tee "/etc/systemd/system/${s}.service.d/notifly.conf" >/dev/null
[Unit]
OnFailure=notifly@%n.service
EOF
done
sudo systemctl daemon-reload
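To confirm that systemd actually loaded the drop-ins, you can ask it for each unit's OnFailure= value with `systemctl show -p OnFailure --value`. Below is a sketch. `notifly_missing_hooks` is a hypothetical helper that prints every given unit whose OnFailure= does not mention `notifly@`.

```shell
# Hypothetical helper: print every given unit whose OnFailure= lacks the notifier.
notifly_missing_hooks() {
    local unit deps
    for unit in "$@"; do
        # --value prints only the value, e.g. "notifly@nginx.service.service"
        deps="$(systemctl show -p OnFailure --value "$unit")"
        [[ "$deps" == *notifly@* ]] || echo "$unit"
    done
}
```

Run it as `notifly_missing_hooks nginx.service postgresql.service redis.service docker.service your-backend.service`. Empty output means every listed unit has the hook. A unit that does not exist shows an empty OnFailure=, so it is listed too.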

Simulate a failure:

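# Option 1: kill nginx with SIGKILL; without Restart=, systemd marks the unit as failed
sudo systemctl kill --signal=SIGKILL nginx.service
# Option 2: a throwaway transient unit that exits with status 1 and has OnFailure= set directly
sudo systemd-run --unit=test-fail --property=OnFailure=notifly@test-fail.service.service /bin/false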

Within a few seconds a message will arrive in Notifly containing the title and the last 20 lines of the failed service’s status.
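To check delivery from a script (for example, a smoke test) rather than on your phone, wait until systemd reports the unit as failed, then print the notifier's journal. Below is a sketch. `wait_and_show_notifier` and its 30-second default are hypothetical. The sketch assumes the notifier instance is notifly@<full unit name>.service, which is what %n produces.

```shell
# Hypothetical helper: wait until UNIT is failed, then show its notifier's journal.
# Assumes the notifier is notifly@<full unit name>.service (as OnFailure=...%n... gives).
wait_and_show_notifier() {
    local unit="$1" timeout="${2:-30}" i
    for ((i = 0; i < timeout; i++)); do
        # is-failed --quiet exits 0 once the unit is in the failed state
        if systemctl is-failed --quiet "$unit"; then
            sleep 2   # give the oneshot notifier a moment to finish
            journalctl -u "notifly@${unit}.service" -n 20 --no-pager
            return 0
        fi
        sleep 1
    done
    echo "timeout: $unit did not enter the failed state within ${timeout}s" >&2
    return 1
}
```

Call it as root right after simulating a failure, e.g. `wait_and_show_notifier nginx.service`. Running as root lets journalctl read the system journal.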

Additionally: notification when automatic restarts fail


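If a service has Restart=on-failure enabled, systemd may restart it silently after a crash, so a single crash does not necessarily end in the failed state. To be notified when automatic recovery gives up, keep OnFailure= and add a start rate limit: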

[Unit]
OnFailure=notifly@%n.service
# The StartLimit* options belong in [Unit], not [Service]
StartLimitBurst=3
StartLimitIntervalSec=120
StartLimitAction=none
[Service]
Restart=on-failure
RestartSec=5s

Then the notification will arrive after the third failure within 2 minutes — that is, when automatic recovery fails.
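The “third failure” follows from how systemd counts start attempts. Every start, including automatic restarts, is counted in a window of StartLimitIntervalSec= that opens at the first counted start. Once StartLimitBurst= starts have happened inside that window, the next one is refused, the unit ends up failed, and OnFailure= fires. The sketch below is a simplified model of that rule, not systemd's source. `start_limit_trips` is hypothetical, start times are in seconds, and a crash-looping service with RestartSec=5s is assumed to start at roughly 0, 5, 10 and 15 s.

```shell
# Simplified model of systemd's start rate limit (fixed window; not systemd's code).
# Usage: start_limit_trips INTERVAL BURST T1 T2 ...   (start times in seconds)
# Prints the time of the first refused start, or "none".
start_limit_trips() {
    local interval="$1" burst="$2" begin=-1 count=0 t
    shift 2
    for t in "$@"; do
        if (( begin < 0 || t - begin > interval )); then
            begin=$t; count=1          # a new window opens at this start
        elif (( count < burst )); then
            count=$((count + 1))       # still within the allowed burst
        else
            echo "$t"; return 0        # start refused: unit fails, OnFailure= fires
        fi
    done
    echo none
}
```

With the settings above, `start_limit_trips 120 3 0 5 10 15` prints `15`: the fourth start is refused right after the third failure. Under this model, a service that crashes about once a minute (`start_limit_trips 120 3 0 65 130 195`) prints `none`. The limit never trips, so systemd keeps restarting the service without it reaching the failed state.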

Windows: PowerShell + Service Control Manager


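An equivalent for Windows servers: subscribe to Service Control Manager events in the System event log. Event IDs 7031 and 7034 are both logged when a service terminates unexpectedly; 7031 also names the recovery action that will be taken. The subscription is implemented as an event trigger in Task Scheduler.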

C:\scripts\Notifly-Service-Failed.ps1
param([string]$ServiceName, [string]$EventId)
. C:\scripts\Notifly.ps1   # provides Send-Notifly
$svc = Get-Service -Name $ServiceName -ErrorAction SilentlyContinue
$status = if ($svc) { $svc.Status } else { "Unknown" }
# SCM messages use the display name ("Print Spooler"), not the service name ("Spooler")
$name = if ($svc) { $svc.DisplayName } else { $ServiceName }
$pattern = [regex]::Escape($name)
# Up to 5 of the 20 most recent Service Control Manager events that mention this service
$logs = Get-WinEvent -FilterHashtable @{LogName='System'; ProviderName='Service Control Manager'} `
    -MaxEvents 20 -ErrorAction SilentlyContinue |
    Where-Object { $_.Message -match $pattern } |
    Select-Object -First 5 |
    ForEach-Object { "$($_.TimeCreated.ToString('HH:mm:ss')) $($_.Message)" } |
    Out-String
Send-Notifly `
    -Title "🛑 Service $ServiceName failed on $env:COMPUTERNAME" `
    -Message "Status: $status`nEvent ID: $EventId`n`n$logs" `
    -Priority 9

Subscribing to events for critical services

C:\scripts\Register-NotiflyServiceWatch.ps1
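$services = @("Spooler", "W3SVC", "MSSQLSERVER", "nginx")
# New-ScheduledTaskTrigger has no event trigger; create an MSFT_TaskEventTrigger instance instead
$triggerClass = Get-CimClass -ClassName MSFT_TaskEventTrigger -Namespace Root/Microsoft/Windows/TaskScheduler
foreach ($s in $services) {
    # SCM event data carries the display name ("Print Spooler"), not the service name
    $display = (Get-Service -Name $s -ErrorAction SilentlyContinue).DisplayName
    if (-not $display) { Write-Warning "Service $s not found, skipping"; continue }
    $xml = @"
<QueryList>
  <Query Id="0" Path="System">
    <Select Path="System">
      *[System[Provider[@Name='Service Control Manager'] and (EventID=7031 or EventID=7034)]]
      and *[EventData[Data='$display']]
    </Select>
  </Query>
</QueryList>
"@
    $Trigger = New-CimInstance -CimClass $triggerClass -ClientOnly
    $Trigger.Enabled = $true
    $Trigger.Subscription = $xml
    # The task fires on either event, so pass both IDs as the label
    $Action = New-ScheduledTaskAction -Execute "powershell.exe" `
        -Argument "-NoProfile -ExecutionPolicy Bypass -File C:\scripts\Notifly-Service-Failed.ps1 -ServiceName $s -EventId 7031/7034"
    Register-ScheduledTask -TaskName "Notifly Watch $s" -Trigger $Trigger -Action $Action `
        -Principal (New-ScheduledTaskPrincipal -UserId "SYSTEM" -LogonType ServiceAccount -RunLevel Highest) `
        -Force
}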

If the event subscription doesn’t work for some reason, poll the services every minute instead:

C:\scripts\Notifly-Service-Poll.ps1
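. C:\scripts\Notifly.ps1   # provides Send-Notifly
$watch = @("Spooler", "W3SVC", "MSSQLSERVER")
$state = "C:\ProgramData\Notifly\service-state.json"
# States from the previous run (empty on the first run)
$prev = if (Test-Path $state) { Get-Content $state -Raw | ConvertFrom-Json } else { @{} }
$now = @{}
foreach ($s in $watch) {
    $svc = Get-Service -Name $s -ErrorAction SilentlyContinue
    $now[$s] = if ($svc) { $svc.Status.ToString() } else { "Missing" }
    # Alert only when a known state changed and the service is no longer running
    if ($prev.$s -and $prev.$s -ne $now[$s] -and $now[$s] -ne "Running") {
        # ${s} is needed: "$s:" would be parsed as a scope-qualified variable
        Send-Notifly -Title "🛑 ${s}: $($now[$s]) on $env:COMPUTERNAME" `
            -Message "Previously: $($prev.$s)" -Priority 9
    }
}
New-Item -ItemType Directory -Path (Split-Path $state) -Force | Out-Null
$now | ConvertTo-Json | Set-Content $state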
What this setup gives you:
  • Zero false alerts: a message is sent only when the unit actually enters the failed state.
  • No agents: systemd is already on the host, so there is no need for Zabbix, Prometheus, or Datadog just for a simple up/down check.
  • Context in your pocket: the message contains the last 20 log lines, often enough to understand the cause directly from your phone.
Ideas for extending it:
  • Add a “Restart” link: a separate action in Notifly with a deep link to the internal admin portal.
  • Use different priority levels, for example priority=4 for secondary cron units and priority=10 for production databases.