1. Introduction

systemd is a system and service manager for Linux operating systems. It’s responsible for managing and maintaining system processes, services, and daemons. Sometimes, these services can fail, and it’s important that we get notified of such an event so that we can address the issue quickly.

In this tutorial, we’ll learn how to use the systemd OnFailure feature to trigger notifications and how to configure notification channels over Slack and email.

2. systemd Unit Files

Before we begin, let’s look at what systemd unit files are, their purpose, and their format.

A unit file is a plain text, ini-style file that encodes information about a service or any other entity controlled and supervised by systemd.

Let’s look at a typical unit file:

[Unit]
Description=The NGINX HTTP and reverse proxy server
After=syslog.target network.target remote-fs.target nss-lookup.target

[Service]
Type=forking
PIDFile=/run/nginx.pid
ExecStartPre=/usr/sbin/nginx -t
ExecStart=/usr/sbin/nginx
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s QUIT $MAINPID
PrivateTmp=true

[Install]
WantedBy=multi-user.target

Typically, a unit file consists of three sections. The common configuration items are in the [Unit] and [Install] sections, while the service-specific configurations are in the [Service] section.

We can view the complete list of systemd section options by running the commands man systemd.unit and man systemd.service.

2.1. systemd OnFailure

The [Unit] section accepts an OnFailure option. This is a space-separated list of one or more units that are activated when this unit enters the “failed” state.

Note that OnFailure can’t run a command directly; it can only activate other units. So, in the next section, we’ll create a unit that sends a notification when it’s started. Then, we’ll reference this unit in the OnFailure option of any service whose failures we want to hear about.

3. Creating a Notify Service

Let’s start by creating a sample notification service. This service doesn’t do anything useful, but it will serve to explain some important concepts.

We describe our notification service in a systemd unit file, which we place in one of the directories that systemd searches for units. Unit files installed by an administrator typically go in /usr/local/lib/systemd/system. This directory may not exist by default, in which case we can create it first with sudo mkdir -p /usr/local/lib/systemd/system.

Let’s create our pseudo-notify service:

$ sudo tee /usr/local/lib/systemd/system/pseudo-notify@.service > /dev/null <<'EOF'
[Unit]
Description=Send Pseudo Notification

[Service]
Type=oneshot
ExecStart=/bin/echo 'Notification triggered for service %i'

[Install]
WantedBy=multi-user.target
EOF

Let’s look at a few important concepts here:

  • The service file name ends with an @ character right before the .service type suffix. This tells systemd that the file is a template, which means it serves as the definition of multiple service instances. We want this for our notify service because, if several services fail at the same time, systemd should start a separate instance of the notify service for each of them.
  • The Type=oneshot setting under the [Service] section tells systemd that the process isn’t supposed to run continuously, so the service transitions from “activating” to “deactivating” directly once the command exits.
  • The ExecStart option is where we place the command that sends the actual notification.
  • %i is a specifier that systemd replaces with the instance name, that is, the text between the @ and the type suffix. Since we’ll start each instance with the failing service’s name after the @, %i holds the name of the failing service. We can view all available specifiers in the systemd.unit man page.
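
To make the specifier behavior more concrete, the following sketch mimics in plain shell how a unit name is split into the values behind %p and %i. The helper names unit_prefix and unit_instance and the unit name notify@nginx.service are made up for this illustration; the real expansion is done by systemd itself when it loads a unit.

```shell
#!/bin/sh
# Illustration only: mimic how systemd splits a unit name into specifiers.
# unit_prefix and unit_instance are hypothetical helpers, not systemd tools.

# %p: the part before the first "@" (or the whole name if there is no "@")
unit_prefix() {
    name=${1%.*}                 # drop the type suffix, e.g. ".service"
    printf '%s\n' "${name%%@*}"
}

# %i: the part between the first "@" and the type suffix; empty without "@"
unit_instance() {
    name=${1%.*}
    case $name in
        *@*) printf '%s\n' "${name#*@}" ;;
        *)   printf '\n' ;;
    esac
}

unit_prefix   "notify@nginx.service"   # prints: notify
unit_instance "notify@nginx.service"   # prints: nginx
unit_instance "nginx.service"          # prints an empty line
```

The last call shows that %i is only meaningful inside an instance of a template: for a plain unit such as nginx.service, it’s empty.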

4. Send Notifications to Slack

Now that we understand the basics of how to use OnFailure to trigger notifications when a systemd service fails, we’re ready to create our first notification.

To start, let’s create a script that utilizes the Slack API to post messages to a channel:

$ sudo tee /usr/local/bin/slackNotify.sh > /dev/null <<'EOF'
#!/bin/bash
# Bash script to send systemd notifications to Slack
# Edit the following variables to match your requirements
SLACK_WEBHOOK_URL="https://hooks.slack.com/services/XXXXXXXXX/XXXXXXXXX/XXXXXXXXXXXXXXXXXXXXXXXX"
SLACK_CHANNEL="#general"
SLACK_USERNAME="Notification Bot"
SLACK_ICON=":zap:"
SLACK_COLOR="danger"
# End of variables

function usage {
    programName=$0
    echo "description: use this script to post systemd service failure message to Slack channel"
    echo "usage: $programName -s \"service name\""
    echo "	-s    the systemd service name e.g. nginx"
    exit 1
}

# Escape backslashes, double quotes, and line breaks so that multi-line
# text can be embedded in a JSON string
function json_escape {
    local s=$1
    s=${s//\\/\\\\}
    s=${s//\"/\\\"}
    s=${s//$'\n'/\\n}
    s=${s//$'\r'/\\r}
    s=${s//$'\t'/\\t}
    printf '%s' "$s"
}

# Get service name from options
while getopts ":s:" opt; do
  case $opt in
    s)
      SERVICE_NAME=$OPTARG
      ;;
    \?)
      echo "Invalid option: -$OPTARG" >&2
      exit 1
      ;;
    :)
      echo "Option -$OPTARG requires an argument." >&2
      exit 1
      ;;
  esac
done

if [[ ! "${SERVICE_NAME}" ]]; then
    echo "Service name is required"
    usage
fi

# Build the message only after the service name is known
SLACK_TITLE="Service $SERVICE_NAME failed on $(hostname)"
SLACK_PRETEXT="Service $SERVICE_NAME failed"
SLACK_TEXT="$(json_escape "$(systemctl status --no-pager "$SERVICE_NAME")")"
SLACK_FOOTER="Notification Bot at $(hostname) on $(date)"

SLACK_ATTACHMENT='[{"fallback": "'"$SLACK_TITLE"'", "color": "'"$SLACK_COLOR"'", "title": "'"$SLACK_TITLE"'", "pretext": "'"$SLACK_PRETEXT"'", "text": "'"$SLACK_TEXT"'", "footer": "'"$SLACK_FOOTER"'"}]'

# Send notification to Slack
curl -X POST --data-urlencode 'payload={"channel": "'"$SLACK_CHANNEL"'", "username": "'"$SLACK_USERNAME"'", "icon_emoji": "'"$SLACK_ICON"'", "attachments": '"$SLACK_ATTACHMENT"'}' "$SLACK_WEBHOOK_URL"

# Exit with success code
exit 0
EOF

We should replace the value of SLACK_WEBHOOK_URL with a valid webhook URL created under our Slack account. We can find more information about how to create a webhook URL in the Slack API Documentation.
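
As written, the script ends with exit 0 whether or not Slack accepted the message. A possible variation, sketched below under the assumption that we’d like a failed delivery to show up as a failed notify unit, wraps the curl call in a function and returns curl’s exit status. The function name post_to_slack is hypothetical; --fail, --silent, --show-error, and --max-time are standard curl options.

```shell
#!/bin/sh
# Hypothetical helper: post a JSON payload to a Slack incoming webhook and
# return curl's exit status instead of always reporting success.
#   $1 = webhook URL, $2 = JSON payload
post_to_slack() {
    curl --silent --show-error --fail --max-time 10 \
        --data-urlencode "payload=$2" "$1"
}

# Example call (the URL is a placeholder, so this is left commented out):
# post_to_slack "https://hooks.slack.com/services/XXXXXXXXX/XXXXXXXXX/XXXXXXXXXXXXXXXXXXXXXXXX" \
#     '{"text": "Service nginx failed"}' || exit 1
```

With --fail, curl returns a non-zero status when the server answers with an HTTP error, so a oneshot service running such a script would itself be marked as failed and appear in the output of systemctl --failed.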

Next, let’s make the script executable using chmod:

$ sudo chmod +x /usr/local/bin/slackNotify.sh

Then, let’s create the “notify-slack” service:

$ sudo tee /usr/local/lib/systemd/system/notify-slack@.service > /dev/null <<'EOF'
[Unit]
Description=Send Systemd Notifications to Slack

[Service]
Type=oneshot
ExecStart=/usr/local/bin/slackNotify.sh -s %i

[Install]
WantedBy=multi-user.target
EOF

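Now, we’re ready to attach this notification service to any systemd service. To do that, we add OnFailure=notify-slack@%N.service to the [Unit] section of the service we’d like to monitor. The %N specifier expands to the name of the monitored unit without its type suffix, and this name becomes the instance name (%i) of the notify service. We can’t use %i here because it’s empty in a unit that isn’t itself an instance of a template. For example:

[Unit]
Description=The NGINX HTTP and reverse proxy server
After=syslog.target network.target remote-fs.target nss-lookup.target
OnFailure=notify-slack@%N.service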

[Service]
Type=forking
PIDFile=/run/nginx.pid
ExecStartPre=/usr/sbin/nginx -t
ExecStart=/usr/sbin/nginx
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s QUIT $MAINPID
PrivateTmp=true

[Install]
WantedBy=multi-user.target
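
Changes made directly to a packaged unit file can be overwritten by a package update. One alternative, sketched here as an assumption about how we might prefer to manage the change, is a drop-in file under /etc/systemd/system/<unit>.d/, which systemd merges with the original unit. The helper name print_onfailure_dropin and the file name onfailure.conf are made up for this example. The sketch uses %N, the unit name without its type suffix, because %i is empty in a unit that isn’t a template instance.

```shell
#!/bin/sh
# Hypothetical helper: print a drop-in that adds an OnFailure handler.
#   $1 = name of the notify template without "@.service", e.g. notify-slack
print_onfailure_dropin() {
    printf '%s\n' "[Unit]" "OnFailure=$1@%N.service"
}

print_onfailure_dropin notify-slack
# Installing it for nginx.service would look like this (left commented out):
# sudo mkdir -p /etc/systemd/system/nginx.service.d
# print_onfailure_dropin notify-slack |
#     sudo tee /etc/systemd/system/nginx.service.d/onfailure.conf > /dev/null
```

Running sudo systemctl edit nginx.service achieves a similar result interactively by opening an editor on a drop-in file for the unit.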

Before our changes become effective, we need to inform systemd of the changes:

$ sudo systemctl daemon-reload
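
To check the whole chain without waiting for a real outage, we could use a throwaway unit that always fails. The sketch below prints such a unit; the unit name fail-test.service and the helper print_failing_unit are hypothetical, and /bin/false is used simply because it exits with a non-zero status.

```shell
#!/bin/sh
# Hypothetical helper: print a oneshot unit that fails immediately and
# triggers the given notify template.
#   $1 = notify template name without "@.service", e.g. notify-slack
print_failing_unit() {
    printf '%s\n' \
        "[Unit]" \
        "Description=Always-failing unit for testing OnFailure" \
        "OnFailure=$1@%N.service" \
        "" \
        "[Service]" \
        "Type=oneshot" \
        "ExecStart=/bin/false"
}

print_failing_unit notify-slack
# Trying it out (left commented out because it changes the system):
# print_failing_unit notify-slack |
#     sudo tee /etc/systemd/system/fail-test.service > /dev/null
# sudo systemctl daemon-reload
# sudo systemctl start fail-test.service      # expected to fail
# systemctl status 'notify-slack@fail-test.service'
```

If the notification arrives, the test unit can be deleted again, followed by another sudo systemctl daemon-reload.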

5. Send Notifications to Email

Another common channel for receiving notifications is email. First, we need to make sure our system can send email reliably through an authenticated SMTP server. One option is msmtp, a lightweight SMTP client.

Once we’ve installed and configured msmtp, let’s build a simple email notification service:

$ sudo tee /usr/local/lib/systemd/system/notify-email@.service > /dev/null <<'EOF'
[Unit]
Description=Send Systemd Notifications to Email

[Service]
Type=oneshot
ExecStart=/usr/bin/bash -c 'echo "Subject: Service Failed\n\nService %i failed on $(hostname)\n$(systemctl status %i)" | /bin/msmtp recipient@example.com'

[Install]
WantedBy=multi-user.target
EOF

We must replace the placeholder recipient@example.com with the email address where we’d like to receive notifications.

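Now, we’re ready to attach this notification service to any systemd service. To do that, we add OnFailure=notify-email@%N.service to the [Unit] section of the service we’d like to monitor, just as we did for Slack. Here, %N expands to the monitored unit’s name without its type suffix; %i would be empty because the monitored unit isn’t a template instance.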
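
The inline bash -c command gets harder to read as the message grows. A possible alternative, mirroring the Slack script, is to move the message formatting into a small script. The function name format_failure_mail is hypothetical, as are the sample host name web01 and the sample status line; the sketch only builds the message text, and sending is still done by piping the output to msmtp.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal mail message (header, blank line, body).
#   $1 = failed service name, $2 = host name, $3 = status text for the body
format_failure_mail() {
    printf 'Subject: Service %s failed on %s\n\n%s\n' "$1" "$2" "$3"
}

format_failure_mail nginx web01 "nginx.service: main process exited"
# In a notify script this could be wired up as follows (left commented out;
# the recipient address is a placeholder):
# format_failure_mail "$1" "$(hostname)" "$(systemctl status --no-pager "$1")" |
#     msmtp recipient@example.com
```

Passing the service name, host name, and status text as printf arguments rather than embedding them in the format string keeps any % characters in the status output from being interpreted as format directives.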

6. Conclusion

In this article, we learned how the systemd OnFailure option works and how to use it to get notified when a service enters the “failed” state.

Additionally, we explored two common notification channels and how to create simple services to receive notifications over Slack and email.
