1. Overview

Restarting a running process means first shutting it down, usually by sending a termination signal. Processes often perform some cleanup before exiting, which may take some time. Once the former process has terminated, we start the program or service again.

In this tutorial, we’ll examine different ways to restart a process automatically whenever it exits, so that it stays up and running.

2. Creating a Sample Program

To begin with, let’s create a short shell script that we’ll use for testing purposes:

$ cat aprogram.sh
#!/bin/bash

trap "echo 'the program is exiting';sleep 3;exit 0" SIGINT;
echo "the program is running...";
while true; do echo '' > /dev/null; done;

Here, we first use the trap command to define how the script handles the SIGINT signal. Upon receiving SIGINT, the script prints a message on the screen, sleeps for 3 seconds, and exits with a 0 status.

Next, the script prints a message when it starts. Finally, in the last line, the script enters an infinite while loop that simulates the execution of a program.

3. Using Bash while or until Loops

We can ensure that a program keeps running using a loop command like while or until.

Let’s see an example loop that restarts our program when it exits:

$ while ./aprogram.sh ; do : ; done
the program is running...
^Cthe program is exiting
the program is running...

Indeed, when we send the SIGINT signal by pressing Ctrl+C, the program exits and then starts again.

The while loop runs aprogram.sh as its condition command and waits for it to finish before doing anything else. If the script returns an exit status of 0, the loop executes its body and then evaluates the condition again, which starts the script anew. Any other exit status ends the loop.

Also, since we don’t want to execute anything in the while loop’s body but can’t leave it empty, we use the null command (:) as the body.
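To make the role of the exit status visible, here is a minimal sketch that swaps aprogram.sh for a stand-in function. The name fake_program and the runs counter are assumptions made for illustration only; the function succeeds twice and then fails, so the loop stops on its own:

```shell
#!/bin/bash

# Hypothetical stand-in for aprogram.sh: it exits with status 0 on its
# first two runs and with status 1 on the third.
runs=0
fake_program() {
    runs=$(( runs + 1 ))
    echo "run $runs"
    (( runs < 3 ))    # status 0 while runs is 1 or 2, status 1 afterwards
}

# Same loop shape as before: restart only while the command exits with 0.
while fake_program; do : ; done
echo "loop ended after $runs runs"
```

The third run returns a non-zero status, so the condition fails and the loop ends instead of starting a fourth run.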

Alternatively, instead of a while loop, we can use an until loop. In contrast to while, until keeps iterating as long as the condition command returns a non-zero exit status. Since aprogram.sh returns a 0 exit status upon receiving the SIGINT signal, we have to add the negation operator (!) so that the until loop keeps running:

$ until ! ./aprogram.sh  ; do : ; done
the program is running...
^Cthe program is exiting
the program is running...

As expected, the program restarted when we sent the SIGINT signal.
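Both loops above restart the program only when it exits with status 0. As a hedged sketch of a variation, the helper below restarts a command whatever its exit status, pauses between runs, and gives up after a fixed number of starts. The function name restart_always and its parameters are hypothetical and aren't part of the original example:

```shell
#!/bin/bash

# Hypothetical helper: run a command repeatedly, whatever its exit status,
# pausing between runs and stopping after max_runs starts.
# Usage: restart_always MAX_RUNS DELAY_SECONDS COMMAND [ARGS...]
restart_always() {
    local max_runs=$1 delay=$2
    shift 2
    local run status
    for (( run = 1; run <= max_runs; run++ )); do
        "$@"
        status=$?
        echo "run $run exited with status $status"
        sleep "$delay"
    done
}

# Example: 'false' always exits with status 1, yet it is started three times.
restart_always 3 0 false
```

For our script, a call such as restart_always 100 1 ./aprogram.sh would allow up to 100 starts with a one-second pause after each run. The cap and the pause keep a program that fails immediately from spinning in a tight loop.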

4. Using the systemd Service Manager

Another option for ensuring that a process restarts is to create a systemd service and use the systemctl command to control it. A service that monitors a process and restarts it in this way is sometimes described as a watchdog.

4.1. The Service Unit File

First, we create a systemd service unit file and save it to the /etc/systemd/system directory:

$ cat /etc/systemd/system/aprogram.service
[Unit]
Description=Service to run aprogram.sh

[Service]
ExecStart=/usr/local/bin/aprogram.sh
Restart=on-success
StandardOutput=append:/var/log/aprogram.log

In the above example, we defined a system service with a minimal set of properties:

  • ExecStart: path to the program
  • Restart: whether the service manager restarts the program when it exits
  • StandardOutput: the path to the log file to which the program’s output is appended

The Restart property is where we set the restart policy. Some of the values we can set here are:

  1. no: process is never restarted
  2. on-success: process is restarted when it exits cleanly
  3. on-failure: process is restarted when it fails
  4. on-watchdog: when watchdog monitoring is enabled, restart the process when the watchdog timeout expires
  5. always: process is always restarted

Notably, systemd considers an exit clean in one of three cases:

  1. process returns a 0 exit status code
  2. process is terminated by a SIGHUP, SIGINT, SIGTERM, or SIGPIPE signal
  3. process exits with one of the exit statuses or signals defined in the SuccessExitStatus property

In our example, the script exits with a 0 status after handling SIGINT, which counts as a clean exit. As a result, we’re using the on-success policy.

Finally, it’s also common to use the on-failure policy so that a process is restarted after it crashes or exits with an error.
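As a rough sketch of the on-failure policy, the commands below write a variant of the unit file that also sets RestartSec, the delay systemd waits before restarting the service. Writing to a scratch directory created with mktemp is an assumption made here so that the sketch needs no root access; the two-second delay is an arbitrary example value:

```shell
#!/bin/bash

# Write the unit to a scratch directory (an assumption for this sketch).
# To use it for real, copy it to /etc/systemd/system and run
# 'sudo systemctl daemon-reload'.
unit_dir=$(mktemp -d)
unit_file="$unit_dir/aprogram.service"

cat > "$unit_file" <<'EOF'
[Unit]
Description=Service to run aprogram.sh

[Service]
ExecStart=/usr/local/bin/aprogram.sh
# Restart only after an unclean exit, such as a non-zero exit status
Restart=on-failure
# Wait two seconds before each restart
RestartSec=2
StandardOutput=append:/var/log/aprogram.log
EOF

echo "unit written to $unit_file"
grep '^Restart' "$unit_file"
```

With this policy, our sample script wouldn’t be restarted after SIGINT, because its handler exits with a 0 status and systemd treats that as a clean exit.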

4.2. Running the Service

Before starting the service, we should call the systemctl daemon-reload command to reload all unit files:

$ sudo systemctl daemon-reload

Now, we’re ready to start our service using the systemctl command:

$ sudo systemctl start aprogram
$ sudo systemctl status aprogram
● aprogram.service - Service to run aprogram.sh
     Loaded: loaded (/etc/systemd/system/aprogram.service; static)
     Active: active (running) since Sat 2023-12-23 12:13:08 EET; 1s ago
   Main PID: 1029579 (aprogram.sh)
      Tasks: 1 (limit: 1024)
     Memory: 444.0K
        CPU: 659ms
     CGroup: /system.slice/aprogram.service
             └─1029579 /bin/bash /usr/local/bin/aprogram.sh

Indeed, the service has started.

4.3. Sending the SIGINT Signal

Next, let’s send a SIGINT signal using the pkill command and see if the service manager restarts the program:

$ sudo pkill -2 aprogram
$ cat /var/log/aprogram.log
the program is running...
the program is exiting
the program is running...

Indeed, we can see in the log that the program received a SIGINT signal, exited, and was restarted by the service manager.

Furthermore, let’s again check the status of the service:

$ sudo systemctl status aprogram
● aprogram.service - Service to run aprogram.sh
     Loaded: loaded (/etc/systemd/system/aprogram.service; static)
     Active: active (running) since Sat 2023-12-23 12:21:39 EET; 2min 24s ago
   Main PID: 1029623 (aprogram.sh)
...

Indeed, we can see that our service is still running. Notably, the main PID number is 1029623. Considering the command reported PID 1029579 when starting the service, it’s evident that the service manager started another process when the initial one terminated.

5. Using Docker

Docker also provides a mechanism for keeping a program running: a restart policy for containers.

5.1. Example Container

To keep things simple, we’ll use the docker run command to start a container whose main process sleeps for a certain amount of time:

$ sudo docker run -d --rm --name aprocess ubuntu:latest sleep 10000
e2aca63336be97c7c1ea77e9cb82b09a2d657ad86fadf9bee8f10f99995642c3

Here, we started a new container using the latest Ubuntu image. The container runs the sleep command for 10000 seconds. Let’s explore the options we use:

  • -d: the container detaches from the current terminal session
  • --rm: the container will be removed after the end of its execution
  • --name: the name of the container, here it’s aprocess

Following this, let’s view the container’s process PID using the pgrep command:

$ pgrep -a sleep
7575 sleep 10000

Indeed, there’s a process with PID 7575 that runs the sleep command.

Next, let’s kill the process:

$ sudo kill -9 7575
$ pgrep -a sleep
$ sudo docker container inspect aprocess
[]
Error response from daemon: No such container: aprocess

As we expected, the process was terminated successfully. To verify this, we ran the pgrep command, which reported no processes running the sleep command anymore. Next, we ran the docker container inspect command, which failed to find the aprocess container, since the --rm option removed it as soon as its process exited.

5.2. The --restart Option

The --restart option enables the Docker daemon to restart a container if it stops running. Let’s add this option to the command that creates our container:

$ sudo docker run -d --restart always --name aprocess ubuntu:latest sleep 10000
071362dac05bb39af81b0d0a83b08efffadf090563ceedbc8363b741d6b0373b
$ pgrep -a sleep
8340 sleep 10000

As can be seen, a new container was created successfully. Also, we removed the --rm option since it’s incompatible with the --restart option.

Next, let’s kill the process with PID 8340:

$ sudo kill -9 8340
$ pgrep -a sleep
8460 sleep 10000

Indeed, we terminated process 8340. Nevertheless, in contrast to the previous example, pgrep found another process with PID 8460, running the sleep command.

Furthermore, we can verify our result with the docker container ls command:

$ sudo docker container ls -a | grep aprocess
bd280dc21418   ubuntu:latest         "sleep 10000"           About a minute ago   Up 2 seconds     aprocess

As we expected, the aprocess container is still running. In addition, the notably short uptime of 2 seconds suggests that the Docker daemon restarted the container.
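Besides always, the --restart option accepts the no, on-failure[:max-retries], and unless-stopped policies. The sketch below only prints the docker run command for a given policy unless DRY_RUN is set to 0. The wrapper run_with_policy, the DRY_RUN variable, and the container names are hypothetical and used for illustration only:

```shell
#!/bin/bash

# Hypothetical wrapper: build the docker run command for a restart policy.
# By default it only prints the command; set DRY_RUN=0 to execute it.
# Usage: run_with_policy POLICY CONTAINER_NAME
run_with_policy() {
    local policy=$1 name=$2
    local cmd=(docker run -d --restart "$policy" --name "$name"
               ubuntu:latest sleep 10000)
    if [[ ${DRY_RUN:-1} == 0 ]]; then
        sudo "${cmd[@]}"
    else
        echo "${cmd[*]}"
    fi
}

# Restart only after a non-zero exit status, at most three times.
run_with_policy on-failure:3 aprocess-retry

# Restart always, except after the container has been stopped manually.
run_with_policy unless-stopped aprocess-keep
```

Printing the command first makes it easy to review the policy before creating a container under it.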

6. Conclusion

In this article, we examined three methods to restart a process when it exits:

  1. using the while or until loop command of Bash
  2. using the systemd service manager
  3. using Docker

Depending on our specific use case, we can select the most appropriate method for restarting a process.
