Save and Restore a Linux Process

1. Overview

Saving a process for later use can be helpful when we’re running a long process such as complex mathematical calculations or rendering. In this tutorial, we’ll look at how we can save and restore a process so that it survives reboots. While most Linux distributions support hibernating the complete machine, we’ll focus on saving a specific process without affecting the others.

Throughout the tutorial, we’ll make use of different tools such as kill and criu that can help us preserve a process. Apart from that, we’ll also see how we can save our operating system’s state on a virtual machine such as VirtualBox.

2. Using the kill Command

If you’ve used Linux for some time, chances are you’ve seen this command used quite a lot. We use the kill command to send signals to running processes. These signals provide a means of inter-process communication. For instance, we can send a signal to terminate, stop, or hang up a process, either from another process or directly from the terminal.

2.1. Basic Usage

The basic syntax for the kill command is pretty simple:

$ kill -[SIGNAL] [PID]

A signal is prefixed with “-”, followed by a signal number or a signal name that signifies a specific action. The signal is followed by a PID. Each running process in Linux has a process ID (PID), which uniquely identifies the process on that machine.

In our case, we’re interested in pausing or stopping the process temporarily. Once we pause the process, we should be able to resume the process later to carry on its normal execution. Fortunately, the kill command provides the -STOP and the -CONT signals.

2.2. Stopping and Resuming a Process

Let’s say a long-running process is taking up a lot of CPU time, and we need to run a short process that is also CPU-intensive. We can use the -STOP signal to halt the long process temporarily:

$ kill -STOP 8763

Once we run the command, we’ll notice that the CPU usage drops. The stopped process still resides in main memory, although Linux may swap out its pages over time if other processes need the memory.

Once we’re ready to resume the process, we can send the process a -CONT signal:

$ kill -CONT 8763
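
To see the effect of these two signals without touching a real workload, we can try them on a throwaway process. The following is a minimal sketch, assuming a Linux system with /proc mounted; the sleep job and the proc_state helper are placeholders introduced here for illustration:

```shell
#!/bin/sh
# Start a harmless background job as a stand-in for a long-running task.
sleep 300 &
pid=$!

# Hypothetical helper: print the one-letter state from /proc/PID/status
# (T = stopped, S = sleeping, R = running).
proc_state() {
  awk '/^State:/ {print $2}' "/proc/$1/status"
}

kill -STOP "$pid"
sleep 1                      # give the kernel a moment to deliver the signal
state_after_stop=$(proc_state "$pid")
echo "after STOP: $state_after_stop"

kill -CONT "$pid"
sleep 1
state_after_cont=$(proc_state "$pid")
echo "after CONT: $state_after_cont"

# Clean up the stand-in process.
kill "$pid"
wait "$pid" 2>/dev/null || true
```

A state of T after the first signal indicates that the process is stopped. After -CONT, it should go back to sleeping or running.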

We should know that once we shut down our machine, Linux will wipe out all the processes from memory, including our long-running process. So, the kill command is useful if we want to stop and resume our process temporarily without having to save it on the disk.

However, if we need to persist our processes across reboots, we might want to look at the other tools that we’ll cover next.

3. Using criu

criu (Checkpoint/Restore In Userspace) is a widely used tool for saving and restoring processes. It offers a rich feature set, is regularly maintained, and keeps up with recent Linux kernels.

criu allows us to save a process, together with its child processes, to our hard drive as a checkpoint. The checkpoint consists of files that we can later use to resume the process from the point at which it was frozen.

3.1. Installation

The criu binary package is available on most distros’ official repositories. We can use the package manager that comes with our distribution to install it:

# On Ubuntu, Debian
$ sudo apt install criu

# On Fedora
$ sudo dnf install criu

# On Arch, Manjaro
$ sudo pacman -S criu

If the package isn’t available in the official repository, we can refer to the criu wiki for compiling from sources.

3.2. Basic Usage

Once criu has been installed on our machine, we can go ahead and verify it:

$ criu --version
Version: 3.16.1

Before using criu, we should know that it only works if our Linux kernel was compiled with the CONFIG_CHECKPOINT_RESTORE option enabled. Otherwise, criu will complain and the process won’t be saved. In that case, we might need to compile the kernel with the required options.

Now, if we need to check for this option in our kernel’s config, we can simply read the /proc/config.gz file:

$ zcat /proc/config.gz | grep CHECKPOINT
CONFIG_CHECKPOINT_RESTORE=y
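
Not every distribution exposes /proc/config.gz, since that file requires the kernel option CONFIG_IKCONFIG_PROC. As a hedged sketch, the check below falls back to /boot/config-$(uname -r), a location many distributions use. The function names are our own and not part of criu:

```shell
#!/bin/sh
# Succeeds if the kernel config text on stdin enables checkpoint/restore.
has_checkpoint_restore() {
  grep -q '^CONFIG_CHECKPOINT_RESTORE=y$'
}

# Look for the running kernel's config in the two usual places.
check_kernel_config() {
  if [ -r /proc/config.gz ]; then
    zcat /proc/config.gz | has_checkpoint_restore
  elif [ -r "/boot/config-$(uname -r)" ]; then
    has_checkpoint_restore < "/boot/config-$(uname -r)"
  else
    echo "no readable kernel config found" >&2
    return 2
  fi
}

if check_kernel_config; then
  echo "CONFIG_CHECKPOINT_RESTORE is enabled"
else
  echo "CONFIG_CHECKPOINT_RESTORE is missing or could not be verified"
fi
```

criu also ships a check subcommand (sudo criu check) that tests the running kernel directly, which is useful when no config file is readable.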

Now, we’re ready to use criu to dump a process. Dumping a process means taking a snapshot of its state in memory at that moment and saving it to disk along with the other information needed to restore it.

Let’s see how to dump a process with criu from the command line:

$ criu dump -t [PID] -D /path/to/dir [OPTIONS]

We specify the dump operation with the dump command. Then, we pass the PID with the -t or --tree option. Apart from that, we also specify the directory to put the checkpoint files in with the -D option.

Before dumping the process, we should be aware that criu dumps the process along with its child processes. For instance, if we’ve spawned a process from a terminal, criu will try to dump the whole process tree. For that reason, we should start our command-line programs from the terminal with setsid so that they run in their own session and don’t depend on other processes. If a process still remains attached to a terminal, criu provides the --shell-job option, which we have to pass to both dump and restore.
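
To illustrate what setsid changes, the sketch below starts a placeholder sleep command in its own session and compares session IDs. It assumes that it runs as a non-interactive script (in an interactive shell with job control, setsid may fork, so $! wouldn’t be the command’s PID), and session_of is a helper made up for this example:

```shell
#!/bin/sh
# Hypothetical helper: field 6 of /proc/PID/stat is the session ID
# (this simple parsing assumes the command name contains no spaces).
session_of() {
  awk '{print $6}' "/proc/$1/stat"
}

# Start a stand-in command in a new session, detached from the terminal's I/O.
setsid sleep 30 < /dev/null > /dev/null 2>&1 &
child=$!
sleep 1                      # let setsid switch sessions before we look

shell_sid=$(session_of $$)
child_sid=$(session_of "$child")
echo "shell session: $shell_sid"
echo "child session: $child_sid"

# Clean up the stand-in process.
kill "$child"
wait "$child" 2>/dev/null || true
```

The two session IDs should differ, which shows that the child no longer belongs to the shell’s session. Redirecting its input and output, as done here, also keeps it from holding on to the terminal.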

Now, when we want to restore our process from the file system, we run the restore command and point it at the checkpoint directory:

$ criu restore -D /path/to/dir

criu restores the process with the same PID it had when it was dumped. If that PID is already in use, the restore fails, so we need to make sure the original process is no longer running.
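
The dump step can be wrapped in a small helper that checks its inputs first. This is only a sketch: the checkpoint function and the DRY_RUN variable are hypothetical names for this example, while the criu invocation is the one shown above. The kill -0 test only works for processes we’re allowed to signal:

```shell
#!/bin/sh
# Hypothetical wrapper: checkpoint PID $1 into directory $2.
# With DRY_RUN set, it only prints the criu command instead of running it.
checkpoint() {
  pid=$1
  dir=$2

  # kill -0 sends no signal; it only tests whether we could signal the PID.
  if ! kill -0 "$pid" 2>/dev/null; then
    echo "cannot signal process $pid (missing, or owned by another user)" >&2
    return 1
  fi

  # Make sure the checkpoint directory exists before dumping into it.
  mkdir -p "$dir" || return 1

  if [ -n "${DRY_RUN:-}" ]; then
    echo "criu dump -t $pid -D $dir"
  else
    sudo criu dump -t "$pid" -D "$dir"
  fi
}

# Example: show the command that would checkpoint the current shell.
DRY_RUN=1 checkpoint $$ "${TMPDIR:-/tmp}/checkpoint-demo"
```

Running it with DRY_RUN set is a safe way to confirm the PID and directory before the real dump.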

3.3. Example

As an example, let’s create a custom bash script that contains an infinite loop. We’ll add 10 to a variable and print it each time the loop iterates. Let’s write the script first:

#!/bin/bash

NUM=0
while true; do
  NUM=$((NUM+10))
  echo "NUM: $NUM"
done

Let’s run the script with setsid:

$ setsid ./infinite-loop.sh

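Now that our script is running, let’s check its PID with pgrep. We use the -f option to match against the full command line, because pgrep otherwise compares only the first 15 characters of the process name, and our script’s name is longer than that:

$ pgrep -f infinite-loop.sh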
2805

Now, let’s use criu to save the process to our disk:

$ criu dump -t $(pgrep -f infinite-loop.sh) -D ~/Documents/infinite-loop

Let’s break it down:

-t specifies the PID of the process, which we got from the pgrep command through command substitution
-D option indicates the directory where the infinite-loop checkpoint files will be stored

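By default, criu terminates the dumped process once the checkpoint has been written. If we passed the --leave-running option instead, we’d first have to kill the infinite-loop process so that its PID is free for the restore:

$ kill $(pgrep -f infinite-loop.sh)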

Now, let’s resume the process by running the criu command with the restore option:

$ criu restore -D ~/Documents/infinite-loop
NUM: 10030560
NUM: 10030570
NUM: 10030580
.
.
.

Keep in mind that we have to run the criu command with root privileges. Otherwise, we’ll get an “Operation not permitted” error.
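
A script that calls criu can check for this up front. The following is a minimal guard, assuming criu is run as root as in this tutorial; require_root is a name chosen for this example:

```shell
#!/bin/sh
# Hypothetical guard: report a clear message when not running as root.
require_root() {
  if [ "$(id -u)" -ne 0 ]; then
    echo "criu needs root privileges here; re-run with sudo" >&2
    return 1
  fi
}

if require_root; then
  echo "running as root, safe to call criu"
fi
```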

4. Saving State on a Virtual Machine

A virtual machine lets us save the state of the whole guest operating system and resume it later. For our example, we’ll be using VirtualBox because it’s compact and available on most platforms.

After setting up VirtualBox, let’s boot into our distro of choice. Once logged in, we press Alt+F4 or click the close button on the title bar. VirtualBox then presents a dialog, where we select the “Save the machine state” radio button and click “OK”.

VirtualBox will now save the current state of the guest operating system and resume from that state the next time we start the VM.

Note that the VM freezes the complete operating system rather than a single process. If we want to freeze a single process, we can still use criu in the operating system running in the VM. We can even copy the checkpoint files to the host and restore the process there, provided both systems run a compatible kernel and have the same binaries and libraries at the same paths.

5. Conclusion

In this tutorial, we discussed how we can freeze and restore a process. We covered the kill command to temporarily pause and resume our processes. Then, we delved into using the criu tool to save and restore our processes to and from the disk.

Finally, we briefly looked at saving our operating system state on a VM.
