If you have a few years of experience in the Linux ecosystem, and you’re interested in sharing that experience with the community, have a look at our Contribution Guidelines.

1. Introduction

Since Linux processes depend on the kernel, sometimes the sure-fire way of killing a process seemingly has no effect.

In this tutorial, we look at reasons kill -9 can leave a target process intact. First, we discuss why a process might need to be killed. Next, we go into reasons the standard kill command could appear to have no effect. Finally, we explore workarounds for such situations.

We tested the code in this tutorial on Debian 11 (Bullseye) with GNU Bash 5.1.4. It should work in most POSIX-compliant environments.

2. Killing in the Name

Processes can become unresponsive or obsolete for many reasons. From a simple hang, through resource hogging and wasted time, to processes left running in the background, we might need ways to terminate such processes.

Naturally, depending on the situation, we already have several options.

Still, in Linux, all drastic solutions rely on the kill() system call with the most fatal SIGKILL (9) signal applied directly to the target process ID (PID):

# kill -SIGKILL 666
# kill -9 666
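To see what a successful SIGKILL looks like from the shell, here's a minimal Bash sketch that targets a throwaway sleep process instead of a real PID such as 666. After kill -9, wait reaps the child and reports 128 + 9 = 137, the shell's convention for a process terminated by signal 9.

```shell
#!/usr/bin/env bash
# Start a disposable background process to act as the target
sleep 300 &
target=$!

# Send SIGKILL (signal 9); kill returns 0 as soon as the signal is sent
kill -9 "$target"

# Reap the child: bash reports death by signal N as status 128 + N
status=0
wait "$target" || status=$?
echo "exit status: $status"   # 137 = 128 + 9 (SIGKILL)
```

The rest of this article deals with the cases where this normal sequence doesn't happen.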

However, kill -9 might still not do the job even when run as a superuser. Let’s see why.

3. When kill -SIGKILL Appears to Fail

To understand when processes don’t seem to react to a signal they can’t ignore, we should first understand two particular process states:

  • uninterruptible sleep
  • zombie

Let’s look at both.

3.1. Uninterruptible Sleep

Basically, sleeping states usually mean a process is waiting on resources. In particular, a process in uninterruptible sleep (state D in ps) only wakes up when the awaited resource becomes available. Until then, it doesn’t handle any signals, which simply remain pending.

For example, faulty drivers or hardware, or waiting on an unreachable remote filesystem, can cause such blocking:

$ umount /mnt/smb
[...]

Such processes may block indefinitely, for example, due to race conditions or a resource that never becomes available.

Meanwhile, they don’t act on any signal, including SIGKILL.
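To spot such processes, we can read the state letter the kernel exposes in /proc/PID/stat or filter the STAT column of ps. The following is a minimal Bash sketch: the function names proc_state and list_uninterruptible are our own, and the wchan:32 column assumes the procps ps that ships with Debian.

```shell
#!/usr/bin/env bash
# proc_state PID: print the one-letter kernel state (R, S, D, Z, T, ...).
# The state is the third field of /proc/PID/stat, but the second field
# (the command name in parentheses) may contain spaces, so we strip
# everything up to the last ") " before taking the first word.
proc_state() {
  local stat
  stat=$(< "/proc/$1/stat") || return 1
  stat=${stat##*") "}
  printf '%s\n' "${stat%% *}"
}

# list_uninterruptible: print the ps header plus every process whose STAT
# starts with D, including the kernel function it waits in (WCHAN).
list_uninterruptible() {
  ps -eo pid,stat,wchan:32,comm | awk 'NR == 1 || $2 ~ /^D/'
}
```

For a process stuck in uninterruptible sleep, proc_state PID prints D, and the WCHAN column of list_uninterruptible can hint at what the process is waiting for.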

3.2. Zombie

Another potentially problematic process type is the zombie. A zombie has already terminated or completed but remains in the process table because its parent hasn’t called wait() (or a related function) to acknowledge the child’s end, which the kernel signals to the parent via SIGCHLD:

$ (sleep 1 & exec /bin/sleep 11) &
$ ps -H
  PID TTY          TIME CMD
  100 pts/1    00:00:00 bash
  660 pts/1    00:00:00   sleep
  666 pts/1    00:00:00     sleep <defunct>

Here, we spawn a subshell that runs sleep in the background and then replaces itself with another sleep via exec. Since /bin/sleep never calls wait(), nothing acknowledges the death of the first sleep, so it stays a zombie for about 10 seconds (11 minus 1), as the <defunct> marker in the output of ps confirms.
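To check that SIGKILL really has no effect on a zombie, we can reproduce the example in a script and compare the state before and after kill -9. This is a minimal sketch that assumes the procps ps (for --ppid and -o stat=) and shortens the parent's sleep to 4 seconds. Since PIDs differ on every run, it captures them instead of hard-coding 666.

```shell
#!/usr/bin/env bash
# The subshell starts "sleep 1" in the background and then execs
# /bin/sleep 4, which never calls wait() on that child.
# Output is redirected so the background job doesn't hold our stdout open.
(sleep 1 & exec /bin/sleep 4) >/dev/null 2>&1 &
parent=$!
sleep 2   # give "sleep 1" time to exit and become a zombie

# The only child of $parent is the defunct "sleep 1"
zombie=$(ps -o pid= --ppid "$parent" | tr -d ' ')
ps -o pid,ppid,stat,cmd -p "$zombie"

# kill reports success, yet the zombie stays in state Z
kill_status=0
kill -9 "$zombie" || kill_status=$?
echo "kill -9 $zombie returned $kill_status"
sleep 0.5
state_after=$(ps -o stat= -p "$zombie" | tr -d ' ' | cut -c1)
echo "state after SIGKILL: $state_after"

# Cleanup: once the parent exits, init (or a subreaper) reaps the zombie
kill "$parent"
wait "$parent" || true
```

The cleanup step at the end already hints at one of the workarounds: removing the parent also removes the zombie.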

In essence, a zombie behaves much like a process in uninterruptible sleep, except that it waits for its parent instead of a resource. Still, the states aren’t equivalent: a zombie has already terminated, so there is nothing left for SIGKILL to kill, and not even SIGKILL can change its state.

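Importantly, kill commands that target a zombie or a process in uninterruptible sleep still succeed, but the signal has no visible effect: a zombie has already terminated, while a process in uninterruptible sleep only acts on pending signals once it wakes up.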

4. What We Can Do About It

Now that we know the two main reasons a process can seem unresponsive to kill -SIGKILL, let’s explore our options.

4.1. Work Around Uninterruptible Processes

While we can’t influence them directly, we still have ways to prevent uninterruptible processes or get rid of them:

  • avoid their creation by maintaining hardware, drivers, networking, and the system in general
  • make any awaited resources available
  • in some cases, killing the parent can lead to an uninterruptible child terminating
  • reboot

For example, let’s use ps to show a process in uninterruptible sleep due to network connectivity issues:

$ ps 666
PID  TTY  STAT  TIME COMMAND
666  ?    D     0:00 [cifsd]

In this example, after experiencing problems with CIFS (SMB), we verify that the CIFS kernel thread cifsd is in uninterruptible sleep (state D). As discussed, using kill -9 at this point has no visible effect:

$ ps 666
PID  TTY  STAT  TIME COMMAND
666  ?    D     0:00 [cifsd]
# kill -9 666
$ ps 666
PID  TTY  STAT  TIME COMMAND
666  ?    D     0:00 [cifsd]

So, we restore network connectivity and check again:

$ ps 666
PID  TTY  STAT  TIME COMMAND
666  ?    S     0:00 [cifsd]

Now, the process is back in interruptible sleep (S) and continues as usual. If it doesn’t recover, we still have the option to reboot.
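Since kill -9 returns without complaint even when the target can’t act on the signal yet, it can help to confirm that the signal is actually queued. For a regular user-space process, /proc/PID/status exposes the pending-signal bitmasks SigPnd (thread) and ShdPnd (whole process), where bit N-1 stands for signal N. The sketch below decodes them; signal_pending is a hypothetical helper name, and kernel threads such as [cifsd] may treat signals differently.

```shell
#!/usr/bin/env bash
# signal_pending PID SIGNUM: succeed if signal SIGNUM is pending for PID.
# /proc/PID/status shows pending signals as hexadecimal bitmasks:
# SigPnd for the thread and ShdPnd for the whole thread group.
# Bit (SIGNUM - 1) set means signal number SIGNUM is pending.
signal_pending() {
  local pid=$1 sig=$2 field mask
  for field in SigPnd ShdPnd; do
    mask=$(awk -v f="$field:" '$1 == f { print $2 }' "/proc/$pid/status") || return 2
    [ -n "$mask" ] || continue
    if (( (16#$mask >> (sig - 1)) & 1 )); then
      return 0
    fi
  done
  return 1
}
```

For instance, after sending kill -9 to a user process stuck in state D, signal_pending PID 9 should succeed until the process wakes up and dies.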

4.2. Work Around Zombie Processes

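There are three standard ways to clean up zombie processes:

  • send SIGCHLD to the parent so that it acknowledges the child via wait()
  • kill the parent, after which the zombie gets adopted by init (PID 1) or the nearest subreaper, which reaps it
  • reboot the system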

Of course, killing the parent affects all of its child processes, not only the zombie:

$ ps -H
PID TTY          TIME CMD
 10 pts/1    00:00:00 su
100 pts/1    00:00:00   bash
660 pts/1    00:00:00     bash
666 pts/1    00:00:00       proc1
667 pts/1    00:00:00       sleep
$ kill -9 660
$ ps -H
PID TTY          TIME CMD
 10 pts/1    00:00:00 su
100 pts/1    00:00:00   bash
667 pts/1    00:00:00 sleep

Here, we see that killing the parent 660 removes its child proc1 with PID 666, while the other child, sleep with PID 667, keeps running as an orphan adopted by another process, so ps -H now shows it at the top level.

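Still, sending SIGCHLD to the parent is the preferable, less disruptive approach. However, the signal doesn’t force the parent to acknowledge its child: unless the parent handles SIGCHLD by calling wait() or a similar function, the zombie remains.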
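To apply the SIGCHLD approach, we first need the parent PID of each zombie. The sketch below, with the hypothetical helper names list_zombies and nudge_zombie_parents, uses ps to find processes in state Z and sends SIGCHLD to each distinct parent. As mentioned, this only helps if the parent actually reaps its children when it receives SIGCHLD.

```shell
#!/usr/bin/env bash
# list_zombies: print "PID PPID COMMAND" for every process in state Z.
list_zombies() {
  ps -e -o pid= -o ppid= -o stat= -o comm= | awk '$3 ~ /^Z/ { print $1, $2, $4 }'
}

# nudge_zombie_parents: send SIGCHLD to each distinct parent of a zombie,
# prompting it to reap its children if its SIGCHLD handler calls wait().
# A parent without such a handler keeps its zombies.
nudge_zombie_parents() {
  local ppid
  for ppid in $(list_zombies | awk '{ print $2 }' | sort -u); do
    kill -s CHLD "$ppid" || echo "could not signal $ppid" >&2
  done
}
```

In the zombie example from earlier, the parent is /bin/sleep, which keeps the default SIGCHLD disposition, so nudging it leaves the zombie in place and killing the parent becomes the next option.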

Naturally, a system reboot resolves the issue as a last resort.

5. Summary

In this article, we explored why attempting to kill a process might not always succeed and what we can do about it.

In conclusion, while kill -SIGKILL is the most lethal and direct way to terminate a process, it can seemingly fail when the target is in uninterruptible sleep or is already a zombie.
