Find and Remove Large Files That Are Open but Have Been Deleted

1. Overview

When we run out of disk space, we often turn to our Linux machine’s filesystem, find some files we no longer need, and remove them. Unfortunately, removing files may not work when there are running processes that hold references to these files.

In this case, we can stop each process that keeps an open reference and then perform the removal. However, this activity could pose a problem as well. For example, some applications shouldn’t be stopped during business hours.

In this tutorial, we’ll explore solutions to this problem.

2. Problem Reproduction

Let’s start by creating a very simple example process that never stops writing to a file in the background:

$ while true; do echo -n 'Hello World '; done > output.log &

The above script keeps writing the phrase “Hello World ” to the output.log file forever. As a reminder, we can stop the process later either by bringing it to the foreground with the fg command and pressing Ctrl+C, or by terminating it with the kill command. Now let’s remove the file:

$ rm output.log

Note that the rm command seems to have been executed successfully. We can even verify this by typing the command echo $?, which prints a successful exit status of 0. Moreover, we can no longer find the file in its directory.

3. Check if the File Is Deleted

Actually, the file’s contents weren’t deleted. The df command, in combination with watch, enables us to verify this:

$ watch df -h /
...

The output shows that disk usage hasn’t decreased and, in fact, keeps increasing. This is because the rm command only removes the file’s directory entry, that is, a reference to the file. As long as a process holds an open reference to the file, the operating system can’t free its contents.
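We can reproduce this behavior on a small scale in a single shell. The following is a minimal sketch, assuming Bash on Linux; the scratch file from mktemp and descriptor number 3 are arbitrary choices for illustration:

```shell
#!/bin/bash
# Create a scratch file and keep it open for reading on descriptor 3
# (both the file and the descriptor number are illustrative choices).
tmpfile=$(mktemp)
echo 'Hello World' > "$tmpfile"
exec 3< "$tmpfile"

# Remove the only directory entry, so the name is gone...
rm "$tmpfile"
[ -e "$tmpfile" ] || echo "name removed"

# ...but the data is still reachable through the open descriptor.
content=$(cat <&3)
echo "still readable: $content"

# Closing the last open descriptor lets the kernel free the space.
exec 3<&-
```

Only after the last descriptor is closed can the filesystem reclaim the file’s blocks.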

3.1. Check a Process’s Open Files in the /proc Directory

To verify that the background process still holds an open file descriptor for the file, we can refer to the /proc directory. To do so, we first have to find the process ID of our background process. Besides the ps command, we can also do this easily with the jobs command:

$ jobs -l
[1]+ 8199 Running                 while true
do
    echo -n 'Hello World '; 
done > output.log &

Here, our background process ID is 8199. Next, we can refer to the relevant subdirectory of /proc:

$ ls -all /proc/8199/fd
total 0
dr-x------ 2 ubuntu ubuntu 0 Aug 22 16:54 .
dr-xr-xr-x 9 ubuntu ubuntu 0 Aug 22 16:47 ..
lrwx------ 1 ubuntu ubuntu 64 Aug 22 16:54 0 -> /dev/pts/0
l-wx------ 1 ubuntu ubuntu 64 Aug 22 16:54 1 -> '/home/ubuntu/output.log (deleted)'
lrwx------ 1 ubuntu ubuntu 64 Aug 22 16:54 2 -> /dev/pts/0
lrwx------ 1 ubuntu ubuntu 64 Aug 22 16:54 255 -> /dev/pts/0

Note the (deleted) label next to the output.log file. As can be seen, output.log still exists in the open files of our background process.
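When we don’t know which process holds a deleted file, we can apply the same check to every process. The sketch below assumes Bash on Linux; list_deleted_open_files is a hypothetical helper name, not a standard command. It only sees processes whose fd directory the current user may read, so we’d typically run it as root for a system-wide view:

```shell
#!/bin/bash
# Hypothetical helper: print every open descriptor whose target is marked "(deleted)".
list_deleted_open_files() {
    local fd target
    for fd in /proc/[0-9]*/fd/*; do
        # Descriptors can vanish or be unreadable, so skip readlink failures.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *' (deleted)') printf '%s -> %s\n' "$fd" "$target" ;;
        esac
    done
}

list_deleted_open_files
```

Each output line names the /proc path that we can later use to reach the deleted file.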

3.2. Check a Process’s Open Files With the lsof Command

Another way to find open files in a system, even if we’ve deleted them, is the lsof command:

$ sudo lsof +L1
COMMAND  PID   USER   FD   TYPE DEVICE SIZE/OFF NLINK   NODE NAME
bash    8199 ubuntu    1w   REG    8,1     2328     0 258076 /home/ubuntu/output.log (deleted)

Here, we used the +L1 option to list open files with a link count of less than one, that is, files that have been unlinked. Alternatively, we can use the -p option to filter the list by process ID:

$ sudo lsof -p 8199
COMMAND  PID   USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
...
bash    8199 ubuntu    1w   REG    8,1    10284 258076 /home/ubuntu/output.log (deleted)
...

Again, we have verified that the deleted file is still open.

4. File Truncation

When we truncate a file, we reduce or empty its contents without deleting the file. We can truncate a file even if the file is open. Consequently, file truncation is a good solution to our problem.

In all cases, we can verify the result of the file truncation with the df command by taking note of the storage used before and after.

4.1. File Truncation With Redirection

When we redirect the output of a command with the redirection operator > to a file, the shell opens the file for writing. For an existing file, the redirection operation will truncate the file. We can use this property of redirection.

Since we deleted the file’s name, we can no longer reach it through its original path. However, we can still find it in the /proc/${pid}/fd directory and perform the redirection there:

$ df -h /; : > /proc/8199/fd/1; df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       4.7G  4.4G  300M  94% /
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       4.7G  4.0G  759M  85% /

Before the redirection operator, we wrote the colon character :, which is a shell built-in that stands for the no operation command. Its empty output is redirected to the file defined after the redirection operator. Lastly, we confirm our efforts by using df before and after the operation.
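To try the technique without filling a real disk, we can check the size of the deleted file itself instead of the whole filesystem. This is a minimal sketch, assuming Bash and GNU coreutils on Linux; the scratch file and the single-write writer are stand-ins for our output.log process:

```shell
#!/bin/bash
# Stand-in writer (assumption for this sketch): writes once to a scratch
# file on its stdout, then idles so the measured sizes stay stable.
logfile=$(mktemp)
( echo -n 'Hello World '; exec sleep 30 ) > "$logfile" &
pid=$!

# Wait for the first write, then delete the file's name.
until [ -s "$logfile" ]; do sleep 0.1; done
rm "$logfile"

# stat -L follows the /proc symlink to the deleted file itself.
before=$(stat -L -c %s "/proc/$pid/fd/1")
: > "/proc/$pid/fd/1"
after=$(stat -L -c %s "/proc/$pid/fd/1")
echo "size before: $before bytes, after: $after bytes"

# Stop the writer and reap it quietly.
kill "$pid"
wait "$pid" 2>/dev/null || true
```

Note that a writer like the one in Section 2, which opens the file with > rather than >>, keeps its old file offset after the truncation. On its next write, the reported size can therefore grow back, while the freed blocks remain free.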

4.2. Other Forms of File Truncation With Redirection

Instead of the colon character, we can use any command that produces an empty output:

$ cat /dev/null > /proc/8199/fd/1

Indeed, the cat /dev/null command truncates the file to zero length. Another option is the echo command with the -n option:

$ echo -n > /proc/8199/fd/1

Last but not least, the echo -n command produces an empty output, thus having the same effect as the previous examples.

4.3. File Truncation With the truncate Command

We can achieve the same result as in the previous section with the truncate command. The truncate command increases or reduces the size of a file:

$ truncate -s 0 /proc/8199/fd/1

As can be seen, we’ve used the -s option to set the file size to zero. As before, since we deleted output.log, we reach the file through the /proc directory.

4.4. File Truncation With the dd Command

The dd command, which copies files, can also be used to truncate a file. To achieve this, we set the null device /dev/null as the input file:

$ dd if=/dev/null of=/proc/8199/fd/1

We set the origin file in the if option and the destination in the of option. As before, we set the destination to the reference of the file that our background process holds.

4.5. File Truncation With the tee Command

The tee command copies standard input to one or more files and the standard output. We can use it to truncate the deleted file with some tweaking:

$ cat /dev/null | tee /proc/8199/fd/1

Here we send the null device contents to the tee command, with the help of the pipe operator |. Note that instead of cat /dev/null, we can use any other command that produces an empty output.
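The commands from this section can be compared side by side on a throwaway file. The sketch below assumes Bash and GNU coreutils on Linux; try_method is a hypothetical helper, and the single-write writer is a stand-in for a real logging process:

```shell
#!/bin/bash
# Hypothetical helper: apply one truncation method to a deleted, still-open file.
try_method() {
    local name=$1 file pid target
    file=$(mktemp)
    # Stand-in writer: writes once to its stdout, then idles.
    ( echo -n 'Hello World '; exec sleep 30 ) > "$file" &
    pid=$!
    until [ -s "$file" ]; do sleep 0.1; done
    rm "$file"
    target=/proc/$pid/fd/1
    case $name in
        redirect) : > "$target" ;;
        truncate) truncate -s 0 "$target" ;;
        dd)       dd if=/dev/null of="$target" 2>/dev/null ;;
        tee)      tee "$target" < /dev/null ;;
    esac
    echo "$name: $(stat -L -c %s "$target") bytes left"
    # Stop the writer and reap it quietly.
    kill "$pid"
    wait "$pid" 2>/dev/null || true
}

for method in redirect truncate dd tee; do
    try_method "$method"
done
```

Each method opens the same /proc path, so the choice between them is mostly a matter of which tools are available.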

5. File Truncation With the gdb Command

A nice feature of the GNU Debugger is that we can evaluate a C or C++ language expression in the context of a running process. In our case, we can attach the debugger to the running process and evaluate a C function that will truncate an open file.

One such function is the POSIX ftruncate function. Importantly, it requires a file descriptor, which we can find in the /proc directory as we did before:

$ ls -all /proc/8199/fd
...
l-wx------ 1 ubuntu ubuntu 64 Aug 27 15:24 1 -> '/home/ubuntu/output.log (deleted)'
...

The file descriptor that we are seeking is number 1. Next, we attach the debugger to the running process. Recall that the PID is 8199:

$ sudo gdb -p 8199
...
(gdb) call ftruncate(1,0)
$1 = 0
(gdb) q

The gdb tool is interactive, so we start it with the -p option set to the process ID we want to attach to. Next, at the (gdb) prompt, we use the call command to execute the ftruncate function. The returned value of 0 indicates success.

Here, the ftruncate function takes two arguments:

File descriptor (1)
Size of the file after the operation (0)

Finally, we type q to exit the debugger and verify that the file was indeed truncated.

6. The logrotate Command

The logrotate command is usually the best tool for handling large log files. The purpose of logrotate is to back up and truncate those large log files while our background process is still running.

Since it provides many configuration options, we’ll demonstrate only a very simple example. In contrast to the previous sections, in the following subsections we assume that the log file hasn’t been deleted.

6.1. The HUP Signal Handling

We’ll modify our script so that it handles the HUP signal. Upon receiving this signal, the process should close the handle of the output.log file and then reopen it:

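#!/bin/bash
# Open output.log for writing on file descriptor 3
exec 3> output.log
# On HUP, print a message, then close and reopen descriptor 3
trap 'echo SIGHUP RECEIVED; exec 3>&-; exec 3> output.log' SIGHUP
while true; do
    echo -n 'Hello World ' >&3
done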

We open output.log on file descriptor 3 with the exec command and handle the HUP signal with the trap command. Upon receiving the HUP signal, we first print a message, and then we close and reopen the file descriptor with the exec 3>&- and exec 3>output.log commands.

Let’s save the above script to file testscript.sh and execute it:

$ ./testscript.sh &
[1] 47978

The script now runs in the background with PID 47978, ready to handle the HUP signal.

6.2. The logrotate Configuration

Next, we create a very simple logrotate configuration and save it to logrotate.conf:

/home/ubuntu/output.log {
    postrotate
        kill -HUP 47978
    endscript
}

In the above configuration, we instruct logrotate to rename our log file by appending a number (i.e., output.log.1) and then send a HUP signal to our background process. As a result, the background process will create a new output.log file where it will continue writing while existing logs are backed up to the output.log.1 file, which can be safely deleted.
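The rename-and-signal sequence can also be reproduced by hand, which may help to see what each step does. The sketch below is an illustration under assumptions, not logrotate’s actual implementation: it uses a scratch directory from mktemp instead of /home/ubuntu, and an inline writer that mirrors testscript.sh without the message:

```shell
#!/bin/bash
# Scratch directory standing in for /home/ubuntu (assumption for this sketch).
workdir=$(mktemp -d)
log=$workdir/output.log

# Inline writer mirroring testscript.sh: descriptor 3 is reopened on HUP.
(
    exec 3> "$log"
    trap 'exec 3>&-; exec 3> "$log"' HUP
    while true; do
        echo -n 'Hello World ' >&3
        sleep 0.1
    done
) &
pid=$!

# Wait until the writer has produced some output.
until [ -s "$log" ]; do sleep 0.1; done

mv "$log" "$log.1"   # rename step: the writer now writes to output.log.1
kill -HUP "$pid"     # postrotate step: ask the writer to reopen output.log

# Give the writer up to five seconds to recreate output.log.
for _ in $(seq 50); do
    [ -e "$log" ] && break
    sleep 0.1
done
ls -l "$workdir"

# Stop the writer and reap it quietly.
kill "$pid"
wait "$pid" 2>/dev/null || true
```

After the signal, the directory should contain both a fresh output.log and the renamed output.log.1 holding the earlier output.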

6.3. The logrotate Execution

We’re ready to run logrotate:

$ sudo logrotate -f logrotate.conf
SIGHUP RECEIVED

Note the use of the -f option to force the rotation. After the SIGHUP RECEIVED message is printed, we can see that output.log has been emptied and an output.log.1 file has been created.

7. Conclusion

In this article, we examined ways to truncate a file when:

we’ve removed the file
a running process holds a reference to it
the operating system can’t free the file’s resources

In the end, we examined the logrotate command, which is usually the best way to handle log files and avoid such situations.
