1. Introduction

File deletion is a common operation for users and administrators alike. However, when it comes to automated activities, file truncation is often more common. To truncate a file is to remove content from its end, reducing its size. In practice, this often means completely clearing a file of any contents, reducing its size to 0.

In this tutorial, we explore the concept of truncation in contrast to deletion. First, we go over the general differences between the two operations on a given file. After that, we check how a file in use reacts to being deleted and truncated. Finally, we discuss special cases like the write position within a file and log buffering.

While we only demonstrate complete truncation, the same concepts also apply to partial truncation.

We tested the code in this tutorial on Debian 12 (Bookworm) with GNU Bash 5.1.4. It should work in most POSIX-compliant environments unless otherwise specified.

2. Truncate Versus Delete

There are several reasons to clear the contents of a file instead of deleting it. Let’s explore the main ones.

2.1. inode Preservation

An inode stores file metadata. Further, an inode number serves as a file identifier within a filesystem:

$ touch file
$ ls --inode file
666 file

In this case, we first create file via touch. After that, ls shows that file has an --inode (-i) number of 666.

However, if we delete and recreate this regular file, the new instance isn’t guaranteed to get the same inode number:

$ rm file
$ touch file
$ ls --inode file
667 file

In this case, after we remove and recreate file at the same location, it gets a new inode number, 667.

Now, let’s see the behavior when we truncate via >, i.e., basic stream redirection:

$ > file
$ ls --inode file
667 file

As expected, the inode number remains the same.

In general, a changing inode may present challenges for applications that refer to a file by its inode, e.g., via an open handle, instead of by its path and filename. In fact, this can result in stale file handles.

2.2. Speed

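Due to the system calls involved, full truncation is often faster than deletion. Deleting and recreating a file removes a directory entry and frees an inode, only to allocate a new inode and directory entry right after. In contrast, truncation only updates the existing inode and releases its data blocks.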

Let’s create 10000 files to test this:

$ time for i in {1..10000}; do touch $i; done

real    0m13.676s
user    0m8.806s
sys     0m6.207s

Here, we use time to measure how long it takes for touch in a for loop to create ten thousand files.

Next, let’s remove and recreate all files with the same loop via rm and touch:

$ time for i in {1..10000}; do rm $i; touch $i; done

real    0m27.386s
user    0m17.518s
sys     0m12.437s

Now, let’s perform the same measurement while only truncating the existing files:

$ time for i in {1..10000}; do > $i; done

real    0m0.167s
user    0m0.087s
sys     0m0.079s

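In this case, truncation takes almost no time, and we end up with the same empty files, but with their original inodes. However, much of this difference stems from process creation: Bash performs the > redirection itself, whereas rm and touch are external commands, so each iteration of the previous loop starts two new processes.

Of course, the result also depends on the method we use. For example, truncating by copying the empty /dev/null over each file via cp is considerably slower, since cp also runs as a separate process: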

$ time for i in {1..10000}; do cp /dev/null $i; done

real    0m16.837s
user    0m10.888s
sys     0m7.330s

Yet, even this method is faster than deleting and recreating, which requires two processes per file.

2.3. Delete Permissions

Although it’s rare for a user to have the privileges to truncate a file but not to remove it, this situation is still possible.

To begin with, Linux doesn’t offer a separate delete permission by default. Instead, deleting a file modifies the directory that contains it. Thus, to prevent a user from deleting a file in most Linux environments, we can remove the write permission of the containing directory:

$ chmod -w dir/
$ rm dir/file
rm: cannot remove 'dir/file': Permission denied
$ > dir/file
$

In this case, we use chmod to remove the [w]rite permission from dir/, the directory containing file. Afterward, we verify that we can’t delete file, but we can still truncate it.

However, this change affects all files within the directory, not only the one in question. Thus, to avoid side effects, we can move the files we want to protect to a separate directory.

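Another way to achieve a similar effect for specific users only is to set an access control list (ACL) on the directory via a tool like setfacl, provided the filesystem supports ACLs. Security frameworks such as SELinux offer further options.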

Finally, even when we have all necessary permissions, deletion and truncation can behave quite differently when a file is in use. Let’s explore that.

3. File in Use

Depending on the system, files in use might be protected against some operations. On Linux, however, file locks are advisory by default, so we can usually both delete and truncate a file that another process has open. Still, the results differ considerably, mainly because of inode preservation.

So, let’s try to perform both deletion and truncation while a file is open and being populated by another process.

3.1. Usage Process

First, we open a file and begin populating it in the background:

$ { exec 3<>file; while true; do date +%S 1>&3; sleep 1; done } &
[1] 666
$

Here, we see our script started in the background via & as job 1 with the process identifier (PID) 666.

In this case, we use descriptor 3, the first free file descriptor after stdin (0), stdout (1), and stderr (2). In particular, exec opens file for both reading and writing via <> on descriptor 3, without truncating it. This way, the background subshell keeps a file handle open for as long as it runs. To confirm this, we can use fuser:

$ fuser file
/file:     666 667

We see the background subshell as well as the command it’s currently running, since child processes like sleep inherit descriptor 3. Next, the script initiates an endless while loop with true. In it, we perpetually perform two actions:

  1. write the [%S]econds of the current time to file by redirecting stdout (1) to descriptor 3
  2. sleep for 1 second

Thus, we get lines with consecutive two-digit numbers that wrap around after 59, as cat shows:

$ cat file
[...]
57
58
59
00
01

We use this background process in the following examples.

3.2. Delete File

While the background process from the previous step is running, let’s remove file directly:

$ { exec 3<>file; while true; do date +%S 1>&3; sleep 1; done } &
[1] 666
$ cat file
02
03
04
$ rm file

Now, we wait for a second and check whether the file is there:

$ cat file
cat: file: No such file or directory

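Notably, even though our job is still running, file doesn’t get recreated. Actually, this is expected behavior for already open handles when deleting or moving a file: applications keep writing through their handle, which refers to the inode rather than the path. After deletion, that inode has no directory entry (link) left, so the data still lands in secondary storage but is no longer reachable via the path. The kernel only frees the inode and its data blocks once the last open handle is closed: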

$ lsof -p 666
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
[...]
bash       666 root    3u   REG    8,1       90 106660 /file (deleted)

Here, lsof, filtered by [-p]rocess ID, marks the file as deleted.

In fact, even if we recreate file, the new file gets a different inode, since the old one is still in use. So, our process keeps writing to the old, deleted file instead.

3.3. Truncate File

Now, let’s restart the background process and truncate file by copying /dev/null over it:

$ { exec 3<>file; while true; do date +%S 1>&3; sleep 1; done } &
[1] 666
$ cat file
25
26
$ cp /dev/null file
$ cat file
$

Initially, the file appears to remain empty. However, checking again after at least one second has passed reveals that the script continues populating it:

$ cat file
29
30

Of course, the same happens with truncation via a redirect:

$ cat file
29
30
31
32
$ > file
$ cat file
$
[...]
$ cat file
05
06
07

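As expected, the process continues working with its old handle, which still points to the same inode. Notably, since descriptor 3 isn’t opened in append mode, the writes continue at the previous offset. Thus, the file now starts with null bytes, which cat doesn’t visibly display.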

4. Special Cases

To begin with, everything is a file in UNIX. Because of this, most filesystem objects adhere to the same basic principles:

  • files can get created when they are opened
  • files only get deleted when no references to them remain

Since such references include both hard links and open handles, a file can linger after we delete it, seemingly for no reason.

4.1. Seek Pointer

When an application opens a file for writing and keeps the handle open, the system also keeps track of the current position of a virtual cursor (pointer) for that handle, i.e., the file offset.

In fact, we can call the fseek() function to reposition the cursor within the file, so we can write at a prespecified location:

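#include <stdio.h>

int main(void)
{
    // open file for reading and writing without truncating it
    FILE* fp = fopen("file", "r+");
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }

    // move cursor to end
    fseek(fp, 0, SEEK_END);

    // print cursor position
    printf("%ld\n", ftell(fp));

    fclose(fp);
    return 0;
}

In this case, we first fopen() file for reading and writing, exiting with an error if that fails. Then, we position the cursor at the end of the file via SEEK_END. After that, we use printf() to output this current location as returned by ftell(), i.e., the current size of file in bytes.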

With this in mind, our hypothetical scenario includes two applications:

  • application A opens file for writing and keeps internal track of the pointer location
  • application B truncates file

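After the external truncation by application B, any writes by application A continue from the last known offset within the file, unless application A opened the file in append mode (O_APPEND). Thus, the file size jumps back up to that offset, and the skipped range at the beginning reads as null bytes. For larger ranges, filesystems that support holes don’t allocate blocks for them, so we end up with a sparse file. Furthermore, application B may be the shell itself, e.g., when we use > file.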

4.2. Logging and Logs

While interactive applications have the option to output data to the terminal, services and daemons usually need other ways to provide feedback. One of the main methods to store information about the current session of a given process is logging.

In practice, logging is most often also part of the operating system (OS) itself. For example, Linux systems provide the classic syslog and the more recent systemd journal:

$ journalctl
-- Journal begins at Thu 2022-10-13 09:09:10 EDT, ends at Sun 2023-09-10 11:00:10 EDT. --
Oct 13 09:09:10 xost systemd[1480]: Queued start job for default target Main User Target.
Oct 13 09:09:10 xost systemd[1480]: Reached target Paths.
Oct 13 09:09:10 xost systemd[1480]: Reached target Sockets.
Oct 13 09:09:10 xost systemd[1480]: Reached target Timers.
Oct 13 09:09:10 xost systemd[1480]: Reached target Basic System.
Oct 13 09:09:10 xost systemd[1480]: Reached target Main User Target.
Oct 13 09:09:10 xost systemd[1480]: Startup finished in 63ms.
[...]

With journalctl, we can query the systemd journal, which usually also collects kernel messages and messages sent via syslog.

In general, a log is the common name for all data that running processes purposefully store as a trail. Usually, logs are helpful for many different activities.

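Since logs are important, facilities such as logrotate rotate them regularly and ensure we keep a number of older files in the backlog. Notably, instead of renaming the current log and creating a new one, logrotate can also copy the log and then truncate the original in place via its copytruncate directive. This is useful when a process keeps the log file open.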

Moreover, many logging frameworks like Log4j and Apache mod_log* include buffering. In some cases, this means log entries are first held in random access memory (RAM), e.g., in the program heap, before they reach the file. Consequently, a server like Apache with the respective configuration may still write buffered entries after we delete or truncate its log file, so a truncated log might not stay empty.

5. Summary

In this article, we talked about file truncation and deletion. In particular, we explored the case of an in-use file and demonstrated some possible side effects of these operations.

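In conclusion, although how a system behaves when truncating or deleting a file in use depends on the filesystem and kernel, Linux has a specific default reaction to each: after deletion, open handles keep writing to the unlinked inode, while after truncation, they continue writing to the same file, often at their previous offset.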
