Find and Delete Files and Directories

1. Overview

On the Linux command line, we can use the find command to get a list of files or directories. Usually, we want to perform some operation on the files we’ve found, for instance, find and tar files.

In this tutorial, we’re going to take a look at how to delete the files or directories we’ve found.

2. Introduction to the Problem

There are several ways to delete the files and directories found by the find command. It’s not a hard problem. Perhaps we already have some solutions in our minds.

However, some solutions can be dangerous if we don’t correctly use them. Further, some solutions may not work well in terms of performance.

In the remainder of this tutorial, we’ll take a closer look at a common pitfall of using the find command and explain why it’s dangerous.

Moreover, we’ll discuss the performance as well.

First, let’s create a directory structure as an example:

$ tree -a test
test
├── kotlin
│   ├── ktApp1
│   │   └── .git
│   │       └── whatever.txt
│   └── ktApp2
│       └── .git
│           └── whatever.txt
└── python
    ├── pyApp1
    │   └── .git
    │       └── whatever.txt
    └── pyApp2
        └── .git
            └── whatever.txt

10 directories, 4 files

As the above tree output shows, we’ve created a test directory with some subdirectories and files.
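Since we’ll restore this structure several times, a small helper can rebuild it on demand. The following is only a sketch: the make_test_tree function name and the use of a throwaway mktemp directory are our own assumptions, not part of the original setup.

```shell
# Hypothetical helper: recreate the example tree under the given root.
make_test_tree() {
    root=$1
    for app in kotlin/ktApp1 kotlin/ktApp2 python/pyApp1 python/pyApp2; do
        mkdir -p "$root/test/$app/.git"
        touch "$root/test/$app/.git/whatever.txt"
    done
}

# Build the tree in a scratch directory so nothing real is at risk.
sandbox=$(mktemp -d)
make_test_tree "$sandbox"

# List the four files we just created.
find "$sandbox/test" -name 'whatever.txt' | sort
```

Working inside a scratch directory keeps the deletion experiments away from real data.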

We’ll try two deletions on the test directory:

File deletion: Remove all whatever.txt files
Directory deletion: Delete all .git directories and the files under them

Let’s have a look at the find commands to find our target directories and files.

First, let’s find all whatever.txt files:

$ find test -name 'whatever.txt'
test/python/pyApp2/.git/whatever.txt
test/python/pyApp1/.git/whatever.txt
test/kotlin/ktApp2/.git/whatever.txt
test/kotlin/ktApp1/.git/whatever.txt

Similarly, we can also find the .git directories:

$ find test -type d -name '.git'
test/python/pyApp2/.git
test/python/pyApp1/.git
test/kotlin/ktApp2/.git
test/kotlin/ktApp1/.git

In this tutorial, we’ll introduce three approaches to delete our target files and directories:

Using the find command’s -delete action
Using find -exec
Using find | xargs rm

So far, we’ve seen how to locate the files or directories we want to delete using the find command. Also, we know we can connect Linux commands with pipes and let different commands solve our problems cooperatively.

Many of us may think that the most straightforward approach to solving this problem would be piping the find result to the rm command. Surprisingly, it’s not in the list above.

Therefore, before we look at the real solutions to the problem, let’s understand why we can’t pipe find’s result to rm.

3. Why “find … | rm” Won’t Work

We need to understand what the pipe does before we answer this question. First of all, let’s see an example:

$ ls -1 / | grep '^m'
media/
mnt/

In the simple example above, we pipe the ls command’s result to grep and find out the root directories whose names begin with “m”.

Simply put, the pipe connects the standard output (Stdout) of ls to the standard input (Stdin) of the grep command.

This command works because the grep command reads from Stdin. We can pipe the Stdout to further commands that support reading from Stdin, for example:

$ ls -1 / | grep '^m' | sed 's/^m/OK_m/'
OK_media/
OK_mnt/

We can see this kind of “command chain” pretty often in the real world.

However, not all Linux commands support reading from Stdin. Typical examples are those commands doing file handling, for example, cp, mv, and rm. These commands ignore the Stdin.

For instance, when we execute the command “rm file”, rm takes the name of the file to remove from its command-line argument. It won’t read the Stdin at all:

$ echo "file" | rm
rm: missing operand

Therefore, the idea “find … | rm” won’t work, either.

However, sometimes we would like to somehow turn one command’s Stdout into another command’s argument. That’s where xargs comes in handy. We’ll see it in action in later sections.
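The difference between the two behaviors can be sketched in a scratch directory. The workdir variable and the file name are placeholders chosen for this illustration:

```shell
workdir=$(mktemp -d)
touch "$workdir/file"

# rm never reads Stdin, so it sees no operand and fails; the file survives.
echo "$workdir/file" | rm 2>/dev/null || echo "rm ignored Stdin"

# xargs reads Stdin and appends what it read to rm's argument list.
echo "$workdir/file" | xargs rm

ls -A "$workdir"    # prints nothing: the file is gone
```

In other words, xargs turns the stream into the command-line arguments that rm expects.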

Now, let’s explore the solutions to our “find and delete” problem.

4. Using the find Command and the -delete Action

The find command provides a -delete action to remove files. Next, let’s delete the target files and directories using this action.

4.1. Deleting the Target Files and Directories

We can remove all whatever.txt files by adding the -delete action to the find command:

$ find test -name 'whatever.txt' -delete

$ tree -a test
test
├── kotlin
│   ├── ktApp1
│   │   └── .git
│   └── ktApp2
│       └── .git
└── python
    ├── pyApp1
    │   └── .git
    └── pyApp2
        └── .git

10 directories, 0 files

Good, it works. All whatever.txt files have been deleted.

Next, let’s restore the test directory and try to remove the .git directories recursively:

$ find test -type d -name '.git' -delete
find: cannot delete ‘test/python/pyApp2/.git’: Directory not empty
find: cannot delete ‘test/python/pyApp1/.git’: Directory not empty
find: cannot delete ‘test/kotlin/ktApp2/.git’: Directory not empty
find: cannot delete ‘test/kotlin/ktApp1/.git’: Directory not empty

Oops, this time, we got error messages. This is because the -delete action cannot delete a non-empty directory recursively. That is, it can only delete files and empty directories.
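If we still want to stay with -delete, one possible workaround is to match the contents of the .git directories as well as the directories themselves. This is a sketch under the assumption that our find supports -delete (GNU and BSD find do); the sandbox layout is a reduced version of our example:

```shell
sandbox=$(mktemp -d)
for app in kotlin/ktApp1 python/pyApp1; do
    mkdir -p "$sandbox/test/$app/.git"
    touch "$sandbox/test/$app/.git/whatever.txt"
done

# Match each .git entry and everything beneath it. Because -delete implies
# -depth, the contents are removed before the then-empty directory.
find "$sandbox/test" \( -name '.git' -o -path '*/.git/*' \) -delete

git_left=$(find "$sandbox/test" -name '.git' | wc -l | tr -d ' ')
echo "remaining .git directories: $git_left"
```

The tests come before -delete here, so only the matched entries are removed and the application directories stay in place.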

4.2. The Dangerous Pitfall of the -delete Usage

Next, let’s do an interesting test. We know that the order of options of a Linux command doesn’t usually matter.

For example, the following two ls commands are identical, even though the options are in a different order:

ls -F -a -l --color
ls -l -a --color -F

Now, let’s re-order the options in our last find command by moving the -delete option to the first position and see what will happen:

$ find test -delete -type d -name '.git'
$ ls test
ls: cannot access 'test': No such file or directory

This time, there’s no error message. It means the command has been executed successfully.

However, when we check the result, we’ve found that the test directory has been deleted completely! Let’s understand why it has happened.

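Let’s revisit our find command. We might think of -delete, -type d, and -name '.git' as three options. However, find treats them as three expressions.

For every file it visits, find evaluates the expressions from left to right, and each one returns a boolean value. As the expressions are joined by an implicit AND, the evaluation stops at the first expression that returns false. The -delete action returns true if the removal succeeded.

If the -delete action is at the first position, find evaluates it for every file and directory under the starting point before any test has a chance to filter them out. So, it’ll delete the given directory and everything in it, which is the test directory in our example.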

But wait — we’ve just learned the -delete action won’t remove non-empty directories. Why was everything under test deleted?

This is because the -delete action implies the -depth option.

The -depth option asks the find command to search each directory’s contents before the directory itself. Therefore, if we put -delete as the first option, it’ll start deletion from each directory tree’s very bottom. First, it removes all files under a directory, then the empty directory itself, until everything has been removed.

When we use the find command, we should always place the -delete action after the tests that select our targets, and never at the first position. Otherwise, it deletes every file that find visits.
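A cautious habit is to run the command with -print in place of -delete first and check the list. The sketch below reproduces both the dry run and the pitfall inside a scratch directory; the sandbox variable and the reduced tree are assumptions made for this illustration:

```shell
sandbox=$(mktemp -d)
mkdir -p "$sandbox/test/kotlin/ktApp1/.git"
touch "$sandbox/test/kotlin/ktApp1/.git/whatever.txt"

# Dry run: -print shows exactly what the tests select, and deletes nothing.
preview=$(find "$sandbox/test" -type d -name '.git' -print)
echo "$preview"

# Pitfall: -delete is evaluated first, for every entry, in depth-first order.
# The tests after it no longer protect anything.
find "$sandbox/test" -delete -type d -name '.git'

[ -e "$sandbox/test" ] || echo "the whole test directory is gone"
```

Only after the -print output looks right should we swap in -delete at the same position.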

5. Using find -exec

When we use the find command with the -exec action, we can execute external commands on its result. Now, let’s execute the rm command to delete our target files and directories in this approach:

$ find test -name 'whatever.txt' -exec rm {} \;
$ tree -a test
test
├── kotlin
│   ├── ktApp1
│   │   └── .git
│   └── ktApp2
│       └── .git
└── python
    ├── pyApp1
    │   └── .git
    └── pyApp2
        └── .git

10 directories, 0 files

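Good, all whatever.txt files have been deleted. When we use -exec with an external command, find replaces the ‘{}’ placeholder with the path of each file it has found.

Similarly, we can remove all .git directories if we add the -r option to the rm command. We also add the -depth option, so that find processes each directory’s contents before the directory itself and doesn’t try to descend into a directory that rm has already removed. Let’s restore the test directory and give it a try: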

$ find test -depth -type d -name '.git' -exec rm -r '{}' \;
$ tree -a test
test
├── kotlin
│   ├── ktApp1
│   └── ktApp2
└── python
    ├── pyApp1
    └── pyApp2

6 directories, 0 files

As the output shows, all .git directories have been successfully deleted.
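An alternative to -depth is the -prune action, which tells find not to descend into a matched directory at all. This is a sketch of that variant, not the command used above; the sandbox layout is a reduced example. Note that -prune has no effect when -depth is in effect, so the two shouldn’t be combined:

```shell
sandbox=$(mktemp -d)
for app in pyApp1 pyApp2; do
    mkdir -p "$sandbox/test/python/$app/.git"
    touch "$sandbox/test/python/$app/.git/whatever.txt"
done

# -prune stops find from entering each .git directory,
# so rm -r can remove it without find looking inside afterwards.
find "$sandbox/test" -type d -name '.git' -prune -exec rm -r {} \;

remaining=$(find "$sandbox/test" -name '.git' | wc -l | tr -d ' ')
echo "remaining .git directories: $remaining"
```

Both variants avoid the situation where find tries to enter a directory that has just been removed.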

6. Using the find | xargs rm Combination

Now, we’ve learned that we can execute the rm command using find’s -exec action. Alternatively, we can also pipe the result of the find command to xargs and let xargs call the rm command to delete those files.

Next, let’s see how to remove all whatever.txt files using this approach:

$ find test -name 'whatever.txt' | xargs rm
$ tree -a test
test
├── kotlin
│   ├── ktApp1
│   │   └── .git
│   └── ktApp2
│       └── .git
└── python
    ├── pyApp1
    │   └── .git
    └── pyApp2
        └── .git

10 directories, 0 files

Similarly, we can also remove all .git directories in the same way. Let’s restore the test directory and test it:

$ find test -type d -name '.git' | xargs rm -r
$ tree -a test
test
├── kotlin
│   ├── ktApp1
│   └── ktApp2
└── python
    ├── pyApp1
    └── pyApp2

6 directories, 0 files

As the output above shows, all .git directories have been deleted.
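One caveat: by default, xargs splits its input on whitespace, so a path containing a space would be passed to rm as two separate arguments. A common remedy is to separate the paths with NUL characters instead. The sketch below assumes a find that supports -print0 and an xargs that supports -0 (GNU and BSD versions do); the file names are made up for the illustration:

```shell
sandbox=$(mktemp -d)
touch "$sandbox/my notes.txt" "$sandbox/plain.txt"

# -print0 ends each path with a NUL byte; xargs -0 splits on NUL only,
# so the space in "my notes.txt" is preserved.
find "$sandbox" -name '*.txt' -print0 | xargs -0 rm

left=$(find "$sandbox" -type f | wc -l | tr -d ' ')
echo "files left: $left"
```

Our example tree has no such names, so the plain pipe works there, but the NUL-separated form is the safer choice for arbitrary file names.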

We may ask: If find -exec rm can solve the problem, why do we need to introduce an extra xargs process to do the same?

To learn the answer, let’s discuss their performance.

7. Benchmarking the Performance of find -exec and find | xargs

First, let’s explain how the find -exec rm approach works. When we terminate -exec with \;, find starts one rm process for each file it has found. That is, we’ll execute the rm command one million times if the find command finds a million files.

On the other hand, if we execute find | xargs rm, xargs collects the found paths into batches and passes each batch to a single rm invocation, so rm runs as few times as possible.

Therefore, if our find command returns a large number of files or directories, find | xargs COMMAND will be much faster than the find -exec COMMAND approach.
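It’s worth knowing that -exec can batch arguments too: when the command ends with {} + instead of {} \;, find appends as many paths as fit to each invocation, much like xargs. The sketch below counts the invocations for 50 files; the directory and the counting trick with sh -c are assumptions made for this illustration:

```shell
dir=$(mktemp -d)
i=1
while [ "$i" -le 50 ]; do
    touch "$dir/$i.txt"
    i=$((i + 1))
done

# Each invocation of the inner sh prints one line, so wc -l counts invocations.
per_file=$(find "$dir" -name '*.txt' -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')
batched=$(find "$dir" -name '*.txt' -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')
echo "invocations with \\;: $per_file, with +: $batched"

# Delete all files with as few rm processes as possible.
find "$dir" -name '*.txt' -exec rm {} +
```

With only 50 short paths, everything fits into a single batch, so the + form needs one invocation where the \; form needs 50.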

Next, let’s run the same test with each approach and compare the timings.

We’ll delete 3000 files using each command and measure their execution time using the time command.

First, let’s test with the find -exec rm approach:

$ touch {1..3000}.txt
$ ls -l *.txt | wc -l
3000
$ time find . -name '*.txt' -exec rm '{}' \;
real	0m6.072s
user	0m3.130s
sys	0m2.932s

On this machine, it took about six seconds to delete all files.

Next, it’s xargs’s turn. Let’s see if it can do the same test faster:

$ touch {1..3000}.txt
$ ls -l *.txt | wc -l
3000
$ time find . -name '*.txt' | xargs rm
real	0m0.053s
user	0m0.029s
sys	0m0.029s

This time, it took only 0.05 seconds to remove the files. Compared to the find -exec rm approach, using xargs on this test is about 115 times faster!

Therefore, if find returns a large number of files, we should consider piping the result to the xargs command.

8. Conclusion

In this article, we’ve learned three different ways to delete files or directories found by the find command. Also, we’ve understood why piping find’s output to rm won’t work.

Moreover, we’ve discussed a dangerous pitfall of find’s -delete action usage through an example.

Finally, we’ve also analyzed the performance of two approaches: find -exec rm and find | xargs rm.
