1. Overview

Sometimes, we need to process a list of files, and we can use the find command to generate that list. Then, as it’s easy to iterate over a list in Bash, we may be tempted to loop directly over find’s output.

In this tutorial, we’ll talk about why looping over find’s output is a bad practice. First, we’ll look at the problems it causes. Then, we’ll discuss the alternatives we can use instead.

2. Why It’s a Bad Practice

When we write a script, we should follow good practices. This allows us to build a robust script that can process any valid input without issues. Also, it’s easier to expand and improve our script with new features when we follow good practices.

In Linux, a filename can contain almost any character: only the slash and the null character are excluded. This means alphanumeric characters, spaces, single quotes, double quotes, and even newlines are all valid.

We need an item delimiter to iterate over a list of items. However, by default, find prints each filename followed by a newline, and the newline is also a valid character in a filename.

So, looping over find’s output is considered bad practice because we can’t choose a list delimiter that works for all valid filenames. Also, we should consider files with spaces when we iterate over a list, as Bash splits unquoted expansions on spaces, tabs, and newlines by default.
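To see the splitting in isolation, here’s a minimal sketch. It assumes the default IFS value, and the list variable is a hypothetical stand-in for two lines of find output:

```shell
#!/bin/bash
# Hypothetical stand-in for find's output: two names, one per line
list=$'./file 2\n./file1'

count=0
# Unquoted on purpose: Bash splits the expansion on spaces, tabs, and newlines
for item in $list; do
    count=$((count + 1))
done

echo "names: 2, loop iterations: $count"
```

The loop runs three times for two names, because the space inside the first name is treated just like the newline between the names.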

We can test this problem and see what happens when we have one file with a space in its name and another with a newline. Let’s create three files in an empty folder:

$ touch file1
$ touch 'file 2'
$ touch 'file
3'
$ ls
file\n3  file\ 2  file1

As we notice, the “file\n3” file has a newline between “file” and the number 3.

Now, let’s try to run the ls command on each file, iterating over find’s output using a for loop:

$ for F in `find -type f`; do
    ls "$F"
done
/bin/ls: cannot access './file': No such file or directory
/bin/ls: cannot access '3': No such file or directory
/bin/ls: cannot access './file': No such file or directory
/bin/ls: cannot access '2': No such file or directory
./file1

As we see, we couldn’t process the file with a space or the file with a newline, because Bash split find’s output on both characters.

Let’s try to improve the iteration using a while loop and piping find’s output to the read command:

$ find -type f | while read F; do
    ls "$F"
done
/bin/ls: cannot access './file': No such file or directory
/bin/ls: cannot access '3': No such file or directory
./file\ 2
./file1

This time, we could process the file with a space, as read takes a whole line at a time. However, we still can’t handle the file with a newline, because read sees it as two separate lines.
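One way to confirm that the newline is the culprit is to compare the number of lines that find prints with the number of files that actually exist. The following is a minimal sketch, assuming a scratch directory created with mktemp; the variable names are only illustrative:

```shell
#!/bin/bash
# Work in a throwaway directory so the example doesn't touch real files
dir=$(mktemp -d)
touch "$dir/file1" "$dir/file 2" "$dir/file"$'\n''3'

# Lines printed by find: the newline inside a name adds an extra line
lines=$(find "$dir" -type f | grep -c '')

# Files matched by a glob: the shell keeps each name as a single word
set -- "$dir"/*
files=$#

echo "lines from find: $lines, files on disk: $files"
rm -rf "$dir"
```

The two counts differ, so any loop that reads find’s output line by line can’t tell where one filename ends and the next begins.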

3. Alternatives for Looping Over Filenames

So far, we’ve learned that looping over find’s output is a bad practice because we can’t handle all possible filenames. Now, let’s see which alternatives we have to properly process a list of files obtained with find.

3.1. Using the -exec Parameter

One method we can use to write a more robust script is to leave the iteration to find itself. We can use the -exec parameter to tell find to execute a program with each file.

With this method, instead of using a Bash for loop or while loop, we can write a script that accepts the filename. Then, we can use find’s -exec parameter with our new script, and find will run our script once for each file.

When we use the -exec parameter, we provide the command name followed by its arguments, with {} as a placeholder for the filename, and finally a \; to terminate the -exec parameter.
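As a minimal sketch of this syntax, the following example runs a small inline sh command instead of a script, so it’s self-contained. The scratch directory and the inline command are assumptions made for illustration:

```shell
#!/bin/bash
# Throwaway directory with the three sample filenames
dir=$(mktemp -d)
touch "$dir/file1" "$dir/file 2" "$dir/file"$'\n''3'

# find substitutes each filename for {} as one argument, and \; ends -exec.
# The inline command prints "ok" only if the name it received really exists.
ok=$(find "$dir" -type f -exec sh -c '[ -e "$1" ] && echo ok' sh {} \; | grep -c '^ok$')

echo "files received intact: $ok"
rm -rf "$dir"
```

Every filename reaches the command as a single, unmodified argument, including the ones with a space or a newline.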

In the previous section’s example, our loop body runs ls "$F". So now, let’s move that loop body to a new script called work_on_file.sh, which takes the filename as its first parameter:

#!/bin/bash
ls "$1"

As we see, this is a trivial example. However, we can write a more complex Bash script in our work_on_file.sh file.

Now, we can run find with our new script. Let’s run find using the -exec parameter to process each file:

$ find -type f -exec ./work_on_file.sh {} \;
./file\n3
./work_on_file.sh
./file\ 2
./file1

As we see, our script listed all files without errors. Also, find included our work_on_file.sh script, as it’s in the same folder.

In this case, our script only runs ls with each file. So, we can use ls instead of our script by running find -type f -exec ls {} \;.
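When the command accepts several filenames, as ls does, find can also pass many of them in a single invocation if we end -exec with + instead of \;. Here’s a minimal sketch, assuming a scratch directory with three sample files and an inline sh command used only for illustration:

```shell
#!/bin/bash
# Throwaway directory with the three sample filenames
dir=$(mktemp -d)
touch "$dir/file1" "$dir/file 2" "$dir/file"$'\n''3'

# With +, find appends as many filenames as fit to one command line.
# The inline sh command prints how many arguments it received.
args=$(find "$dir" -type f -exec sh -c 'echo "$#"' sh {} +)

echo "arguments in one invocation: $args"
rm -rf "$dir"
```

With only a few files, the command runs once and receives all of them, which avoids starting a new process for each file.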

3.2. Using the xargs Command

We also have another alternative to process find’s output. We can use find with the -print0 parameter and pipe it to the xargs command with the -0 parameter.

When we use the -print0 parameter, find terminates each filename with a null character instead of a newline. A Bash variable can’t hold a null character, so we can’t capture this output with command substitution. However, we can pipe it to xargs with the -0 parameter, which makes xargs split its input on the null character. As the null character is not valid in a filename, we can process any arbitrary list of filenames this way.

xargs accepts a command as a parameter, and then it executes that command using xargs’ input as the command’s parameters. We can limit the number of parameters per command with the -n parameter. In this case, we have to use -n 1, as our work_on_file.sh script accepts only one parameter.

Let’s pipe find’s output to xargs so it runs the work_on_file.sh script:
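To illustrate the effect of -n, here’s a minimal sketch that counts how many times xargs starts the command. The items function and the inline sh command are hypothetical helpers, and printf stands in for find -print0:

```shell
#!/bin/bash
# Three null-terminated sample items, one with a space and one with a newline
items() { printf '%s\0' 'file1' 'file 2' $'file\n3'; }

# With -n 1, xargs starts the command once per item
per_item=$(items | xargs -0 -n 1 sh -c 'echo run' sh | grep -c '^run$')

# Without -n, xargs packs as many items as fit into one invocation
batched=$(items | xargs -0 sh -c 'echo run' sh | grep -c '^run$')

echo "runs with -n 1: $per_item, runs without -n: $batched"
```

So -n 1 suits a command that takes exactly one filename, at the cost of one process per file.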

$ find -type f -print0 | xargs -0 -n 1 ./work_on_file.sh
./file\n3
./work_on_file.sh
./file\ 2
./file1

As we can see, our script listed all files without errors.

Our script only runs ls on each file, so we can also use xargs with ls by running find -type f -print0 | xargs -0 ls.

We can also modify our script to accept any number of parameters. This way, we don’t need to use -n 1. However, we can keep our script simple by accepting only one parameter, thus avoiding the use of loops.

If we want to accept more than one parameter, we can iterate over "$@". Let’s modify our script and see how it works:

#!/bin/bash
for F in "$@"; do
    ls "$F"
done

Now, we can use xargs without the -n parameter:

$ find -type f -print0 | xargs -0 ./work_on_file.sh
./file\n3
./work_on_file.sh
./file\ 2
./file1

As we can see, the script works even when we iterate over a list. That’s because we enclose the special variable $@ in double quotes, so each parameter expands to a single word, and Bash doesn’t split it on spaces or newlines.
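The difference the double quotes make can be sketched as follows. The count_args and demo functions are hypothetical helpers, and the example assumes the default IFS value:

```shell
#!/bin/bash
# Print how many arguments were received
count_args() { echo "$#"; }

demo() {
    # Quoted: each parameter stays one word
    quoted=$(count_args "$@")
    # Unquoted on purpose: each parameter is split on spaces and newlines
    unquoted=$(count_args $@)
}

# Two parameters: one with a space, one with a newline
demo 'file 2' $'file\n3'

echo "quoted: $quoted, unquoted: $unquoted"
```

With the quotes, the two parameters stay two words; without them, Bash splits them into four.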

4. Conclusion

This article taught us why looping over find’s output is a bad practice.

First, we discussed the disadvantages of doing it and how we can’t process a file with a newline in its name. Then, we talked about how to process a list of files even when a file has a newline.

We saw that we can use the -exec parameter instead of looping over find’s output. Then, we saw how we can combine the find and xargs commands with the -print0 and -0 parameters, respectively.
