1. Overview

The find command is the standard utility for searching files and directories in Linux. Sometimes, we want to aggregate the contents of the matched files into a single target file.

In this tutorial, we’ll learn how to cat the contents of files found using find into a single file.

2. Scenario Setup

Let’s start by using the exa command to look at a playground directory structure in a tree-like view:

$ exa --tree playground
playground
├── employee_data1.txt
├── employee_data2.txt
└── employee data3.txt

We must note that there is a space in one of the filenames, so we need to make sure that our scripts handle this scenario.

Next, let’s see the employee IDs stored within the files under the playground directory using the awk command:

$ awk 'FNR==1{print "===",FILENAME,"==="}; 1;' playground/*
=== playground/employee data3.txt ===
e11,e12
=== playground/employee_data1.txt ===
e1,e2,e3,e4,e5
=== playground/employee_data2.txt ===
e6,e7,e8,e9,e10

We preferred the awk command over other commands, such as cat, because it’s convenient to show the filenames and their content together using the built-in FILENAME variable. Additionally, we used the shorthand pattern 1, which is always true, to print each line of content.

Lastly, let’s create a temporary target file using the mktemp command:

$ TMP_EMP_DATA_FILE=$(mktemp); echo "$TMP_EMP_DATA_FILE"
/tmp/tmp.JAibhZUJSd

Our goal in the subsequent sections will be to search the files under the playground directory and aggregate all the employee IDs into the $TMP_EMP_DATA_FILE file.
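To follow along, we can recreate the playground with a few commands. This is a minimal sketch that assumes the file contents shown above; the scratch directory created with mktemp -d is our own choice and not part of the original setup:

```shell
# Work in a scratch directory so nothing existing is overwritten (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir playground
printf 'e1,e2,e3,e4,e5\n'  > playground/employee_data1.txt
printf 'e6,e7,e8,e9,e10\n' > playground/employee_data2.txt
# Quote the path because this filename contains a space.
printf 'e11,e12\n'         > "playground/employee data3.txt"

# Temporary target file for the aggregated data.
TMP_EMP_DATA_FILE=$(mktemp)

# List what we created.
ls -1 playground
```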

3. Using find With the -exec Option

We can use the -exec option available with the find command to execute a shell command right after a match:

$ find <path> <search_criteria> -exec <command> {} \;

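First, let’s use this concept to execute the cat command for each matching file and append its output to the $TMP_EMP_DATA_FILE file. The shell, not find, processes the >> redirection, so it captures the output of the whole find invocation: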

$ find playground -type f -exec cat {} >> ${TMP_EMP_DATA_FILE} \;

Next, let’s verify that the target file contains all the expected data. The lines appear in the order in which find reports the files, which isn’t necessarily alphabetical:

$ cat ${TMP_EMP_DATA_FILE}
e6,e7,e8,e9,e10
e11,e12
e1,e2,e3,e4,e5

Lastly, let’s empty the contents of the ${TMP_EMP_DATA_FILE} file before starting the next section:

$ cat /dev/null > ${TMP_EMP_DATA_FILE}

We must execute this step before starting each new approach in the next few sections.
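Because the shell handles the redirection before find starts, its position on the command line doesn’t matter. The sketch below, which builds its own scratch files with hypothetical names, suggests that writing the redirection after \; gives the same result, and it also shows the : > idiom as one way to truncate a file without running cat:

```shell
workdir=$(mktemp -d)
mkdir "$workdir/playground"
printf 'e1,e2\n' > "$workdir/playground/a.txt"
printf 'e3,e4\n' > "$workdir/playground/b c.txt"

out_inline=$(mktemp)
out_trailing=$(mktemp)

# Redirection written inside the -exec clause, as in the command above.
find "$workdir/playground" -type f -exec cat {} >> "$out_inline" \;

# Same command with the redirection moved to the end of the line.
find "$workdir/playground" -type f -exec cat {} \; >> "$out_trailing"

# cmp is silent and succeeds when both files hold identical data.
cmp "$out_inline" "$out_trailing" && echo "identical"

# Truncate a file using the shell's null command.
: > "$out_inline"
```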

4. Using find With the -execdir Option

Alternatively, we can use the -execdir option, which runs the command from the directory that contains the matched file rather than from the directory where we started find. Independently of that choice, the terminator decides how many times the command runs: \; runs it once per file, while + passes as many filenames as possible to a single invocation.

Let’s start by analyzing the number of times the command executes when we terminate the -exec option with \;:

$ find playground -name "*.txt" -type f -exec echo "executing for" {} \;
executing for playground/employee_data2.txt
executing for playground/employee data3.txt
executing for playground/employee_data1.txt

Now, let’s use the -execdir option and terminate our command with the + sign:

$ find playground -name "*.txt" -type f -execdir echo "executing for" {} +
executing for ./employee_data2.txt ./employee data3.txt ./employee_data1.txt

Interestingly, with the + terminator, find ran the echo command once for multiple files, and -execdir reported the paths relative to the directory that contains them. In contrast, with the \; terminator, the number of executions is the same as the number of files in the search result. The batching comes from + rather than from -execdir, so -exec with {} + batches in the same way.

Lastly, let’s use this batched approach to solve our use case and verify the content of the ${TMP_EMP_DATA_FILE} file:

$ find playground -name "*.txt" -type f -execdir cat {} + >> ${TMP_EMP_DATA_FILE}
$ cat ${TMP_EMP_DATA_FILE}
e6,e7,e8,e9,e10
e11,e12
e1,e2,e3,e4,e5

Great! It looks like we nailed this.
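It can help to look at the two effects separately. As far as we can tell from the find documentation, the terminator controls the number of invocations, while -execdir controls the directory the command runs from. Here is a minimal sketch with hypothetical file names that counts invocations for each terminator and records the directory used by -execdir:

```shell
workdir=$(mktemp -d)
mkdir "$workdir/playground"
printf 'e1\n' > "$workdir/playground/a.txt"
printf 'e2\n' > "$workdir/playground/b c.txt"
printf 'e3\n' > "$workdir/playground/d.txt"

# Each run of echo prints one line, so counting lines counts invocations.
per_file=$(find "$workdir/playground" -type f -exec echo run {} \; | wc -l)
batched=$(find "$workdir/playground" -type f -exec echo run {} + | wc -l)

echo "one file per run: $per_file invocation(s)"
echo "batched with +:   $batched invocation(s)"

# -execdir starts the command in the directory that holds the match.
exec_dir=$(find "$workdir/playground" -type f -name 'a.txt' -execdir pwd \;)
echo "execdir ran in: $exec_dir"
```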

5. Using find With the -print0 Option and xargs

Since the cat command accepts multiple files as arguments, we can use the xargs command to pass the filenames directly to the cat command.

Let’s go ahead and use this approach to cat the contents of the files to the ${TMP_EMP_DATA_FILE} file:

$ find playground -name "*.txt" -type f | xargs cat >> ${TMP_EMP_DATA_FILE}
cat: playground/employee: No such file or directory
cat: data3.txt: No such file or directory

Unfortunately, it didn’t work as we had planned. That’s because one of the filenames contains a space, and xargs splits its input on blanks and newlines by default. So, the cat command receives invalid filename arguments, namely playground/employee and data3.txt.

To solve this issue, we can use the null character (\0) as the delimiter instead of whitespace. First, we use the -print0 option of the find command to terminate each filename with \0. Then, we use the -0 option of the xargs command so that it splits its input on \0 before passing the filenames to the cat command.

Let’s see if using the null character (\0) as the delimiter resolves the issue:

$ find playground -name "*.txt" -type f -print0 | xargs -0 cat >> ${TMP_EMP_DATA_FILE}
$ cat ${TMP_EMP_DATA_FILE}
e6,e7,e8,e9,e10
e11,e12
e1,e2,e3,e4,e5

We can see that it works as expected.
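The order in which find reports files depends on the filesystem, which is why the aggregated lines aren’t in filename order. If a predictable order matters, one option is to sort the null-delimited list before handing it to xargs. This sketch assumes a sort implementation that supports the -z option (as GNU sort does) and uses hypothetical file names:

```shell
workdir=$(mktemp -d)
mkdir "$workdir/playground"
printf 'e6\n'  > "$workdir/playground/data2.txt"
printf 'e1\n'  > "$workdir/playground/data1.txt"
printf 'e11\n' > "$workdir/playground/data 3.txt"

out=$(mktemp)

# -print0 and -0 keep the name with a space intact; sort -z orders the
# null-terminated names bytewise because of LC_ALL=C.
find "$workdir/playground" -name '*.txt' -type f -print0 |
    LC_ALL=C sort -z |
    xargs -0 cat >> "$out"

cat "$out"
```

In the C locale, a space sorts before digits, so the file with a space in its name comes first here.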

6. Using Loop Constructs

In this section, we’ll use the loop constructs available in Bash to execute the cat command iteratively for each filename in the search results.

6.1. for Loop

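Let’s see how to use a for loop to pass the filenames one by one to the cat command. Since an unquoted command substitution is split on spaces by default, we temporarily set IFS to a newline so that the filename containing a space stays intact, and we quote $file when passing it to cat:

$ OLD_IFS=$IFS; IFS=$'\n'
$ for file in $(find playground -name "*.txt" -type f)
do
    cat "$file" >> ${TMP_EMP_DATA_FILE}
done
$ IFS=$OLD_IFS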

Now, let’s verify the contents of the ${TMP_EMP_DATA_FILE} file:

$ cat ${TMP_EMP_DATA_FILE}
e6,e7,e8,e9,e10
e11,e12
e1,e2,e3,e4,e5

It worked fine.
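Word splitting is the main pitfall of looping over the output of find. The following sketch, using hypothetical file names, counts the loop iterations with the default IFS and with IFS set to a newline only, assuming that the scratch path created by mktemp contains no spaces:

```shell
workdir=$(mktemp -d)
mkdir "$workdir/playground"
printf 'e1\n'  > "$workdir/playground/data1.txt"
printf 'e11\n' > "$workdir/playground/data 3.txt"

# Default IFS: the name with a space is split into two words.
count_default=0
for f in $(find "$workdir/playground" -type f); do
    count_default=$((count_default + 1))
done

# IFS set to a literal newline: one word per line of find output.
old_ifs=$IFS
IFS='
'
count_newline=0
for f in $(find "$workdir/playground" -type f); do
    count_newline=$((count_newline + 1))
done
IFS=$old_ifs

echo "default IFS: $count_default iterations"
echo "newline IFS: $count_newline iterations"
```

Two files yield three iterations with the default IFS, but two iterations once only the newline acts as a separator.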

6.2. while Loop

We can also write a while loop that uses the read command to fetch the filenames one line at a time and send them to cat. We clear IFS and use the -r option so that read preserves leading whitespace and backslashes in filenames:

$ find playground -name "*.txt" -type f | \
while IFS= read -r file; do cat "$file" >> ${TMP_EMP_DATA_FILE}; done

Apart from how the filenames are read, the logic is similar to our approach of using the for loop.

To conclude this approach, let’s verify the output in the target file:

$ cat ${TMP_EMP_DATA_FILE}
e6,e7,e8,e9,e10
e11,e12
e1,e2,e3,e4,e5

As expected, we’ve got the contents of all the files from the search results into the $TMP_EMP_DATA_FILE file.
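A plain read trims leading and trailing whitespace and treats backslashes as escape characters, so unusual filenames may be altered before they reach cat. The sketch below compares it with IFS= read -r, using a made-up name rather than a file from our playground:

```shell
# A hypothetical name with leading spaces and a backslash.
name='  odd\name.txt'

# Plain read trims the leading spaces and consumes the backslash.
plain=$(printf '%s\n' "$name" | { read line; printf '%s' "$line"; })

# IFS= keeps the whitespace and -r keeps the backslash.
safe=$(printf '%s\n' "$name" | { IFS= read -r line; printf '%s' "$line"; })

printf 'plain read:   [%s]\n' "$plain"
printf 'IFS= read -r: [%s]\n' "$safe"
```

Neither variant copes with a newline inside a filename, since both read one line at a time.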

7. Using tee

So far, we used the redirection operator (>>) to send the content to a single target file. Alternatively, in any approach, we can use the tee command to write the content into the $TMP_EMP_DATA_FILE file. By default, tee overwrites the file, so we need its -a option if we want to append as >> does.

Let’s go ahead and see how to use the tee command in one of the approaches where we used the -exec option with find:

$ find playground -name "*.txt" -type f -exec cat {} \; | tee ${TMP_EMP_DATA_FILE}
e6,e7,e8,e9,e10
e11,e12
e1,e2,e3,e4,e5

We can see the entire output on stdout without looking into the $TMP_EMP_DATA_FILE file. Nonetheless, for the sake of completeness, let’s verify the contents in the file directly:

$ cat $TMP_EMP_DATA_FILE
e6,e7,e8,e9,e10
e11,e12
e1,e2,e3,e4,e5
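Since tee replaces the file contents unless told otherwise, it’s worth seeing the difference that the -a option makes. A small sketch with made-up data:

```shell
out=$(mktemp)

# Without -a, tee truncates the file on every run.
printf 'e1,e2\n' | tee "$out" > /dev/null
printf 'e3,e4\n' | tee "$out" > /dev/null
overwritten=$(cat "$out")

# With -a, tee appends, matching the behavior of >>.
printf 'e5,e6\n' | tee -a "$out" > /dev/null
appended=$(cat "$out")

echo "after two plain runs: $overwritten"
echo "after tee -a:"
echo "$appended"
```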

8. Using sed and awk

In this section, we’ll learn how to use two popular text-processing utilities, sed and awk, to solve our use case.

First, let’s execute the cat command for each file using the e flag of the substitution command (s), which is a GNU sed extension:

$ find playground -name "*.txt" -type f | \
sed -n -e 's/.*/cat "&"/ep' >${TMP_EMP_DATA_FILE}

We must note that we enclosed the filenames within double quotes because filenames can contain spaces.

Further, let’s verify that the command executed successfully:

$ cat ${TMP_EMP_DATA_FILE}
e6,e7,e8,e9,e10
e11,e12
e1,e2,e3,e4,e5

Moving on, let’s use awk for implementing a similar strategy. We’ll first enclose the filenames within double quotes and then pass them to cat:

$ find playground -name "*.txt" -type f | \
awk '{print "\""$0"\"";}' | \
awk -vOUT_FILE=${TMP_EMP_DATA_FILE} '{system("cat "$0">>"OUT_FILE);}'

We must note that we used awk’s system function to execute cat. Moreover, we used the -v option to pass the value of the shell variable TMP_EMP_DATA_FILE as OUT_FILE, an internal variable for awk.

Finally, we must verify the correctness of our approach by inspecting the contents of the $TMP_EMP_DATA_FILE file:

$ cat ${TMP_EMP_DATA_FILE}
e6,e7,e8,e9,e10
e11,e12
e1,e2,e3,e4,e5

That’s it. We’ve solved our use case using sed and awk successfully.
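The two awk invocations can also be folded into one. The sketch below is an assumption on our part rather than the approach shown above: it wraps each filename in single quotes for the shell that system() starts, which still breaks if a name itself contains a single quote. The file names and the out_file variable are hypothetical:

```shell
workdir=$(mktemp -d)
mkdir "$workdir/playground"
printf 'e1\n'  > "$workdir/playground/data1.txt"
printf 'e11\n' > "$workdir/playground/data 3.txt"
out=$(mktemp)

find "$workdir/playground" -name '*.txt' -type f |
    awk -v out_file="$out" '{
        # \047 is the octal escape for a single quote.
        cmd = "cat \047" $0 "\047 >> \047" out_file "\047"
        system(cmd)
    }'

# Sort only to make the displayed order predictable.
sort "$out"
```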

9. Using the parallel Command

In this section, we’ll use an interesting command-line utility, parallel, to cat the contents from multiple files to a single file.

First, we need to install the package for the parallel utility because it doesn’t come preinstalled in most Linux distros. On Debian-based systems, we can use apt-get:

$ sudo apt-get install parallel

Next, let’s use the parallel utility for executing the cat command and sending the contents to the $TMP_EMP_DATA_FILE file:

$ find playground -name "*.txt" -type f | \
parallel -j1 cat {} > ${TMP_EMP_DATA_FILE}

We must remember that the parallel command-line utility can trigger multiple jobs in parallel and get us the result faster. However, our use case involves a shared resource. So, it’s recommended to run a single job at a time using the -j1 option, which avoids any race conditions and keeps the output in the same order as the input.

Now, let’s check the contents of the target file:

$ cat ${TMP_EMP_DATA_FILE}
e6,e7,e8,e9,e10
e11,e12
e1,e2,e3,e4,e5

Fantastic! We’ve learned one more approach.

10. Conclusion

In this article, we learned several ways to cat the contents of the files found using find to a single destination file.

We started by exploring different options available with find, such as -exec, -execdir, and -print0. Additionally, we used the loop constructs in Bash, and interesting command-line utilities, such as xargs, sed, awk, tee, and parallel, to solve the use case.
