1. Overview

Using the tar command, we can package multiple files into an archive, which is also called a “tarball”. Sometimes, we would like to archive files of certain types.

In this quick tutorial, we’ll see how to do that with the find and tar commands.

2. The Example

To make the demonstration and explanation easier, first of all, let’s create an example:

$ tree app
app
├── app.config
├── data
│   ├── app.data
│   ├── orders.cache
│   └── orders.csv
├── log
│   ├── app.log
│   └── login.log
└── program
    └── Application.class

3 directories, 7 files

As the output above shows, we have an application directory named “app” that contains some subdirectories and files.
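To follow along, the layout above can be recreated with a few commands. This is a minimal sketch: it assumes we work in a scratch directory created by mktemp, and all files are empty placeholders.

```shell
# Work in a throwaway directory so nothing existing is touched
cd "$(mktemp -d)"

# Recreate the example layout (all files are empty placeholders)
mkdir -p app/data app/log app/program
touch app/app.config \
      app/data/app.data app/data/orders.cache app/data/orders.csv \
      app/log/app.log app/log/login.log \
      app/program/Application.class

# Count the regular files we just created
file_count=$(find app -type f | wc -l | tr -d ' ')
echo "files created: $file_count"
```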

Now, let’s say we need to archive all log files (*.log) and data files (*.csv and *.data) in a tarball, but exclude the cache file.

We can take the “divide and conquer” method to solve the problem:

  • Finding the files we need
  • Archiving the files we’ve found

Next, let’s see them in action.

3. Using the find Command to Find Required Files

The first step is finding the files we need.

So, here we need to find files under the app directory with *.log, *.csv, or *.data filename extensions. This is the kind of job the find command is good at:

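$ find app -name '*.data' -o -name '*.csv' -o -name '*.log'
app/data/orders.csv
app/data/app.data
app/log/login.log
app/log/app.log

In the find command above, we’ve used the -o operator (logical OR) to join three -name expressions to get the required files. We’ve also quoted each pattern so that the shell passes it to find unchanged instead of expanding it against the files in the current directory.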
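It's worth noting that -name matches directories as well as files. If we want regular files only, one possible sketch is to combine -type f with the three -name tests. The parentheses are needed because AND binds more tightly than OR in find expressions. The old.log directory below is a hypothetical addition, used only to show the difference.

```shell
cd "$(mktemp -d)"
mkdir -p app/data app/log app/old.log    # old.log is a hypothetical directory
touch app/data/app.data app/data/orders.csv app/data/orders.cache app/log/app.log

# The implicit AND binds tighter than -o, so the OR group must be parenthesized
found=$(find app -type f \( -name '*.data' -o -name '*.csv' -o -name '*.log' \) | LC_ALL=C sort)
printf '%s\n' "$found"
```

Without the parentheses, -type f would apply only to the first -name test, and the app/old.log directory would be listed too.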

Alternatively, we can also find the files by Regex:

$ find app -type f -a -regex '.*[.]\(data\|csv\|log\)$'
app/data/orders.csv
app/data/app.data
app/log/login.log
app/log/app.log

This time, we’ve used the -a operator (logical AND) to join the -type f and -regex expressions. The -a is optional, as find joins adjacent expressions with AND by default.

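It’s worth mentioning that GNU find uses an Emacs-style Regex syntax by default (it appears as findutils-default in the list of supported types). We can assign a different Regex type with the -regextype option, which must appear before the -regex test it applies to.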

For example, we can use the egrep type to save some escapes to make the code easier to read:

$ find app -regextype egrep -type f -a -regex '.*[.](data|csv|log)$'

find will list all supported Regex types if we execute the command “find -regextype help”.
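One detail explains the leading .* in the patterns above: -regex is matched against the whole path, not just the filename. The following sketch illustrates this with a single file, and it assumes a find implementation that supports -regex, such as GNU find.

```shell
cd "$(mktemp -d)"
mkdir -p app/log
touch app/log/app.log

# No match: the pattern must cover the whole path "app/log/app.log"
partial=$(find app -type f -regex 'app[.]log')

# Match: the leading .* absorbs the directory part of the path
full=$(find app -type f -regex '.*/app[.]log')

printf 'partial=[%s] full=[%s]\n' "$partial" "$full"
```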

4. Archiving the Found Files

So far, we’ve found all files we need. Now, let’s create the archive. In other words, we need to pass the find command’s result to the tar command somehow.

Many of us may come up with the command substitution approach to make find and tar work together:

$ tar -cf app-data.tar $(find app -name '*.data' -o -name '*.csv' -o -name '*.log')

Afterward, we can check the content of the created app-data.tar file and see that this approach works for our example problem:

$ tar -tf app-data.tar
app/data/orders.csv
app/data/app.data
app/log/login.log
app/log/app.log

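The command is pretty straightforward. However, it may fail if find’s output contains a large number of files: the shell expands all the paths into tar’s command line, which can exceed the system’s limit on argument length, and we get the “Argument list too long” error message.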
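To get a feel for that limit, we can ask the system for it. This is only a rough indication: ARG_MAX covers the arguments together with the environment variables, and the exact accounting differs between systems.

```shell
# Upper bound, in bytes, on the combined size of arguments and environment
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX: $arg_max bytes"
```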

Next, let’s see a more robust way to solve the problem:

$ find app -name '*.data' -o -name '*.csv' -o -name '*.log' | tar -cf app.tar -T -

Let’s walk through the command above quickly. The find part isn’t new to us, so let’s have a look at the tar part.

The -T FILE option provides a FILE containing a file list and tells the tar command to archive the files in this list. In the command above, we’ve used ‘-T -‘. The ‘-‘ after the -T means the standard input (stdin).

find pipes the file list to the tar command, so tar‘s stdin is fed by find‘s output. As a result, tar archives all the files find has piped in.
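A line-based file list has one weak spot: a filename that itself contains a newline would be read as two entries. A possible variant, assuming GNU find and GNU tar, is to separate the paths with NUL bytes instead. The filename with a space below is hypothetical and simply stands in for an unusual name.

```shell
cd "$(mktemp -d)"
mkdir -p app/log
touch app/log/app.log "app/log/night run.log"   # hypothetical unusual filename

# -print0 ends each path with a NUL byte; --null tells tar that the
# list read by -T is NUL-delimited (it must come before -T)
find app -type f -name '*.log' -print0 | tar -cf app-logs.tar --null -T -

listing=$(tar -tf app-logs.tar | LC_ALL=C sort)
printf '%s\n' "$listing"
```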

Finally, let’s check if the tarball contains our expected files:

$ tar -tf app.tar
app/data/orders.csv
app/data/app.data
app/log/login.log
app/log/app.log

5. Conclusion

In this article, we’ve addressed how to use the find and tar commands to archive files of certain types.

Also, we’ve discussed why the command substitution approach may fail.
