1. Overview

On the Linux command line, grep is a convenient utility for searching text in files. However, grep alone offers only limited ways to filter files by specific criteria before examining their content.

We run into this requirement pretty often in our daily work, such as searching for some text in all *.txt files recursively under a directory, or searching for a pattern in all files whose names contain a timestamp.

In this tutorial, we’ll see how to execute grep on a set of filtered files.

2. The Example Files

To understand the commands in the tutorial more easily, let’s create an example directory structure for testing:

$ tree test 
test
├── app
│   ├── change_log.log
│   └── readme.md
├── archive
│   ├── app_20200101.log.archive
│   └── app_20200201.log.archive
└── log
    ├── app_20200301.log
    ├── app_20200401.log
    └── app.log

3 directories, 7 files

We have three subdirectories under test, and each directory has a few files.

In this example, all files under the test directory contain the string “Exception”. We can verify this with the grep command:

$ grep -R 'Exception' test
test/app/change_log.log:Fix the NullPointerException Problem when calling external APIs
test/app/readme.md: - Exceptions are well handled
test/archive/app_20200101.log.archive:DATETIME - [Error] NullPointerException has Occurred
test/archive/app_20200201.log.archive:DATETIME - [Error] NullPointerException has Occurred
test/log/app_20200401.log:DATETIME - [Error] ClassCastException has Occurred
test/log/app_20200301.log:DATETIME - [Error] SQLException has Occurred
test/log/app.log:DATETIME - [Error] NullPointerException has Occurred
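
To follow along, we can recreate the example tree with a short script. This is only a sketch: it assumes each file holds nothing but the single line seen in the grep output above, and it builds the tree in a scratch directory created by mktemp (the variable names are our own):

```shell
#!/usr/bin/env bash
# Build the example tree in a scratch directory.
# Assumption: every file contains only the line shown in the grep output.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir -p test/app test/archive test/log

echo 'Fix the NullPointerException Problem when calling external APIs' > test/app/change_log.log
echo ' - Exceptions are well handled' > test/app/readme.md
echo 'DATETIME - [Error] NullPointerException has Occurred' > test/archive/app_20200101.log.archive
echo 'DATETIME - [Error] NullPointerException has Occurred' > test/archive/app_20200201.log.archive
echo 'DATETIME - [Error] SQLException has Occurred' > test/log/app_20200301.log
echo 'DATETIME - [Error] ClassCastException has Occurred' > test/log/app_20200401.log
echo 'DATETIME - [Error] NullPointerException has Occurred' > test/log/app.log

# Count the files that contain the string; all seven should match.
matching_files=$(grep -R -l 'Exception' test | wc -l)
echo "files containing 'Exception': $matching_files"
```

The -l option makes grep print only the names of matching files, which is handy for a quick sanity check of the fixture.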

Now, let’s take this test directory as an example and address how to execute the grep command on a filtered file set.

3. grep on Files Only With Certain Extensions

3.1. Using the grep --include=GLOB Option

First, let’s see how to search for the pattern “Exception” only in files with the .log extension:

$ grep -R --include=*.log 'Exception' test
test/app/change_log.log:Fix the NullPointerException Problem when calling external APIs
test/log/app.log:DATETIME - [Error] NullPointerException has Occurred
test/log/app_20200401.log:DATETIME - [Error] ClassCastException has Occurred
test/log/app_20200301.log:DATETIME - [Error] SQLException has Occurred

As the output above shows, only files with the file extension “log” are checked by the grep command.

We’ve used two options to tell the grep command to do that:

  • -R searches files recursively. That is, grep searches for the given pattern in the files of every subdirectory under test
  • --include=*.log is an example of the --include=GLOB option, which tells grep to search only files whose basename matches the given GLOB expression. Quoting the glob, as in --include='*.log', protects it from expansion by the shell

Also, we can use multiple --include=GLOB options to ask the grep command to search files that match any of several extensions.

Now, let’s search for the word “Exception” on *.log and *.md files:

$ grep -R --include=*.log --include=*.md 'Exception' test
test/app/readme.md: - Exceptions are well handled
test/app/change_log.log:Fix the NullPointerException Problem when calling external APIs
test/log/app.log:DATETIME - [Error] NullPointerException has Occurred
test/log/app_20200401.log:DATETIME - [Error] ClassCastException has Occurred
test/log/app_20200301.log:DATETIME - [Error] SQLException has Occurred

As we can see, the file test/app/readme.md appears in the output as well.

Alternatively, we can write a single --include argument and let Bash’s brace expansion generate one option per extension. The shell expands the argument into --include=*.log --include=*.md before grep starts, so the following command prints the same output:

$ grep -R --include=*.{log,md} 'Exception' test
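
It helps to see that the braces are handled by Bash and never reach grep. The sketch below stores the expanded arguments in the positional parameters and prints them one per line. Quoting just the '*.' part is our own precaution, so that the shell performs brace expansion but doesn't attempt pathname expansion:

```shell
#!/usr/bin/env bash
# Brace expansion is a Bash feature; grep only ever sees the expanded arguments.
# The quoted '*.' keeps the glob characters literal while the braces still expand.
set -- --include='*.'{log,md}

# Print each resulting argument on its own line.
printf '%s\n' "$@"
# prints:
# --include=*.log
# --include=*.md
```

So the single-argument form is just a shorter way of typing two --include options.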

3.2. Using Bash GLOB to Filter Files

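We’ve learned that grep’s --include=GLOB option can direct grep to search only files with certain extensions.

Alternatively, if our Bash version is 4 or higher, we can make use of Bash’s globstar (**) to match files recursively. The globstar option is off by default, so we need to enable it first. Otherwise, ** behaves like a single * and matches exactly one directory level:

$ shopt -s globstar
$ grep 'Exception' test/**/*.log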
test/app/change_log.log:Fix the NullPointerException Problem when calling external APIs
test/log/app_20200301.log:DATETIME - [Error] SQLException has Occurred
test/log/app_20200401.log:DATETIME - [Error] ClassCastException has Occurred
test/log/app.log:DATETIME - [Error] NullPointerException has Occurred

This time, the grep command looks pretty simple. It’s only responsible for searching the pattern “Exception” on given files, while the Bash GLOB is in charge of filtering files.
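
Since globstar is off by default in Bash, the same pattern can match different files depending on the shell’s settings. The following sketch uses a hypothetical fixture with one log file nested two levels deeper than the other, and counts the files grep reports with the option disabled and then enabled:

```shell
#!/usr/bin/env bash
# Hypothetical fixture: one log file directly under log/, one two levels deeper.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p test/log/2020/03
echo 'DATETIME - [Error] NullPointerException has Occurred' > test/log/app.log
echo 'DATETIME - [Error] SQLException has Occurred' > test/log/2020/03/app_20200301.log

# With globstar disabled, ** acts like *, so only test/<dir>/*.log is matched.
shopt -u globstar
count_off=$(grep -l 'Exception' test/**/*.log | wc -l)

# With globstar enabled, ** matches zero or more directory levels.
shopt -s globstar
count_on=$(grep -l 'Exception' test/**/*.log | wc -l)

echo "globstar off: $count_off file(s), globstar on: $count_on file(s)"
```

In the article’s example tree, every file sits exactly one level below test, which is why the command finds all *.log files there even without globstar.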

Further, we can also extend the Bash GLOB expression to match multiple file extensions. Let’s do the same search on the “*.log” and “*.md” files:

$ grep 'Exception' test/**/*.{log,md}
test/app/change_log.log:Fix the NullPointerException Problem when calling external APIs
test/log/app_20200301.log:DATETIME - [Error] SQLException has Occurred
test/log/app_20200401.log:DATETIME - [Error] ClassCastException has Occurred
test/log/app.log:DATETIME - [Error] NullPointerException has Occurred
test/app/readme.md: - Exceptions are well handled

If we check the output above, we can see that “test/app/readme.md” is in the list, too.

4. Combining the find Command and the grep Command

We’ve learned how to search only on files with certain extensions, which is a pretty common use case. However, sometimes, we want to search on files filtered by different criteria.

For instance, we may want to search some text only on files owned by a certain user, or with filenames matching a given pattern, or files whose last access time is earlier or later than a timestamp, and so on.

When we want to filter files using various criteria, we shouldn’t forget our good friend: the find command.

In this tutorial, we won’t dive into the details about how to find files using sophisticated criteria. Instead, we’re going to focus on how to make grep and find work together. That is, grep will search the files in find’s result.

We’ll search for the string “Exception” in files whose names contain a timestamp, such as app_20200301.log and app_20200101.log.archive.

First, let’s take a look at the find command to get those files:

$ find test -type f -a -regextype 'egrep' -regex '.*_[0-9]{8}.*' 
test/archive/app_20200201.log.archive
test/archive/app_20200101.log.archive
test/log/app_20200401.log
test/log/app_20200301.log

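As the output above shows, we use find’s -regex test to select the files we want to search. The regular expression must match the whole path, which is why the pattern starts and ends with “.*”. It’s also worth noting that -regextype is an extension of GNU find.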
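
On systems without GNU find, a plain -name glob can select similar files. The sketch below is an alternative under our own assumptions, not the command used in this article: it matches the basename only, whereas -regex matches the whole path, and it works on a small hypothetical fixture:

```shell
#!/usr/bin/env bash
# Hypothetical fixture with empty files; only the names matter here.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p test/log test/archive
touch test/log/app.log test/log/app_20200301.log test/archive/app_20200101.log.archive

# A glob has no {8} repetition, so we repeat the digit class eight times.
d='[0-9]'
found=$(find test -type f -name "*_$d$d$d$d$d$d$d$d*" | sort)
echo "$found"
# prints:
# test/archive/app_20200101.log.archive
# test/log/app_20200301.log
```

The sort at the end only makes the output order predictable, since find doesn’t guarantee any particular order.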

Now, let’s see how to let grep work on the result of the find command.

4.1. Using find -exec Action

One way to make grep work on find’s result is to use find’s -exec action:

$ find test -type f -a -regextype 'egrep' -regex '.*_[0-9]{8}.*' -exec grep -H "Exception" '{}' \;
test/archive/app_20200201.log.archive:DATETIME - [Error] NullPointerException has Occurred
test/archive/app_20200101.log.archive:DATETIME - [Error] NullPointerException has Occurred
test/log/app_20200401.log:DATETIME - [Error] ClassCastException has Occurred
test/log/app_20200301.log:DATETIME - [Error] SQLException has Occurred

Each found file will fill the placeholder “{}” in the -exec action. In this way, the grep command will search on each found file.

The trailing “\;” is required because it marks the end of the command that -exec runs. We escape the semicolon so that the shell passes it to find instead of treating it as a command separator.

It’s worth mentioning that, with the “\;” terminator, the -exec action runs once for each file the find command has found. In other words, if find delivers a million files, we’re going to start grep a million times.

Since starting a new process is relatively expensive, this form of the -exec action won’t perform well if we want to search some text in a huge number of files.
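
As a side note, find’s -exec action can also end with “+” instead of “\;”. In that form, find appends as many file names as possible to a single grep invocation. The sketch below runs both forms on a small hypothetical fixture, using a simple -name filter of our own, and compares the sorted results:

```shell
#!/usr/bin/env bash
# Hypothetical fixture: three log files, two of which match the name filter.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p test/log
echo 'DATETIME - [Error] SQLException has Occurred' > test/log/app_20200301.log
echo 'DATETIME - [Error] ClassCastException has Occurred' > test/log/app_20200401.log
echo 'DATETIME - [Error] NullPointerException has Occurred' > test/log/app.log

# One grep process per file.
per_file=$(find test -type f -name 'app_*' -exec grep -H 'Exception' {} \; | sort)

# As few grep processes as possible, with many files per invocation.
batched=$(find test -type f -name 'app_*' -exec grep -H 'Exception' {} + | sort)

if [ "$per_file" = "$batched" ]; then
    echo 'both forms report the same matches'
fi
```

Both forms produce the same matches. The difference lies only in how many grep processes are started.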

Next, let’s look at how to make find and grep work more efficiently.

4.2. Using xargs to Combine find and grep

Unlike the “\;” form of find’s -exec action, xargs bundles the found files and passes them to the command in as few invocations as possible.

This is a big advantage over the -exec action, particularly when we search on a large number of files.

Finally, let’s combine find and grep using the xargs command:

$ find test -type f -a -regextype 'egrep' -regex '.*_[0-9]{8}.*' | xargs grep "Exception" 
test/archive/app_20200201.log.archive:DATETIME - [Error] NullPointerException has Occurred
test/archive/app_20200101.log.archive:DATETIME - [Error] NullPointerException has Occurred
test/log/app_20200401.log:DATETIME - [Error] ClassCastException has Occurred
test/log/app_20200301.log:DATETIME - [Error] SQLException has Occurred
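
One caveat is that xargs splits its input on whitespace by default, so a file name containing a space would be passed to grep as two separate arguments. A common remedy is to let find separate the names with NUL characters and tell xargs to expect them. The sketch below assumes a hypothetical file with a space in its name and a simple -name filter:

```shell
#!/usr/bin/env bash
# Hypothetical fixture: one of the file names contains a space.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p test/log
echo 'DATETIME - [Error] SQLException has Occurred' > 'test/log/app_20200301 copy.log'
echo 'DATETIME - [Error] ClassCastException has Occurred' > test/log/app_20200401.log

# -print0 ends each name with a NUL byte; xargs -0 splits on NUL only.
matches=$(find test -type f -name 'app_*' -print0 | xargs -0 grep -H 'Exception' | sort)
echo "$matches"
```

We also pass -H to grep here, so the file name is printed even when xargs happens to hand grep only a single file.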

5. Conclusion

In this article, we’ve learned how to execute grep on files that match specific criteria.

Additionally, we’ve discussed the difference between find’s -exec action and the xargs command. When we search a large number of files, xargs is usually the faster choice.
