1. Overview

Efficient text search is essential when working with files in Linux, especially when the number of files is large.

In this tutorial, we’ll learn how to find all files containing a specific text pattern using Linux utilities like find, grep, and a few other alternatives.

2. Scenario Setup

Although searching for a specific pattern applies to any set of files, it’s most useful when working with a large code base or a deep directory hierarchy.

So, let’s start by setting up the open-source codebase for the Kubernetes project:

$ git clone https://github.com/kubernetes/kubernetes.git --quiet
$ cd kubernetes/

In the following sections, we’ll learn different strategies to find all files within the kubernetes directory that contain a search pattern.

3. Using grep

Let’s say we want to find all the code files that use the echo command to send output to a log file. It’s an apt scenario where the grep command can search the pattern across files.

Since we want to do a recursive search, we’d need to use the -R option. Additionally, we need to use the -l option to limit the output to the filenames, without the lines containing the pattern. So, let’s go ahead and see this in action:

$ grep -Rl "/bin/echo.*log.*" .

We must note that we used “/bin/echo.*log.*” as the search pattern and . for scoping the search to all files within the current directory.
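As a small, self-contained sketch of the same idea, the snippet below builds a throwaway directory (the file names and contents are made up for illustration) and runs the recursive search twice: once over all files, and once with grep's --include option to narrow the search to shell scripts. It assumes GNU or BSD grep, which support -R and --include.

```shell
# Build a throwaway tree so the example doesn't depend on the Kubernetes clone.
# All file names and contents here are hypothetical.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/scripts/nested"
printf '/bin/echo "starting" >> /var/log/app.log\n' > "$demo_dir/scripts/start.sh"
printf '/bin/echo "done"\n' > "$demo_dir/scripts/nested/stop.sh"
printf 'Run /bin/echo and check the log afterwards\n' > "$demo_dir/README.md"

# -R recurses into subdirectories; -l prints only the names of matching files.
all_hits=$(cd "$demo_dir" && grep -Rl "/bin/echo.*log" . | LC_ALL=C sort)

# --include limits the recursive search to files whose names match the glob.
sh_hits=$(cd "$demo_dir" && grep -Rl --include="*.sh" "/bin/echo.*log" .)

printf 'all files:\n%s\n' "$all_hits"
printf 'shell scripts only:\n%s\n' "$sh_hits"

rm -rf "$demo_dir"
```

Filtering by file name inside grep is often enough when the only extra criterion is the extension; for filters such as file size, find is the better fit.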

4. Using find

Let’s say we want to find all the shell scripts that use the mkdir command to create a directory. Additionally, we want to restrict our search to large files of more than 25,000 bytes. It’s an apt scenario where we can use the find command with advanced filter options.

Firstly, we must use the -type, -name, and -size options to restrict the file search. Secondly, we’ll need the -exec option to search for a pattern within these files using the grep command.

So, let’s execute a one-liner command to find all the matching files:

$ find . -type f -name "*.sh" -size +25000c -exec grep -m1 -l "mkdir -p" {} \;

We must note that we didn’t grep recursively because find already hands each file path to grep. Additionally, the -m1 option stops grep after the first matching line in each file. With GNU grep, the -l option already stops scanning a file at its first match, so -m1 mainly makes that intent explicit.
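The following sketch shows the same filter-then-grep approach on a throwaway directory, with one variation: it ends -exec with + instead of \; so that find passes many file names to a single grep invocation rather than starting one grep per file. The helper make_padded, the file names, and the 1000-byte threshold are all assumptions chosen to keep the example small.

```shell
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/build" "$demo_dir/hack" "$demo_dir/docs"

# Hypothetical helper: write $2 as the first line of file $1,
# then append about 4 KB of comment lines as padding.
make_padded() {
    {
        printf '%s\n' "$2"
        i=0
        while [ "$i" -lt 100 ]; do
            printf '# padding line to make this script larger\n'
            i=$((i + 1))
        done
    } > "$1"
}

make_padded "$demo_dir/build/large.sh" 'mkdir -p /tmp/large'     # large, matches
make_padded "$demo_dir/build/large-nomatch.sh" 'echo hello'      # large, no pattern
make_padded "$demo_dir/docs/notes.txt" 'mkdir -p /tmp/docs'      # large, wrong extension
printf 'mkdir -p /tmp/small\n' > "$demo_dir/hack/small.sh"       # matches, but too small

# -size +1000c keeps files larger than 1000 bytes;
# "{} +" batches the surviving paths into as few grep calls as possible.
hits=$(cd "$demo_dir" && find . -type f -name "*.sh" -size +1000c -exec grep -l "mkdir -p" {} +)
printf '%s\n' "$hits"

rm -rf "$demo_dir"
```

Only build/large.sh passes all three filters and contains the pattern. On a large tree, batching with + usually reduces process start-up overhead compared with \;.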

5. Searching a Git Repository

It’s common to search for text patterns in code files that are part of a git repository. In this section, we’ll learn about a few tools specific to this use case.

5.1. Using git grep

Let’s assume we want to search for the text pattern “docker-compose” across all files. Moreover, we want to search it within two specific releases, namely release-1.1 and release-1.2. For such a scenario, git grep is ideal because it can help us scan through multiple releases simultaneously.

Next, let’s go ahead and see this in action:

$ git grep -l -e "docker-compose" remotes/origin/release-1.{1,2}

We can note that we used the shell’s brace expansion to specify multiple releases with a common prefix. Additionally, the -l option limits the output to filenames, and because we searched named revisions, git grep prefixes each filename with the release it was found in.
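To see the revision prefix in the output without cloning a large repository, here is a minimal sketch that creates a temporary repository with two local branches and searches both. The branch names mirror the release naming above, but the repository, the files, and the g wrapper function are hypothetical. The revisions are listed explicitly instead of using brace expansion so the snippet also runs in shells that lack it.

```shell
repo=$(mktemp -d)
git -C "$repo" init -q

# Hypothetical wrapper: run git inside the demo repo with an inline identity,
# so the sketch doesn't depend on or modify any global git configuration.
g() {
    git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
        -c commit.gpgsign=false "$@"
}

printf 'run with docker-compose up\n' > "$repo/README.md"
g add README.md
g commit -q -m "first release"
g branch release-1.1

printf 'docker-compose build\n' > "$repo/build.sh"
g add build.sh
g commit -q -m "second release"
g branch release-1.2

# Search two revisions in one call; each hit is printed as <revision>:<path>.
hits=$(g grep -l -e "docker-compose" release-1.1 release-1.2)
printf '%s\n' "$hits"

rm -rf "$repo"
```

The file build.sh appears only under release-1.2, which shows that git grep searches the content of each revision rather than the working tree.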

5.2. Using ripgrep

While working with a git repository, we usually want to exclude the files added in .gitignore from our area of focus. The same applies when trying to find files containing a text pattern. For such scenarios, ripgrep is another helpful utility in our toolkit.

First, let’s take a look at one of the entries in the .gitignore file:

$ grep "^[^#]output*" .gitignore

Next, let’s create the output directory and add the tmp-hello-world file containing the “Hello, world!” pattern:

$ mkdir -p output; echo "Hello, world!" >> output/tmp-hello-world

In continuation, let’s use the grep command to search for the pattern “Hello, world!” across files:

$ grep -Rl "Hello, world!" .

We can see that the last entry in the output is for the tmp-hello-world file. So, we can infer that the grep command doesn’t exclude the file entries in the .gitignore file. That’s one scenario where the ripgrep utility can help us.

So, let’s install the ripgrep package because it isn’t preinstalled in most Linux distros, and put it into action for search:

$ sudo apt-get update && sudo apt-get -y install ripgrep
$ rg -l "Hello, world!"

Perfect! We can see that the tmp-hello-world file is now excluded from the result.
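The sketch below reproduces this behavior in a temporary repository and also shows ripgrep's --no-ignore option, which turns the ignore-file filtering off again. The repository layout and file names are made up for illustration, and the rg calls are skipped if ripgrep isn't installed. The search path . is passed explicitly because, without a path, rg may read from standard input when it isn't attached to a terminal.

```shell
repo=$(mktemp -d)
git -C "$repo" init -q    # by default, rg applies .gitignore rules inside a git repository

printf 'output/\n' > "$repo/.gitignore"
mkdir -p "$repo/output" "$repo/src"
printf 'Hello, world!\n' > "$repo/output/tmp-hello-world"   # ignored by .gitignore
printf 'Hello, world!\n' > "$repo/src/greeting.txt"         # not ignored

if command -v rg >/dev/null 2>&1; then
    # Default behavior: paths matched by .gitignore are skipped.
    default_hits=$(cd "$repo" && rg -l "Hello, world!" . | LC_ALL=C sort)

    # --no-ignore disables ignore-file filtering, so the ignored file is found again.
    all_hits=$(cd "$repo" && rg -l --no-ignore "Hello, world!" . | LC_ALL=C sort)

    printf 'default:\n%s\n' "$default_hits"
    printf 'with --no-ignore:\n%s\n' "$all_hits"
fi

rm -rf "$repo"
```

The --no-ignore option is handy when a match is unexpectedly missing and we want to check whether an ignore rule is hiding it.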

6. Faster Alternatives

Although grep is still the most widely known search utility in Linux, there are a few good grep alternatives, such as UniversalCodeGrep (ucg), Silver Searcher (ag), ack, sift, and Platinum Searcher (pt). These utilities are often faster than grep on large source trees, partly because they skip ignored and binary files by default, but they aren’t preinstalled in most Linux distros. So, we need to install them before use.

Next, let’s see a comparison of time taken by some of these commands for our use case of searching the “Hello, world!” pattern across files:

$ time find . -type f -exec grep -l "Hello, world!" {} \;

$ time grep -Rl -e "Hello, world!" .

$ time ack -l "Hello, world!"

$ time git grep -l "Hello, world!"

$ time rg -l "Hello, world!"

$ time ag -l "Hello, world!"

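In our run, Silver Searcher (ag) was the fastest. However, timings like these depend on the hardware, the state of the filesystem cache, and the size of the repository, so it’s worth repeating each command a few times before drawing conclusions.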

7. Conclusion

In this tutorial, we learned how to find all files containing a pattern using popular Linux utilities such as grep, find, git grep, and ripgrep. Additionally, we learned about a few fast grep-like utilities such as UniversalCodeGrep (ucg), Silver Searcher (ag), ack, sift, and Platinum Searcher (pt).
