In this tutorial, we’ll talk about the use of the command find with regular expressions (regex). We’ll look at how to specify the regular expression to further refine the results of the search.
2. Regular Expressions Primer
Before showing how to use regular expressions with find, let’s start with what they are and how they are constructed.
Regular expressions (regex for short) are character sequences that define a search pattern. Combined with find, they let us express a precise search in a short command.
There are different types of regular expressions, each with its own syntax. The basic concepts explained below work the same way in all of them. However, more advanced features require knowing which regex type is in use, because that is where the types differ. The regex flavors that find accepts are listed in Section 3.3.
2.1. Main Regex Tokens and Examples
Although regular expressions are sometimes considered daunting, they make our searches more precise and improve how we work on the command line. Even basic knowledge is enough to benefit from them.
As a quick introduction, here are some regex tokens with a special meaning:
- Period (.): matches any single character except a newline: q.e matches the strings qwe, qre, and qee but not the strings qe or qwwe
- Asterisk (*): matches zero or more occurrences of the preceding character or regular expression: qw*e matches the strings qe, qwe, and qwwe but not the string qre
- Backslash (\): escapes special characters, for example, to search for a literal period: q\.e matches the string q.e but not the strings qre, qee, qe, or qwwe
- Square brackets ([characters]): match any single character listed inside the brackets: q[we]r matches the strings qwr and qer but not the strings qr, qwer, or qwewer
- Caret (^): as the first character inside square brackets, it negates the set, so the brackets match any single character that is not listed: q[^we]r matches the strings qar and qsr but not qwr or qer (outside square brackets, the caret instead anchors the match to the beginning of a line)
Two tokens frequently used together are .*: following the rules above, the combination matches zero or more occurrences of any character except a newline, so it matches any string without newlines.
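The tokens above can be tried directly in a shell. The following is a minimal sketch that assumes a standard grep: grep uses basic regular expressions by default, in which these five tokens behave as described, and its -x option keeps only the lines that the pattern matches in full. Each example string is fed to grep on its own line:

```shell
#!/usr/bin/env bash
# One example string per line; grep -x prints only the lines that the
# pattern matches completely, similar to how find's -regex must match
# the entire path.
printf '%s\n' qwe qre qee qe qwwe | grep -x 'q.e'         # qwe, qre, qee
printf '%s\n' qe qwe qwwe qre | grep -x 'qw*e'            # qe, qwe, qwwe
printf '%s\n' q.e qre qee qe qwwe | grep -x 'q\.e'        # q.e
printf '%s\n' qwr qer qr qwer qwewer | grep -x 'q[we]r'   # qwr, qer
printf '%s\n' qar qsr qwr qer | grep -x 'q[^we]r'         # qar, qsr
```

The single quotes keep the shell from interpreting *, [ and \ before grep sees the pattern.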
3. Command Description
The use of the command find can be split into two components: a path and a search expression:
find [path] [expression]
The path is the directory where the search starts. The expression holds the search criteria and, optionally, the actions to perform on the matching files. This is where find offers three options related to regular expressions, which we present next with some use case examples. All the examples use the following mockup directory:
$ tree ./
./
├── a0
├── a0.sh
├── A0.sh
├── a1
├── a1.sh
├── A1.sh
├── a2
├── ca
├── cb
├── cc
└── folder
    ├── a0
    ├── a1
    └── a0folder
        ├── a0
        └── a1

2 directories, 14 files
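To follow along, the mockup directory can be rebuilt with mkdir and touch. This is a minimal sketch: it assumes a case-sensitive filesystem (a0.sh and A0.sh must be distinct files, which is not the default on macOS) and uses mktemp for a scratch location; the variable name workdir is only illustrative. The find examples in this section are then run from inside that directory:

```shell
#!/usr/bin/env bash
# Rebuild the mockup tree in a fresh scratch directory.
set -e
workdir=$(mktemp -d)   # illustrative scratch location; any empty directory works
cd "$workdir"

mkdir -p folder/a0folder
# Files in the current directory
touch a0 a0.sh A0.sh a1 a1.sh A1.sh a2 ca cb cc
# Files one and two levels down
touch folder/a0 folder/a1 folder/a0folder/a0 folder/a0folder/a1

find . -type f | wc -l              # 14 regular files
find . -mindepth 1 -type d | wc -l  # 2 subdirectories
```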
3.1. Using -regex
The first option is -regex together with the regular expression:
find [path] -regex [regular_expression]
With this command, find searches the path and returns the files whose paths match regular_expression. The match is made against the whole path as find reports it, starting with the search path given on the command line, and the regex has to match that entire string, not just a part of it. This means that when searching the current directory with ./, the regular_expression should start with \.\/ (the backslash escapes the period; a slash is not a special character, but escaping it is harmless).
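A quick way to see that the whole path is matched: a pattern that describes only the filename finds nothing, while the same pattern with the ./ prefix finds the file. A minimal sketch in a scratch directory created with mktemp; the file names are only for illustration:

```shell
#!/usr/bin/env bash
# -regex is matched against the entire path that find prints.
set -e
cd "$(mktemp -d)"
mkdir folder
touch a0 folder/a0

find . -regex 'a0'               # no output: the path is ./a0, not a0
find . -regex '\./a0'            # ./a0
find folder -regex 'folder/a0'   # folder/a0: the prefix follows the start path
```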
The following command finds the files (with the -type f flag) in the current directory (\.\/) whose names start with the letter a followed by either a 0 or a 1:
$ find ./ -type f -regex '\.\/a[01].*'
./a1
./a0
./a1.sh
./a0.sh
The file a2 is not returned because its letter a is followed by a 2 rather than a 0 or a 1, and A0.sh and A1.sh are missing because -regex is case-sensitive. We can also search the first-level subdirectories instead of the current directory with the command:
$ find ./ -type f -regex '\.\/[^/]*\/a[^/]*'
./folder/a1
./folder/a0
This regex differs from the previous one mainly in two ways. First, the tokens [^/]*\/ match any string that doesn't contain a slash ([^/]*) followed by exactly one slash (\/), placed immediately before the filename that starts with the letter a. Second, after the letter a we use [^/]* instead of .*, so no more slashes can appear after it.
Files in deeper subdirectories don't match: in ./folder/a0folder/a0, for example, the part between the first slash and the slash right before the final a0 is folder/a0folder, which contains a slash that [^/]* can't match.
Finally, to find these files in all subdirectories at any depth, we can let .* absorb everything up to the last slash:
$ find ./ -type f -regex '.*\/a[01][^/]*'
./folder/a0folder/a0
./folder/a0folder/a1
./folder/a0
./folder/a1
./a0
./a1
./a0.sh
./a1.sh
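The depth rules in this section can be checked without touching the filesystem by testing path-shaped strings. This is a minimal sketch: paths is a hypothetical helper that just prints a few sample paths in the mockup's style, and grep -x stands in for find because, like -regex, it demands a whole-line match. For the tokens used here, grep's basic syntax and find's default syntax agree. The slashes are left unescaped, since recent GNU grep versions warn about a backslash before an ordinary character:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints sample paths, one per line.
paths() { printf '%s\n' ./a0 ./a2 ./folder/a0 ./folder/a0folder/a1; }

# Current directory only
paths | grep -x '\./a[01].*'        # ./a0
# Exactly one subdirectory level
paths | grep -x '\./[^/]*/a[^/]*'   # ./folder/a0
# Any depth: .* takes everything up to the last slash
paths | grep -x '.*/a[01][^/]*'     # ./a0, ./folder/a0, ./folder/a0folder/a1
```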
3.2. Using -iregex
The second option is -iregex:
find [path] -iregex [regular_expression]
It performs the same search as the -regex option but ignores letter case. As a mnemonic, the i in -iregex stands for case-insensitive.
If we restrict one of the commands from before to the files that have a period right after a0 or a1 (the bracket expression [.] matches a literal period), the output looks like this:
$ find ./ -type f -regex '\.\/a[01][.].*'
./a0.sh
./a1.sh
The results with the -iregex flag instead of the -regex flag include the files with the capital letter A as well:
$ find ./ -type f -iregex '\.\/a[01][.].*'
./a0.sh
./A1.sh
./A0.sh
./a1.sh
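One way to picture what -iregex does is to compare it with a -regex pattern that spells out both cases of the letter in a bracket expression. A minimal sketch, assuming a case-sensitive filesystem and a scratch directory from mktemp; the output is sorted with LC_ALL=C only to make the order predictable:

```shell
#!/usr/bin/env bash
# -iregex vs. -regex with an explicit [aA] bracket expression.
set -e
cd "$(mktemp -d)"
touch a0.sh A0.sh a1.sh A1.sh a2

# Case-insensitive match
find . -type f -iregex '\./a[01][.].*' | LC_ALL=C sort
# Same four files: ./A0.sh ./A1.sh ./a0.sh ./a1.sh
find . -type f -regex '\./[aA][01][.].*' | LC_ALL=C sort
```

For a single letter the bracket version is easy to write, but -iregex scales better once the pattern contains many letters.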
3.3. Using -regextype
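Finally, the option -regextype selects the type of regular expression. It only affects the -regex and -iregex tests that come after it on the command line, so it has to be placed before them:
find [path] -regextype [regex_type] -regex [regular_expression]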
Different regex types are available for the command find, including:
- emacs (which is the default option unless otherwise specified)
- posix-awk
- posix-basic
- posix-egrep
- posix-extended
Running find -regextype help lists all the types that the installed version supports. The tokens defined before are compatible with all these types of regex. However, more advanced search queries may produce different results under different regex types. The GNU findutils manual has a section dedicated to the syntax of each type.
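Grouping and alternation are a simple case where the types differ. In the default emacs-style syntax, they are written with backslashes, \( \) and \|, while posix-extended uses plain ( ) and |. A minimal sketch, assuming GNU find (BSD find has no -regextype option) and a scratch directory from mktemp:

```shell
#!/usr/bin/env bash
# The same search written in two regex types.
set -e
cd "$(mktemp -d)"
touch ca cb cc

# Default syntax: grouping and alternation need backslashes
find . -type f -regex '\./c\(a\|b\)' | LC_ALL=C sort
# posix-extended: ( ) and | are special as they are.
# -regextype comes before the -regex test it should affect.
find . -regextype posix-extended -type f -regex '\./c(a|b)' | LC_ALL=C sort
```

Both commands print ./ca and ./cb.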
4. Comparison With Bash Globbing
After using Linux for even a short while, we have almost certainly come across bash globbing in commands like ls. Let's consider the following command:
$ ls *.png
It lists all the files in the current directory with the .png extension. Meanwhile, the command:
$ ls M*.png
lists all the files that have the .png extension and start with the letter M. This is bash globbing in action: the shell expands the pattern into the matching filenames. The -name test of find uses the same wildcard syntax, although find does the matching itself, which is why we quote the pattern to keep the shell from expanding it first.
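To see who expands a glob, we can compare an unquoted and a quoted pattern with echo. A minimal sketch in a scratch directory; the files are only illustrative:

```shell
#!/usr/bin/env bash
# The shell expands an unquoted glob before the command runs.
set -e
cd "$(mktemp -d)"
touch a0.sh a1.sh b.png

echo a*.sh     # the shell expands the glob: a0.sh a1.sh
echo 'a*.sh'   # quoted, the pattern arrives unchanged: a*.sh
find . -name 'a*.sh' | LC_ALL=C sort   # find matches the pattern itself
```

Without the quotes, the shell would hand the expanded filenames to -name instead of the pattern, so find would either search for the wrong name or reject the extra argument.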
Although they look similar, bash globbing and regular expressions use different syntax, which can be confusing. We discuss two of the most relevant differences. First, a period (.) represents a literal period in bash globbing but any single character in a regex. This first command shows the bash globbing approach:
$ find ./ -type f -name 'a*.sh'
./a0.sh
./a1.sh
To obtain the same result with a regex, we have to escape the period so that it only matches a literal period:
$ find ./ -type f -regex '\.\/a.*\.sh'
./a0.sh
./a1.sh
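The escaped period matters as soon as a filename contains some other character in that position. A minimal sketch, assuming a scratch directory from mktemp and a hypothetical extra file a0xsh added only for contrast:

```shell
#!/usr/bin/env bash
# A period is literal in a glob but matches any character in a regex.
set -e
cd "$(mktemp -d)"
touch a0.sh a0xsh   # a0xsh is a hypothetical file added for contrast

find . -type f -name 'a*.sh'                        # ./a0.sh
find . -type f -regex '\./a.*.sh' | LC_ALL=C sort   # ./a0.sh and ./a0xsh
find . -type f -regex '\./a.*\.sh'                  # ./a0.sh
```

In the second command, the unescaped period matches the x in a0xsh.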
Another difference between bash globbing and regular expressions concerns the asterisk (*): in bash globbing, it represents any sequence of characters (including none), but in a regex, it represents zero or more occurrences of the preceding character. Thus, similar-looking patterns behave differently depending on whether they are interpreted as globs or as regexes. With bash globbing, the following command returns all files starting with c:
$ find ./ -type f -name 'c*'
./cb
./cc
./ca
However, a similar-looking regex returns only the files whose names consist entirely of the letter c:
$ find ./ -type f -regex '\.\/c*'
./cc
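The regex that reproduces the glob c* needs .* rather than a bare asterisk. A minimal sketch comparing the three patterns in a scratch directory that holds only the ca, cb, and cc files from the mockup:

```shell
#!/usr/bin/env bash
# The glob 'c*' corresponds to the regex '\./c.*', not to '\./c*'.
set -e
cd "$(mktemp -d)"
touch ca cb cc

find . -type f -name 'c*' | LC_ALL=C sort        # ./ca ./cb ./cc
find . -type f -regex '\./c.*' | LC_ALL=C sort   # ./ca ./cb ./cc
find . -type f -regex '\./c*'                    # ./cc
```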
We should keep these differences in mind so that we can use either bash globbing or regular expressions to our advantage when searching a directory.
5. Conclusion
In this tutorial, we described how to apply some basic regular expressions to further refine the output of the find command and ease the search for files within our directories.