How to Get the Nth Match From Wildcard Expansion

1. Introduction

When working with multiple files and directories, we often need to filter and select specific files, or text from those files, based on patterns. Wildcard expansion (globbing) is a powerful way to list files that match a certain pattern. In some cases, we may need to retrieve only a specific match.

In this tutorial, we’ll learn different ways to get the Nth match from wildcard expansion. The code in this tutorial was tested on a Debian 12 system using GNU Bash 5.1.16.

2. Sample Dataset and Toolset

First, let’s make sure that we have all the prerequisites ready, including a directory containing multiple files with the same extension:

$ mkdir newdir && cd newdir
$ touch file{1..5}.txt

The mkdir command creates the newdir directory, while the cd command changes the current working directory to the newly created newdir directory. Next, the touch command creates five empty files using the filenames generated from the brace expansion.

Subsequently, we can check the newly created files with ls:

$ ls
file1.txt  file2.txt  file3.txt  file4.txt  file5.txt

As we can see, there are five files in the current directory. We’ll list the files in this newly created directory (newdir) and print the Nth match.

3. Using Array Indexing

We can use Bash arrays to store multiple values and access individual values using their indices.

Let’s store the filenames in an array and extract only the third match:

$ files=(*.txt) && echo "${files[2]}" 
file3.txt

This command uses *.txt as a glob pattern to match all .txt files in the current directory and stores the matches in an indexed array named files. Then, the echo command outputs the third element, which has index 2 because array indices start at 0.

Alternatively, we can explicitly declare the array and then access the Nth match using the index:

$ declare -a files=(*.txt)
$ echo "${files[2]}"

The declare -a command defines an indexed array. Although Bash arrays don’t require explicit declaration, declare -a ensures that the variable is recognized as an array.
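When N and the pattern vary, we can wrap the array approach in a small function. The following is a minimal sketch rather than part of the walkthrough above: nth_match is a hypothetical helper name, and the scratch directory created with mktemp only recreates the sample files so that the snippet runs on its own. It assumes a pattern without whitespace, since the unquoted expansion also undergoes word splitting:

```shell
#!/usr/bin/env bash
# Recreate the sample files in a scratch directory.
cd "$(mktemp -d)" || exit 1
touch file{1..5}.txt

# nth_match N PATTERN: print the Nth (1-based) match of a glob pattern.
# Hypothetical helper name; assumes PATTERN contains no whitespace.
nth_match() {
    local n="$1" pattern="$2"
    local -a matches
    shopt -s nullglob          # unmatched patterns expand to nothing
    matches=( $pattern )       # unquoted on purpose so the glob expands
    shopt -u nullglob          # note: this also clears a previously set nullglob
    if (( n < 1 || n > ${#matches[@]} )); then
        echo "nth_match: no match number $n for $pattern" >&2
        return 1
    fi
    printf '%s\n' "${matches[n-1]}"
}

nth_match 3 '*.txt'    # prints file3.txt
```

The bounds check matters because an out-of-range index on a Bash array silently expands to an empty string instead of reporting an error.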

We can also use the printf command to print specific array elements:

$ printf '%s\n' "${files[0]}"
file1.txt

Unlike echo, printf doesn’t append a newline (\n) unless the format string includes one, so we add \n explicitly. Passing the filename as an argument to %s, rather than as the format string itself, keeps any % or backslash characters in the name from being interpreted.
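Once the matches are in an array, a few related expansions are often handy alongside plain indexing. The snippet below is an illustrative sketch only; it recreates the sample files in a scratch directory so that it runs on its own:

```shell
#!/usr/bin/env bash
# Recreate the sample files in a scratch directory.
cd "$(mktemp -d)" || exit 1
touch file{1..5}.txt

files=(*.txt)
echo "${#files[@]}"               # number of matches: 5
echo "${files[-1]}"               # negative index, last match: file5.txt
printf '%s\n' "${files[@]:1:3}"   # second to fourth match, one per line
```

The negative index works in the Bash 5.x used in this tutorial; for older shells, "${files[${#files[@]}-1]}" is the more conservative way to reach the last match.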

4. Using the sed Command

We can also use sed to filter and find the Nth match from a list.

Let’s retrieve only the second match:

$ ls *.txt | sed -n '2p'
file2.txt

The first part of the command lists all the files with the .txt extension (ls *.txt) and then passes the output to the sed command, which prints only the second line from the input.

We can replace the 2 from the above command with any number to print the corresponding match from the list of .txt files.
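To avoid editing the sed script by hand, we can pass the line number as a parameter. The following is a small sketch under a few assumptions: nth_line is a hypothetical helper name, N is a positive integer, and the scratch directory only recreates the sample files. It also adds the q command so that sed stops reading once the requested line has been printed:

```shell
# Recreate the sample files in a scratch directory.
cd "$(mktemp -d)" || exit 1
touch file1.txt file2.txt file3.txt file4.txt file5.txt

# nth_line N: print line N of standard input, then stop reading.
# Hypothetical helper name; N is assumed to be a positive integer.
nth_line() {
    sed -n "${1}{p;q;}"
}

ls *.txt | nth_line 2    # prints file2.txt
```

If N is larger than the number of lines, sed prints nothing, so an empty result signals that there is no such match.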

Additionally, we can use sed with an extended regular expression to retrieve the Nth match:

$ printf "%s " *.txt | sed -E 's/([^ ]+ ){2}([^ ]+).*/\2/'
file3.txt

Let’s understand the substitution command (s/pattern/replacement/) in detail:

([^ ]+ ) matches one word, i.e., a run of non-space characters, followed by a space
{2} repeats the preceding group twice, so the first two words are consumed
([^ ]+) matches the third word
.* matches everything else on the line
\2 refers to the second capture group (the third word), which replaces the whole line

We can retrieve the Nth match by replacing 2 (from {2}) with N-1, ensuring that the pattern skips the first N-1 matches and outputs only the Nth match.
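The N-1 rule can be computed by the shell instead of by hand. Here is a hedged sketch of that idea: nth_word is a hypothetical helper name, and the filenames are assumed to contain no spaces. Compared with the command above, it anchors the pattern at the start of the line and uses -n with the p flag, so nothing is printed when there are fewer than N matches (the plain substitution would print the whole line unchanged in that case):

```shell
# Recreate the sample files in a scratch directory.
cd "$(mktemp -d)" || exit 1
touch file1.txt file2.txt file3.txt file4.txt file5.txt

# nth_word N: print the Nth space-separated .txt filename.
# Hypothetical helper name; assumes filenames without spaces.
nth_word() {
    # The interval {N-1} skips the first N-1 words; \2 is the Nth word.
    printf '%s ' *.txt |
        sed -nE "s/^([^ ]+ ){$(( $1 - 1 ))}([^ ]+).*/\2/p"
}

nth_word 4    # prints file4.txt
```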

Alternatively, we can use the same sed command with an array:

$ declare -a files=(*.txt)
$ echo "${files[*]}" | sed -E 's/([^ ]+ ){2}([^ ]+).*/\2/'
file3.txt

First, we define an indexed array with all the .txt files from the current working directory. Then, the quoted "${files[*]}" expansion joins all elements of files into a single string, separated by spaces (the first character of IFS by default), and the echo command outputs this string.

Next, the pipe (|) takes the output of the preceding command (echo) and passes it as input to sed. Finally, the sed command uses an extended regular expression (-E) to print only the third match.

5. Using the awk Command

awk is a powerful command for finding and processing specific patterns.

The awk command provides a quick way to extract specific records:

$ ls *.txt | awk 'NR==4'
file4.txt

This command selects the fourth record ('NR==4') from the list of .txt files in the current directory. When its output goes to a pipe rather than a terminal, ls writes one filename per line, so each filename becomes a separate awk record.

We can also make the one-per-line format explicit with the -1 option:

$ ls -1 *.txt | awk 'NR==4'
file4.txt

The ls -1 *.txt construct lists all .txt files in the current directory, one per line. Then, awk uses the built-in record number variable (NR) to filter and display only the fourth line, i.e., file4.txt.
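Rather than hardcoding the record number, we can hand it to awk with the -v option. The sketch below assumes a hypothetical helper name, nth_record, and recreates the sample files in a scratch directory. The exit statement is an addition that lets awk stop as soon as the requested record has been printed:

```shell
# Recreate the sample files in a scratch directory.
cd "$(mktemp -d)" || exit 1
touch file1.txt file2.txt file3.txt file4.txt file5.txt

# nth_record N: print record N of standard input and stop.
# Hypothetical helper name; N is passed to awk as the variable n.
nth_record() {
    awk -v n="$1" 'NR == n { print; exit }'
}

ls -1 *.txt | nth_record 4    # prints file4.txt
```

Passing the value with -v keeps the awk program in single quotes, so the shell never has to interpolate anything into the script itself.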

6. Using the grep Command

grep offers a wide range of options for pattern matching, including support for both basic and extended regular expressions.

Let’s retrieve the second match using grep:

$ printf "%s\n" *.txt | grep -n . | grep '^2:' 
2:file2.txt

This command displays the second match from the list of all the .txt files from the current directory.

Let’s break down the command:

printf "%s\n" *.txt expands *.txt and prints all .txt files in the current directory, one per line
| (pipe) takes the output of the preceding command (printf) and passes it as input to the following grep command
grep -n . prefixes every non-empty line with its line number and a colon
grep '^2:' keeps only the line that starts with 2: (the second match)

This command finds a specific match by its line number and prints the line number along with the match.

Alternatively, we can remove the line number and print only the match:

$ printf "%s\n" *.txt | grep -n . | grep '^2:' | cut -d: -f2-
file2.txt

This command extracts and prints the second match from the list of .txt files in the current directory. The cut command uses a colon (:) as the delimiter and extracts everything from the second field onward, removing the line number.
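The same pipeline can take the match number as a parameter. This is a minimal sketch with a hypothetical helper name, nth_numbered. The extra file, note:v2.txt, is an assumption added only to show why -f2- is used instead of -f2, since a filename may itself contain a colon:

```shell
# Recreate the sample files in a scratch directory.
cd "$(mktemp -d)" || exit 1
touch file1.txt file2.txt file3.txt file4.txt file5.txt
touch 'note:v2.txt'    # extra file with a colon in its name

# nth_numbered N: number the input lines, keep line N, drop the number.
# Hypothetical helper name; N is assumed to be a positive integer.
nth_numbered() {
    grep -n . | grep "^${1}:" | cut -d: -f2-
}

printf '%s\n' *.txt | nth_numbered 2    # prints file2.txt
printf '%s\n' *.txt | nth_numbered 6    # prints note:v2.txt
```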

7. Using find in Combination With head and tail

We can use a combination of head and tail, along with the find command, to retrieve the Nth match from a list:

$ find . -maxdepth 1 -name "*.txt" | sort | head -n 4 | tail -n 1
./file4.txt

This command retrieves the fourth match from the sorted list of .txt files. Each result starts with ./ because find prints paths relative to the starting point (.).

Let’s take a closer look at the options used in this command:

find . searches for the files from the current directory
-maxdepth 1 limits the search to only the current directory, preventing find from looking inside subdirectories
-name "*.txt" filters the results to include only files with a .txt extension
sort sorts the filenames in lexicographical order, since find doesn’t return results in any guaranteed order
head -n 4 selects the first 4 lines from the sorted list
tail -n 1 extracts the last line from the provided input

We can substitute 4 in the head -n 4 construct with N to extract the Nth match.
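One caveat of the head and tail combination is that when N exceeds the number of matches, head passes every line through and tail returns the last match instead of nothing. The following sketch guards against that case. The helper name nth_found is hypothetical, filenames are assumed not to contain newlines, and the final sed step is an optional addition that drops the leading ./:

```shell
# Recreate the sample files in a scratch directory.
cd "$(mktemp -d)" || exit 1
touch file1.txt file2.txt file3.txt file4.txt file5.txt

# nth_found N: print the Nth .txt file reported by find, sorted by name.
# Hypothetical helper name; assumes filenames without newlines.
nth_found() {
    total=$(find . -maxdepth 1 -name '*.txt' | wc -l)
    # Without this check, head | tail would return the last match
    # whenever N exceeds the number of matches.
    if [ "$1" -lt 1 ] || [ "$1" -gt "$total" ]; then
        return 1
    fi
    find . -maxdepth 1 -name '*.txt' | sort | head -n "$1" | tail -n 1 |
        sed 's|^\./||'    # drop the leading ./ that find adds
}

nth_found 4    # prints file4.txt
```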

8. Conclusion

In this article, we learned several ways to get the Nth match from wildcard expansion.

Firstly, we created a dataset and used array indexing to extract the Nth match from wildcard expansion. Then, we explored the sed command with extended regular expression, followed by the quick way with awk to get the Nth match.

Next, we used the grep command to include line numbers and retrieve matches by their specific line numbers. Finally, we combined the find command with head and tail to extract a specific match. We can select any method depending on our preferences and needs: array indexing stays entirely within Bash, while sed -n 'Np' is a concise and commonly used choice when the matches already arrive as lines of text.
