1. Overview

AWK is a programming language that provides flexible functions to manipulate strings. It’s sometimes preferable to the grep or sed commands in Linux because awk has richer text processing capabilities.

In this tutorial, we’ll learn how to search and count the matching text patterns using awk. We’ll first look at a simple awk command to find the number of lines with matching patterns. Then, we’ll learn the command to find the total number of pattern occurrences in a file. Finally, we’ll look at ways to obtain the same result using grep.

2. Sample Text File

Let’s first use the cat command to look at an example text file, test.log:

$ cat test.log 
Hello 1
Hello 2
Hell o
Hello3
4Hello

We can see that there are five lines in total, only four of which have the word Hello written correctly. So, we’ll use test.log for our searches below.

3. Print All Matches With awk

To print all lines containing the word Hello, we can use a basic match and print:

$ awk '/Hello/ {print $0}' test.log 
Hello 1
Hello 2
Hello3
4Hello

In the above command, the Hello pattern is placed between the // characters, while the {print $0} command is used to print the lines that match.

In particular, we see all lines from test.log that contain Hello.
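As a side note, awk prints the whole line by default when a pattern has no action block, so the {print $0} part is optional. The sketch below is only an illustration: it recreates the sample data in a temporary file (the sample variable is a stand-in for test.log) and runs both forms.

```shell
# Recreate the sample data in a temporary file (a stand-in for test.log).
sample=$(mktemp)
printf 'Hello 1\nHello 2\nHell o\nHello3\n4Hello\n' > "$sample"

# A pattern without an action block prints every matching line.
short_form=$(awk '/Hello/' "$sample")

# The explicit form used in this section.
long_form=$(awk '/Hello/ {print $0}' "$sample")

echo "$short_form"
rm -f "$sample"
```

Both commands should print the same four lines, so the shorter form is handy for quick checks.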

4. Count All Matches With awk

To count the number of pattern matches, we can use an awk command with a counter:

awk '/<pattern>/ {count++} END {print count}' <text file>

The END pattern lets us separate the operations that awk performs on each line during the search from the operations it performs once, after all input has been read.

In our case, we use the variable count to keep track of the matches during the search. The count++ action runs once for every line that matches the pattern, so we place it before the END keyword.

Then, after the END keyword, we use {print count} to get the result of the count operation and to show the final number of pattern matches.

Let’s use the above command with our test.log file and count the number of Hello occurrences:

$ awk '/Hello/ {count++} END {print count}' test.log
4

As expected, the number of Hello occurrences is 4.
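One edge case is a pattern that never matches. The count variable is then never set, and print count outputs an empty line rather than 0. A minimal sketch of a common workaround, assuming we want a numeric result in every case, is to print count+0. The Goodbye pattern and the temporary sample file are illustrative assumptions, not part of the original example.

```shell
# Recreate the sample data in a temporary file (a stand-in for test.log).
sample=$(mktemp)
printf 'Hello 1\nHello 2\nHell o\nHello3\n4Hello\n' > "$sample"

# No line matches Goodbye, so count stays unset and prints as an empty string.
no_guard=$(awk '/Goodbye/ {count++} END {print count}' "$sample")

# Adding 0 forces a numeric result, so an unset counter prints as 0.
with_guard=$(awk '/Goodbye/ {count++} END {print count+0}' "$sample")

# The guard doesn't change the result when there are matches.
hello_lines=$(awk '/Hello/ {count++} END {print count+0}' "$sample")

echo "without guard: '$no_guard'"
echo "with guard: '$with_guard'"
echo "Hello lines: $hello_lines"
rm -f "$sample"
```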

5. More Than One Match per Line

In some cases, we might have lines that contain more than one instance of the searched pattern.

5.1. Sample Text File

The above example works well if each line contains no more than one occurrence of the pattern. However, if some lines have more than one match per line, then some of the matches will be missing.

Let’s use the new test2.log file:

$ cat test2.log 
Hello 1
Hello 2
Hell o
Hello3
4Hello 5Hello

We can see that we now have five occurrences of the word Hello.

5.2. Wrong Results

Let’s try to apply the earlier awk command to the new sample file:

$ awk '/Hello/ {count++} END {print count}' test2.log
4

We’re still getting a result of four, which isn’t correct. This is because the command counts the lines that contain at least one match, not the total number of matches.

5.3. Count All Pattern Occurrences

To count all pattern occurrences, we need to use a more complex awk command:

awk '{while (match($0, /<pattern>/)) {count++; $0=substr($0, RSTART+RLENGTH)}} END {print count}' <text file>

Let’s look at this command in more detail:

  • while (match($0, /<pattern>/)) is a while loop that invokes the match function for the current line variable $0
  • RSTART+RLENGTH calculates the position of the first character after the match, because match sets RSTART to the index where the match begins and RLENGTH to the match length
  • {count++; $0=substr($0, RSTART+RLENGTH)} runs when a pattern match occurs, increasing the count variable and updating the input string $0 to start from the position after the match
  • END {print count} prints the final number of matches
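To make the loop easier to follow, here is a small trace of the last line of test2.log. It’s only a sketch: the rest variable and the printf output format are added for illustration, while the loop itself follows the command above.

```shell
# Trace each iteration of the match loop on a single line with two matches.
trace=$(echo '4Hello 5Hello' | awk '{
    while (match($0, /Hello/)) {
        # Everything after the current match.
        rest = substr($0, RSTART + RLENGTH)
        printf "RSTART=%d RLENGTH=%d rest=[%s]\n", RSTART, RLENGTH, rest
        # Continue the search in the remainder of the line.
        $0 = rest
    }
}')

echo "$trace"
```

The first iteration finds Hello at position 2 and leaves " 5Hello" to search. The second finds Hello at position 3 of that remainder and leaves an empty string, which ends the loop.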

Let’s apply this command to our sample file test2.log:

$ awk '{while (match($0, /Hello/)) {count++; $0=substr($0, RSTART+RLENGTH)}} END {print count}' test2.log
5

Now, we’ve calculated the total number of pattern occurrences correctly.
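As an alternative to the explicit loop, awk’s gsub function returns the number of substitutions it makes. Replacing each match with itself (the & in the replacement string stands for the matched text) leaves the line unchanged and gives us the per-line count. This is a sketch of a different technique, not the command described above, and it recreates the test2.log contents in a temporary file.

```shell
# Recreate the sample data in a temporary file (a stand-in for test2.log).
sample=$(mktemp)
printf 'Hello 1\nHello 2\nHell o\nHello3\n4Hello 5Hello\n' > "$sample"

# gsub returns the number of replacements made on the current line;
# "&" puts the matched text back, so the line content doesn't change.
total=$(awk '{count += gsub(/Hello/, "&")} END {print count+0}' "$sample")

echo "$total"
rm -f "$sample"
```

For this sample, the command should report the same five matches as the loop version.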

6. Compare the Result With the grep Command

To make sure the number we obtained is correct, we can use the grep command to double-check the result.

For that, let’s use grep piped to wc for counting the output lines:

$ grep -o Hello test2.log | wc -l
5

Here, we use the -o option to make grep print each match on its own line instead of printing the whole matching line. The wc command then uses the -l option to count those output lines.

We’ve got the same result (five) as expected, so our awk command from above is correct.
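grep can reproduce the line-based count from the earlier sections as well. The sketch below, which assumes a grep that supports -o (GNU and BSD grep both do), contrasts -c, which counts matching lines, with the -o and wc -l combination, which counts individual matches. The temporary file stands in for test2.log.

```shell
# Recreate the sample data in a temporary file (a stand-in for test2.log).
sample=$(mktemp)
printf 'Hello 1\nHello 2\nHell o\nHello3\n4Hello 5Hello\n' > "$sample"

# -c counts the lines that contain at least one match.
matching_lines=$(grep -c Hello "$sample")

# -o prints one match per line; wc -l counts them (tr strips any padding).
total_matches=$(grep -o Hello "$sample" | wc -l | tr -d ' ')

echo "lines: $matching_lines, matches: $total_matches"
rm -f "$sample"
```

The two numbers differ only because the last line of the sample contains two matches.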

7. Conclusion

In this article, we learned how to find and count text pattern matches in a file using the awk command. We looked at both the simple case of one match per line and the more complex case of several matches per line. We also saw how to obtain the same result with the grep command.
