Using grep With awk | Baeldung on Linux

1. Introduction

grep and awk are two powerful command-line tools in Linux that we can use for text processing and data extraction. While grep quickly finds patterns in files, awk enables more advanced data search and manipulation. By combining these tools, we can simplify tasks like filtering, formatting, and analyzing text efficiently and precisely.

In this tutorial, we’ll first cover the basics of each tool to understand its strengths. Then, we’ll move on to specific scenarios where using grep and awk provides useful results.

Finally, we’ll demonstrate practical examples with tested code.

2. Understanding grep and awk

The primary function of grep is to search for specific patterns within files or input streams. In particular, the tool scans text line by line and returns only the matching lines, making it ideal for quickly filtering and locating data in large files.

On the other hand, awk provides powerful data manipulation features. For instance, we can use awk to modify and format text based on set rules, processing each line independently. Further, awk excels at extracting fields, performing calculations, and applying conditions, making it ideal for generating reports and analyzing structured data.

While many scenarios might be solvable with awk alone as the more versatile of the two, piping from grep often makes the command more readable.

3. Sample Data

We use a single text file named file.txt that contains all the necessary data for each scenario we test:

$ cat file.txt
pattern orange 5
pattern yellow 10
mango 90
grape purple large
apple,50
banana,120
grape,30
blue,150
mango 90
apple red small
banana yellow medium
apple banana oranges
blue,200

By using just one file, we simplify the examples, making it easier to follow and understand the concepts effectively.

4. Searching for Patterns and Extracting Fields

One of the most common use cases for combining grep and awk is searching for a specific pattern in a file and then extracting certain fields from the matching lines.

4.1. Piping grep to awk

To begin with, let’s check a basic scenario:

$ grep "pattern" file.txt | awk '{print $2, $3}'
orange 5
yellow 10

In this scenario, grep searches for lines containing the string pattern within file.txt. Once the desired lines are found, the output is piped to awk, which extracts the second and third fields ($2 and $3) from each matching line.
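As noted earlier, awk can often do the filtering on its own. As a minimal sketch, the following script recreates a few lines of the sample data in a temporary file and compares the pipeline with an awk-only rule. The variable names data, piped, and awk_only are our own, chosen for illustration:

```shell
#!/bin/sh
# Recreate part of the sample data in a temporary file (illustrative subset).
data=$(mktemp)
cat > "$data" <<'EOF'
pattern orange 5
pattern yellow 10
mango 90
grape purple large
EOF

# grep filters the lines, then awk extracts fields 2 and 3.
piped=$(grep "pattern" "$data" | awk '{print $2, $3}')

# awk alone: the /pattern/ rule selects the same lines before printing.
awk_only=$(awk '/pattern/ {print $2, $3}' "$data")

printf '%s\n' "$piped"
printf '%s\n' "$awk_only"
rm -f "$data"
```

Both commands print the same two lines, so the choice between them mostly comes down to readability.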

4.2. Extracting Fields

Further, we can also extract specific fields only when multiple conditions are met:

$ grep "pattern1" file.txt | grep "pattern2" | awk '{print $2, $3}'

Here, pattern1 and pattern2 are placeholders for any two search terms. The first grep looks for lines containing pattern1. Then, the output is piped to the second grep, which keeps only the lines that also contain pattern2.

Finally, awk extracts and prints the second and third fields from the result.
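As a sketch with concrete search terms, we can chain two grep filters over a few lines of the sample data. The terms apple and red are our own choice for illustration, not part of the original command:

```shell
#!/bin/sh
# Illustrative subset of the sample data.
data=$(mktemp)
cat > "$data" <<'EOF'
apple,50
apple red small
banana yellow medium
apple banana oranges
EOF

# Keep lines that contain both "apple" and "red", then print fields 2 and 3.
both=$(grep "apple" "$data" | grep "red" | awk '{print $2, $3}')

printf '%s\n' "$both"
rm -f "$data"
```

Only the line apple red small passes both filters, so awk prints its second and third fields.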

5. Filtering Specific Columns Based on Patterns

Another useful combination is when we want to filter specific columns from a file based on a pattern and then manipulate those columns with awk:

$ grep "blue" file.txt | awk -F, '{if ($2 > 100) print $1, $2}'
blue 150
blue 200

Here, grep selects the lines containing blue, and the -F, option tells awk to split each line on commas. Then, awk checks whether the second column holds a value greater than 100. If that condition is met, awk prints the first and second columns.
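A possible variation, shown here only as a sketch, moves the condition in front of the action block and sets the output field separator so that the comma is preserved. The blue,80 line is a hypothetical addition to the data that shows a value being filtered out:

```shell
#!/bin/sh
# Comma-separated lines; "blue,80" is a made-up entry for illustration.
data=$(mktemp)
cat > "$data" <<'EOF'
blue,150
banana,120
blue,80
blue,200
EOF

# -F, splits input on commas; -v OFS=, joins printed fields with a comma.
# The condition before the braces replaces the explicit if statement.
filtered=$(grep "blue" "$data" | awk -F, -v OFS=, '$2 > 100 {print $1, $2}')

printf '%s\n' "$filtered"
rm -f "$data"
```

In this form, the output keeps the original comma-separated layout, which helps when another tool consumes the result.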

6. Extracting Data With Complex Pattern Matching

Sometimes, we need more complex pattern matching to extract specific data from a file. Thus, we might resort to regular expressions.

For example, to extract lines that start with alphabetical characters, we can use the respective ranges:

$ grep -E "^[a-zA-Z]" file.txt | awk '{print $0}'

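In the example, we use grep with the -E option to enable extended regular expressions, although this particular pattern also works as a basic regular expression. The pattern ^[a-zA-Z] matches lines that start with an uppercase or lowercase letter. Finally, awk prints all the matched lines unchanged. Since every line of our sample file starts with a letter, the command prints the whole file.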
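To see a regular expression that actually narrows the input, we can sketch a stricter pattern that accepts only lines of the form name,number. The pattern and the output format below are our own assumptions for illustration:

```shell
#!/bin/sh
# Illustrative subset of the sample data with mixed layouts.
data=$(mktemp)
cat > "$data" <<'EOF'
apple,50
mango 90
grape purple large
blue,200
EOF

# LC_ALL=C keeps the [a-z] range limited to ASCII lowercase letters.
# The pattern accepts lowercase letters, one comma, then digits only.
pairs=$(LC_ALL=C grep -E '^[a-z]+,[0-9]+$' "$data" | awk -F, '{print $1 " -> " $2}')

printf '%s\n' "$pairs"
rm -f "$data"
```

Here, the + quantifier is used unescaped, which relies on the extended syntax that -E enables.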

7. Adding Values Based on a Pattern

In some cases, we may need to add values from a specific field based on a matching pattern:

$ grep "pattern" file.txt | awk '{sum += $3} END {print sum}'
15

Here, awk sums the values in the third field for all matching lines. The END block prints the total sum after all lines have been processed.
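One detail worth sketching is what happens when nothing matches. In that case, sum is never assigned, and print sum outputs an empty line. Adding zero forces a numeric result. The search term kiwi is a hypothetical value that doesn't occur in the data:

```shell
#!/bin/sh
# Illustrative subset of the sample data.
data=$(mktemp)
printf '%s\n' 'pattern orange 5' 'pattern yellow 10' 'mango 90' > "$data"

# Sum the third field of the matching lines.
total=$(grep "pattern" "$data" | awk '{sum += $3} END {print sum + 0}')

# No line matches "kiwi", so awk receives no input; sum + 0 still prints 0.
none=$(grep "kiwi" "$data" | awk '{sum += $3} END {print sum + 0}')

printf 'pattern: %s, kiwi: %s\n' "$total" "$none"
rm -f "$data"
```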

8. Counting the Number of Occurrences of a Pattern

We can also use grep and awk to determine how frequently a specific word or pattern appears within a file:

$ grep -o "apple" file.txt | awk '{count++} END {print "apple appears", count, "times"}'
apple appears 3 times

In this code, -o tells grep to output each match for the word apple on a separate line. Then, awk increments the count variable by one for each line. Once all lines have been processed, the END block prints the final count.
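The difference between counting occurrences and counting matching lines can be sketched as follows. The line with two occurrences of apple is a hypothetical addition that makes the two counts diverge:

```shell
#!/bin/sh
# The second line is made up so that one line holds two matches.
data=$(mktemp)
cat > "$data" <<'EOF'
apple,50
apple pie with apple sauce
banana,120
EOF

# -o emits one output line per match, so awk counts every occurrence.
occurrences=$(grep -o "apple" "$data" | awk '{count++} END {print count + 0}')

# -c counts matching lines instead, regardless of matches per line.
matching_lines=$(grep -c "apple" "$data")

printf 'occurrences: %s, lines: %s\n' "$occurrences" "$matching_lines"
rm -f "$data"
```

So, when a word can appear several times on one line, the grep -o pipeline and grep -c answer different questions.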

9. Finding Duplicate Lines in a File

By combining grep and awk, we can locate duplicate lines efficiently:

$ grep -oE '.*' file.txt | awk 'seen[$0]++' | sort
mango 90

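In this case, -oE with the pattern .* makes grep output each non-empty line of file.txt as a single entry, without splitting it into individual words. Next, awk uses the array seen, indexed by the whole line ($0), to count how many times each line has occurred. The expression seen[$0]++ is zero the first time a line is read and non-zero afterward, so awk prints a line only when it has already been seen. Finally, sort orders the reported duplicates.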
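When a line occurs three or more times, seen[$0]++ prints it once for every repeat. As a sketch of one way to report each duplicate exactly once, we can compare the counter with 1. The data below is a made-up variation of the sample file, and awk reads it directly to keep the example short:

```shell
#!/bin/sh
# "mango 90" occurs three times and "apple,50" twice (illustrative data).
data=$(mktemp)
cat > "$data" <<'EOF'
mango 90
apple,50
mango 90
mango 90
apple,50
EOF

# Prints a line on every occurrence after the first.
every_repeat=$(awk 'seen[$0]++' "$data")

# Prints a line only on its second occurrence, so each duplicate appears once.
once=$(awk 'seen[$0]++ == 1' "$data" | sort)

printf '%s\n' "$every_repeat"
printf '%s\n' "$once"
rm -f "$data"
```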

10. Finding the Differences Between Two Files

When working with multiple file versions, identifying what has changed between them is often useful or necessary. To do that, we can use grep and awk to find the differences between the two files.

For example, to find lines that appear in file.txt but aren’t in file2.txt, we can combine several grep flags:

$ grep -Fxv -f file2.txt file.txt | awk '{print $0}'

Here, -F tells grep to interpret each pattern as fixed text rather than a regular expression, and -x requires a pattern to match a whole line. The -f flag reads the patterns from file2.txt, one per line. Meanwhile, -v inverts the match, so grep displays only those lines from file.txt that aren’t present in file2.txt.
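To illustrate why -x matters, the following sketch builds two small files in a temporary directory and runs the comparison with and without that flag. The file contents are assumptions made up for this example:

```shell
#!/bin/sh
# Two small files; their contents are invented for this illustration.
dir=$(mktemp -d)
printf '%s\n' 'apple,50' 'banana,120' 'grape,30' > "$dir/file.txt"
printf '%s\n' 'apple,50' 'grape' > "$dir/file2.txt"

# With -x, "grape" must equal the whole line, so "grape,30" counts as different.
with_x=$(grep -Fxv -f "$dir/file2.txt" "$dir/file.txt")

# Without -x, "grape" matches inside "grape,30" and hides that line.
without_x=$(grep -Fv -f "$dir/file2.txt" "$dir/file.txt")

printf '%s\n' "$with_x"
printf '%s\n' "$without_x"
rm -rf "$dir"
```

Without -x, a short line in the second file can mask longer lines that merely contain it, so the whole-line match is usually what we want for this kind of comparison.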

11. Converting Data From One Format to Another

In many cases, we may need to convert the data from one format to another:

$ grep "Apple" file.txt | awk '{print tolower($0)}'

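In this example, grep searches for lines containing Apple, and awk then converts those lines to lowercase. Since grep is case-sensitive by default and our sample file only contains lowercase apple, the command prints nothing for file.txt unless we add the -i option to grep.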

Similarly, we can convert those lines to uppercase as well:

$ grep "apple" file.txt | awk '{print toupper($0)}'

The command is nearly identical, but we use toupper instead of tolower in awk.
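Because grep matches case-sensitively by default, mixed-case data needs the -i option to catch every variant before the conversion. Here is a minimal sketch, where the capitalized Apple line is a hypothetical addition to the data:

```shell
#!/bin/sh
# Mixed-case input; the "Apple" line is made up for illustration.
data=$(mktemp)
cat > "$data" <<'EOF'
Apple red small
apple,50
banana,120
EOF

# -i matches "apple" in any letter case; toupper normalizes the output.
upper=$(grep -i "apple" "$data" | awk '{print toupper($0)}')

printf '%s\n' "$upper"
rm -f "$data"
```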

12. Conclusion

In this article, we explored practical ways to use grep and awk for efficient data processing in Linux.

We demonstrated how to combine these two commands to compare files, extract specific data, and manipulate text in various formats.

Each scenario offers unique advantages, and we can choose the most suitable approach for any given task based on its requirements.
