1. Introduction

grep and awk are two powerful command-line tools in Linux that we can use for text processing and data extraction. While grep quickly finds patterns in files, awk enables more advanced data search and manipulation. By combining these tools, we can simplify tasks like filtering, formatting, and analyzing text efficiently and precisely.

In this tutorial, we’ll first cover the basics of each tool to understand its strengths. Then, we’ll move on to specific scenarios where using grep and awk provides useful results.

Finally, we'll demonstrate practical examples with tested code.

2. Understanding grep and awk

The primary purpose of grep is to search for specific patterns within files or input streams. In particular, the tool scans text line by line and returns only the matching lines, making it ideal for quickly filtering and locating data in large files.

On the other hand, awk provides powerful data manipulation features. For instance, we can use awk to modify and format text based on set rules, processing each line independently. Further, awk excels at extracting fields, performing calculations, and applying conditions, making it ideal for generating reports and analyzing structured data.

Since awk is the more versatile of the two, many scenarios can be solved with awk alone; still, piping from grep often makes the command more readable.
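
For example, a pipeline such as grep "pattern" file | awk '{print $2, $3}' can often be collapsed into a single awk program, since awk supports pattern matching on its own:

$ awk '/pattern/ {print $2, $3}' file

Both forms produce the same output, so the choice is largely one of readability and personal preference.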

3. Sample Data

We use a single text file named file.txt that contains all the necessary data for each scenario we test:

$ cat file.txt
pattern orange 5
pattern yellow 10
mango 90
grape purple large
apple,50
banana,120
grape,30
blue,150
mango 90
apple red small
banana yellow medium
apple banana oranges
blue,200

By using just one file, we simplify the examples and make the concepts easier to follow.

4. Searching for Patterns and Extracting Fields

One of the most common use cases for combining grep and awk is searching for a specific pattern in a file and then extracting certain fields from the matching lines.

4.1. Piping grep to awk

To begin with, let’s check a basic scenario:

$ grep "pattern" file.txt | awk '{print $2, $3}'
orange 5
yellow 10

In this scenario, grep searches for lines containing the string pattern within file.txt. Once the desired lines are found, the output is piped to awk, which extracts the second and third fields ($2 and $3) from each matching line.

4.2. Extracting Fields

Further, we can also extract specific fields only when multiple conditions are met:

$ grep "pattern1" file.txt | grep "pattern2" | awk '{print $2, $3}'

Here, the first grep looks for lines containing pattern1. Then, the output is piped to the second grep, which further filters for lines containing pattern2.

Finally, awk extracts and prints the second and third fields from the result.
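
To make this concrete with our sample data, we can filter for lines that contain both pattern and yellow:

$ grep "pattern" file.txt | grep "yellow" | awk '{print $2, $3}'
yellow 10

Only the line pattern yellow 10 satisfies both filters, so awk prints its second and third fields.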

5. Filtering Specific Columns Based on Patterns

Another useful combination is when we want to filter specific columns from a file based on a pattern and then manipulate those columns with awk:

$ grep "blue" file.txt | awk -F, '{if ($2 > 100) print $1, $2}'
blue 150
blue 200

Here, grep first selects the lines containing blue. Then, awk, with -F, setting the comma as the field separator, checks whether the second column holds a value greater than 100. If that condition is met, awk prints the first and second columns.
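
Alternatively, awk can handle both the filtering and the comparison by itself:

$ awk -F, '/blue/ && $2 > 100 {print $1, $2}' file.txt
blue 150
blue 200

The /blue/ pattern replaces the grep step, while the numeric comparison stays the same.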

6. Extracting Data With Complex Pattern Matching

Sometimes, we need more complex pattern matching to extract specific data from a file. Thus, we might resort to regular expressions.

For example, to extract lines that start with alphabetical characters, we can use the respective ranges:

$ grep -E "^[a-zA-Z]" file.txt | awk '{print $0}'

In the example, we use grep with the -E option to enable extended regular expressions (ERE). This way, the pattern ^[a-zA-Z] matches lines that start with an uppercase or lowercase letter. Finally, awk prints each matched line unchanged, as $0 refers to the whole line.
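
Extended regular expressions also support alternation, which we can use to match several starting words at once:

$ grep -E "^(apple|banana)" file.txt | awk '{print $1}'
apple,50
banana,120
apple
banana
apple

Notably, the comma-separated lines pass through as a single field because awk splits on whitespace by default; adding -F, to awk would separate them.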

7. Adding Values Based on a Pattern

In some cases, we may need to add values from a specific field based on a matching pattern:

$ grep "pattern" file.txt | awk '{sum += $3} END {print sum}'
15

Here, awk sums the values in the third field for all matching lines. The END block prints the total sum after all lines have been processed.
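
Extending the same idea slightly, we can keep a line count next to the sum and derive an average:

$ grep "pattern" file.txt | awk '{sum += $3; count++} END {print sum / count}'
7.5

Since two lines match and their third fields are 5 and 10, the average is 7.5.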

8. Counting the Number of Occurrences of a Pattern

We can also use grep and awk to determine how frequently a specific word or pattern appears within a file:

$ grep -o "apple" file.txt | awk '{count++} END {print "apple appears", count, "times"}'
apple appears 3 times

In this code, -o tells grep to output each match for the word apple on a separate line. Then, awk increments the count variable for each input line, and once all lines have been processed, the END block prints the final count.
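
As a variation, an awk associative array can count several words in a single pass; we sort the output because awk doesn't guarantee an iteration order for arrays:

$ grep -oE "apple|banana" file.txt | awk '{count[$0]++} END {for (w in count) print w, count[w]}' | sort
apple 3
banana 3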

9. Finding Duplicate Lines in a File

By combining grep and awk, we can locate duplicate lines efficiently:

$ grep -oE '.*' file.txt | awk 'seen[$0]++' | sort
mango 90

In this case, -oE extracts each line from file.txt as a single entry, without splitting it into individual words. Next, awk uses the seen array to count how often each line has occurred: the expression seen[$0]++ evaluates to zero (false) the first time a line appears and to a non-zero value (true) afterward, so awk prints a line only from its second occurrence onward.
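
If we also want the number of occurrences of each duplicate, a small variation reports the full counts in the END block:

$ grep -oE '.*' file.txt | awk '{count[$0]++} END {for (l in count) if (count[l] > 1) print count[l], l}'
2 mango 90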

10. Finding the Differences Between Two Files

When working with multiple file versions, identifying what has changed between them is often useful or necessary. To do that, we can use grep and awk to find the differences between the two files.

For example, to find lines that appear in file.txt that aren’t in file2.txt, we can use a specific flag:

$ grep -Fxv -f file2.txt file.txt | awk '{print $0}'

Here, -f reads the patterns, one per line, from file2.txt, while -F tells grep to interpret them as fixed strings rather than regular expressions and -x requires each pattern to match a whole line. Finally, -v inverts the match, so grep displays only those lines from file.txt that aren't present in file2.txt.
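
As a quick illustration, let's suppose a hypothetical file2.txt holds two of the lines from file.txt; the pipeline then returns everything else:

$ printf 'mango 90\napple,50\n' > file2.txt
$ grep -Fxv -f file2.txt file.txt | awk '{print $0}'
pattern orange 5
pattern yellow 10
grape purple large
banana,120
grape,30
blue,150
apple red small
banana yellow medium
apple banana oranges
blue,200

Both occurrences of mango 90 disappear from the output, since -x compares whole lines regardless of where they appear in the file.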

11. Converting Data From One Format to Another

In many cases, we may need to convert the data from one format to another:

$ grep "Apple" file.txt | awk '{print tolower($0)}'

In this example, grep searches for those lines with Apple in them, and later awk converts those lines to lowercase.

Similarly, we can convert those lines to uppercase as well:

$ grep "apple" names.txt | awk '{print toupper($0)}'

The command is nearly identical, but we use toupper instead of tolower in awk.
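
Here as well, awk alone can handle both the filtering and the conversion in a single step:

$ awk '/apple/ {print toupper($0)}' file.txt

This awk-only form matches the same lines and produces the same uppercase output as the pipeline above.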

12. Conclusion

In this article, we explored practical ways to use grep and awk for efficient data processing in Linux.

We demonstrated how to combine these two commands to compare files, extract specific data, and manipulate text in various formats.

Each scenario offers unique advantages, and we can choose the most suitable approach for any given task based on its requirements.
