How to Match Words and Ignore Multiple Spaces Using grep and tr

1. Overview

Text processing is part of many common tasks for Linux system administrators and users. These tasks may include analyzing log files, filtering data, or searching through text. Sometimes, the text contains runs of multiple spaces, which makes it harder to search for and extract data. This can happen because of data entry errors or inconsistent formatting.

In this tutorial, we’ll explore matching words and ignoring multiple spaces using the grep and tr commands.

2. Problem Statement

To demonstrate, we use adventure.txt as the sample file:

$ cat adventure.txt
A long time ago, there lived an old  owl named lily.
The   owl resided in an ancient oak tree, surrounded by a   peaceful forest.
Every night, the owl would    hoot softly, sharing stories with the     moon and stars.
Villagers from nearby would sometimes visit the owl to     seek wisdom.
Despite the darkness, the owl's eyes would      shine brightly, illuminating the path.
One such night, a young		traveler lost his way in the forest.
...

In the upcoming sections, we use this file to illustrate how to match specific words while ignoring multiple spaces.

3. Handling Multiple Spaces With tr

tr is a command-line utility that translates, squeezes, or deletes characters from standard input and writes the result to standard output. We can use it to replace multiple spaces with a single space, normalizing the spacing within a file.

The command follows a basic syntax:

$ tr -s ' ' < inputfile > outputfile

Let’s break it down:

-s ' ': instructs tr to squeeze each run of a repeated character into a single character; in this case, we squeeze multiple spaces into a single space
< inputfile: redirects the contents of the inputfile as input for the tr command
> outputfile: redirects the output of the tr command to a file named outputfile

Let’s use the above syntax to replace each run of multiple spaces in the sample file with a single space:

$ tr -s ' ' < adventure.txt > adventure_normalized.txt

Using this command, we read the content of the adventure.txt file and replace any sequences of multiple spaces with a single space. We then save the result into a new file named adventure_normalized.txt.
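To see the squeeze in isolation, we can run it on a single line instead of a file. This is only a minimal sketch: the sample line and the variable names line and squeezed are made up for illustration and aren't part of adventure.txt:

```shell
# Hypothetical sample line, similar in shape to the lines in adventure.txt
line='The   owl resided in an    ancient oak tree'

# -s squeezes each run of the listed character (a space) into one space
squeezed=$(printf '%s\n' "$line" | tr -s ' ')

printf '%s\n' "$squeezed"
```

Notably, tr -s only collapses runs. It doesn't strip a leading or trailing space, so a line that starts with several spaces still starts with one space afterward.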

Normalizing the spaces often makes it easier to search for a pattern using grep.

Furthermore, we can handle tabs in the same pass by translating both tabs and spaces to a space before squeezing:

$ tr -s '\t ' ' ' < adventure.txt > adventure_normalized.txt

The above command reads adventure.txt and replaces every run of tabs and spaces, including a lone tab, with a single space.
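The difference between the two tr invocations shows up on a line that mixes tabs and spaces. The following sketch uses a made-up line and hypothetical variable names, and it assumes a tr implementation such as GNU tr that pads the shorter second set with its last character:

```shell
# Hypothetical line: two tabs after "young", two spaces before "lost"
line=$(printf 'a young\t\ttraveler  lost')

# Squeezing spaces only: the run of spaces collapses, the tabs stay
only_spaces=$(printf '%s\n' "$line" | tr -s ' ')

# Translating tab and space to a space, then squeezing the result
tabs_too=$(printf '%s\n' "$line" | tr -s '\t ' ' ')

printf '%s\n' "$only_spaces" "$tabs_too"
```

Here, only_spaces still contains the two tab characters, whereas tabs_too holds single spaces between all words.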

4. Matching Words With grep

grep is a command-line utility used to search for text using patterns. It scans files or an input stream and prints the lines that match a specific pattern. Additionally, grep supports regular expressions, which makes it easier to perform complex searches.

4.1. Basic Search

Now, let’s search for a specific word in the now-normalized text:

$ grep "oak" adventure_normalized.txt
The owl resided in an ancient oak tree, surrounded by a peaceful forest.
Hearing the owl's gentle hoot, he followed the sound until he reached the oak tree.

Here, we search through the adventure_normalized.txt file and print all lines that contain the string oak, whether it appears as a whole word or as part of a longer word.

4.2. Case-Insensitive Search

By default, grep is case-sensitive. To perform a case-insensitive search, we use the -i option:

$ grep -i "oak" adventure_normalized.txt

Above, we perform a case-insensitive search for oak in the adventure_normalized.txt file, matching variants such as oak, Oak, or OAK.
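As a quick illustration of the difference, we can run both searches on a few made-up lines. The sample text and the variable names below are assumptions for this sketch only:

```shell
# Hypothetical sample lines with different capitalizations of "oak"
sample=$(printf '%s\n' 'an ancient oak tree' 'Oak leaves fell' 'THE OAK STOOD' 'a quiet forest')

# Default search: only the lowercase spelling matches
sensitive=$(printf '%s\n' "$sample" | grep 'oak')

# With -i, all three spellings match
insensitive=$(printf '%s\n' "$sample" | grep -i 'oak')

printf '%s\n' "$sensitive" '---' "$insensitive"
```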

4.3. Searching for Multiple Words

Furthermore, we can search for multiple words in the normalized text. For instance, let’s search for lines containing either the word oak or owl:

$ grep -E 'oak|owl' adventure_normalized.txt
A long time ago, there lived an old owl named lily.
The owl resided in an ancient oak tree, surrounded by a peaceful forest.
Every night, the owl would hoot softly, sharing stories with the moon and stars.
...

In the example above, we search for lines in the adventure_normalized.txt file that contain at least one of two alternative words. Specifically, the -E option enables extended regular expressions, which lets us use the pipe symbol | as an alternation operator to specify multiple patterns or words.
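Alternatively, grep accepts several patterns through repeated -e options, which avoids the alternation syntax altogether. The sketch below compares both forms on made-up lines; the sample text and variable names are hypothetical:

```shell
# Hypothetical sample lines
sample=$(printf '%s\n' 'the owl would hoot' 'an ancient oak tree' 'the moon and stars')

# Extended regular expression with alternation
with_alternation=$(printf '%s\n' "$sample" | grep -E 'oak|owl')

# One -e option per pattern; a line matches if any pattern matches
with_e_options=$(printf '%s\n' "$sample" | grep -e 'oak' -e 'owl')

printf '%s\n' "$with_e_options"
```

Both commands select the same two lines, so the choice between them is mostly a matter of readability.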

4.4. Displaying Line Numbers

We can also display the line numbers along with the matching lines using the -n option:

$ grep -n 'wise' adventure_normalized.txt
8:There, under the starry sky, he found guidance and comfort, forever grateful to the wise old owl.

Thus, the above command displays all lines containing the word wise along with their line numbers.

4.5. Matching Whole Words

To match whole words only, we use the -w option:

$ grep -w 'hoot' adventure_normalized.txt
Every night, the owl would hoot softly, sharing stories with the moon and stars.
Hearing the owl's gentle hoot, he followed the sound until he reached the oak tree.

In the example above, we match hoot as an entire word, not as part of another word.
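To make the effect of -w visible, we can compare it with a plain search on text that also contains a longer word. The lines below, including the word hooting, are invented for this sketch:

```shell
# Hypothetical sample: "hoot" as a word, and as part of "hooting"
sample=$(printf '%s\n' 'the owl would hoot softly' 'the hooting echoed')

# Plain search matches the substring in both lines
plain=$(printf '%s\n' "$sample" | grep 'hoot')

# -w requires non-word characters (or line edges) around the match
whole=$(printf '%s\n' "$sample" | grep -w 'hoot')

printf '%s\n' "$whole"
```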

4.6. Using Regular Expressions

To search for complex patterns, we can use regular expressions. For example, let’s match any line in which the word owl is followed by the word resided:

$ grep 'owl *resided' adventure_normalized.txt
The owl resided in an ancient oak tree, surrounded by a peaceful forest.

Using the above command, we search for lines in the adventure_normalized.txt file where the word owl is followed by zero or more spaces and then the word resided.
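The same idea can absorb irregular spacing without a normalization step. As a sketch, and assuming a made-up line rather than the sample file, we can let the POSIX class [[:space:]] stand for any whitespace between the two words:

```shell
# Hypothetical line with spaces and a tab between "owl" and "resided"
line=$(printf 'The   owl  \t resided in an ancient oak tree')

# One or more whitespace characters between the words
matched=$(printf '%s\n' "$line" | grep -E 'owl[[:space:]]+resided')

# A literal single space doesn't match this line; -q only sets the exit status
if printf '%s\n' "$line" | grep -q 'owl resided'; then
  literal_match=yes
else
  literal_match=no
fi

printf '%s\n' "$literal_match"
```

In this form, grep prints the line with its original spacing, which may or may not be desirable depending on what consumes the output.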

4.7. Counting Occurrences of a Word

Instead of displaying the lines where a specific word appears, we might want to know how many lines contain the matched word. To count the number of matching lines, we use the -c option. This can be useful for analysis or summarizing search results.

To illustrate, let’s count how many lines contain the word oak in the sample file:

$ grep -c 'oak' adventure_normalized.txt
2

In the above command, the output of 2 indicates that the word oak matches on two different lines. To clarify, if a word appears multiple times on a single line, grep still counts that line only once.
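If we need the number of occurrences rather than the number of lines, one common approach is to print every match on its own line with -o and count those lines with wc. The -o option is available in GNU and BSD grep; the sample lines and variable names here are hypothetical:

```shell
# Hypothetical sample: the first line contains "oak" twice
sample=$(printf '%s\n' 'an oak beside another oak' 'the oak tree' 'no trees here')

# -c counts matching lines
matching_lines=$(printf '%s\n' "$sample" | grep -c 'oak')

# -o prints each match separately; wc -l then counts the matches
# (tr -d ' ' removes the padding that some wc implementations add)
occurrences=$(printf '%s\n' "$sample" | grep -o 'oak' | wc -l | tr -d ' ')

printf 'lines: %s, occurrences: %s\n' "$matching_lines" "$occurrences"
```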

4.8. Excluding Lines That Contain a Specific Word

Sometimes, we may want to exclude lines that contain a specific word or pattern. By using the -v option, we can invert the match, showing only the lines that don’t contain the word or pattern:

$ grep -v 'hoot' adventure_normalized.txt
A long time ago, there lived an old owl named lily.
The owl resided in an ancient oak tree, surrounded by a peaceful forest.
Villagers from nearby would sometimes visit the owl to seek wisdom.
...

In this example, we search through the adventure_normalized.txt file and print out all lines that don’t contain the word hoot.
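For a compact check of the inversion, we can apply -v to a few made-up lines. Again, the sample text and the variable name remaining are assumptions for illustration:

```shell
# Hypothetical sample lines; two of them contain "hoot"
sample=$(printf '%s\n' 'the owl would hoot softly' 'an ancient oak tree' 'a gentle hoot')

# -v keeps only the lines that do NOT match the pattern
remaining=$(printf '%s\n' "$sample" | grep -v 'hoot')

printf '%s\n' "$remaining"
```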

5. Combining grep and tr

Ultimately, we can combine the grep and tr commands to match words while ignoring multiple spaces effectively:

$ cat adventure.txt | tr -s '\t ' ' ' | grep "forest"
The owl resided in an ancient oak tree, surrounded by a peaceful forest.
One such night, a young traveler lost his way in the forest.

Here, we use the cat command to read the contents of the adventure.txt file and write them to standard output. Next, we pipe that output to the tr command, which replaces each run of tabs and spaces with a single space. We include the tab in the set because the sample file separates young and traveler with tabs, which tr -s ' ' alone would leave untouched.

Lastly, we pipe the output from tr to the grep command, which searches for lines containing the word forest and prints them.

This way, we can use any of the grep options discussed earlier to apply a comprehensive filter that ignores multiple spaces.
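As a sketch of such a combination, the following script normalizes whitespace and then applies -n and -w in a single pipeline. It builds a temporary file from three lines of the sample text, uses input redirection instead of cat, and assumes that mktemp is available; the variable names are hypothetical:

```shell
# Temporary stand-in for adventure.txt with irregular spaces and tabs
tmp=$(mktemp)
{
  printf '%s\n' 'The   owl resided in an ancient oak tree, surrounded by a   peaceful forest.'
  printf '%s\n' 'Villagers from nearby would sometimes visit the owl to     seek wisdom.'
  printf 'One such night, a young\t\ttraveler lost his way in the forest.\n'
} > "$tmp"

# Normalize tabs and spaces, then print whole-word matches with line numbers
matches=$(tr -s '\t ' ' ' < "$tmp" | grep -n -w 'forest')

printf '%s\n' "$matches"
rm -f "$tmp"
```

Since tr leaves newlines untouched, the line numbers that grep -n reports still correspond to the lines of the original file.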

6. Conclusion

In this article, we discussed matching words and ignoring multiple spaces using the grep and tr commands.

First, we replaced every multi-space occurrence with a single space in a specific file using tr. After that, we utilized grep and some of its options to match specific words.

Finally, we combined the grep and tr commands for quick processing and searches.
