1. Introduction

Text processing is the manipulation and transformation of textual data. It plays an important role in parsing, extraction, and formatting.

In this tutorial, we’ll discuss several ways to remove the lines from a file that don’t match a given pattern. In other words, only lines that match the provided pattern should be left in the output.

First, we’ll discuss basic pattern matches using sed. Next, we’ll discuss the grep command and see how we can omit non-matching lines. After that, we’ll explore the use of the awk command to only leave in the lines that match a given pattern. Lastly, we’ll discuss the Bash scripting approach to display the matched instances in the output.

2. Sample Data

Let’s use the cat command to take a look at the sample file we’ll work on:

$ cat exampleFile.txt
xxx#aaa#iiiii
8xix7#xx
x#@@ii#
xxxxxxxxx#aaa
#xxx#bbb#111#yy
x#xiix6x#9@
xxxxxxxxxxxxxxxx#
xxx#x#xiia#xa
#x#v#e#
#x#xi@xxi#
123456789

Here, we have a file whose lines contain various combinations of numbers, letters, and symbols.

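In most of the approaches we discuss, the commands only print the filtered lines to the terminal. If needed, we can use tool-specific flags to modify the file in place or redirection to save the output to a new file. Redirecting straight back to the input file truncates it before it’s read, so overwriting the original requires a temporary file.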
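As a minimal sketch of the redirection idea, assuming a POSIX-like shell with mktemp available, we can save the matching lines to a new file or replace the input through a temporary file. The names work.txt and filtered.txt are hypothetical and only serve this illustration:

```shell
# Scratch directory so the sketch doesn't touch real files
workdir=$(mktemp -d)
printf '%s\n' 'xxx#aaa#iiiii' '8xix7#xx' 'xxxxxxxxx#aaa' > "$workdir/work.txt"

# Save the matching lines to a new file (filtered.txt is a hypothetical name)
grep '#aaa#' "$workdir/work.txt" > "$workdir/filtered.txt"

# Overwrite the input: filter into a temporary file, then move it over the original
tmp=$(mktemp)
grep '#aaa#' "$workdir/work.txt" > "$tmp" && mv "$tmp" "$workdir/work.txt"

cat "$workdir/work.txt"
```

Because of the && operator, the original is replaced only when grep finds at least one match, which guards against accidentally emptying the file.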

3. Using the sed Editor

There are multiple ways to remove lines using the sed editor. In this section, we’ll match single and multiple patterns using sed and delete lines that don’t match.

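In all cases, we can add the -i switch to apply our changes in place when desired. GNU sed accepts -i on its own, while BSD sed (for example, on macOS) requires a backup suffix argument, which may be empty.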
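For illustration, here’s a hedged sketch of an in-place edit on a scratch copy. It assumes a sed that supports -i with an attached backup suffix, a form that both GNU and BSD sed should accept; copy.txt is a hypothetical file name:

```shell
# Create a scratch copy (copy.txt is a hypothetical name) to edit in place
workdir=$(mktemp -d)
printf '%s\n' 'xxx#aaa#iiiii' '8xix7#xx' 'x#@@ii#' > "$workdir/copy.txt"

# -i.bak edits copy.txt in place and keeps the original as copy.txt.bak
sed -i.bak '/#aaa#/!d' "$workdir/copy.txt"

cat "$workdir/copy.txt"
```

Keeping the .bak file is a cautious default, since an in-place deletion can’t otherwise be undone.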

3.1. Matching a Single Pattern

To begin with, we’ll discuss how the sed editor can be used to exclude lines that don’t match a specific pattern.

For instance, we’ll use sed to delete lines that don’t match the pattern #aaa#:

$ sed '/#aaa#/!d' exampleFile.txt
xxx#aaa#iiiii

In the above code, we use sed to filter and process text. In particular, the code reads and processes the contents of the file exampleFile.txt, printing each non-deleted line by default.

The sed expression /#aaa#/!d specifies what should be done with each line in the input file:

  • #aaa# is the pattern we want to match
  • ! ensures we apply any actions only to non-matching lines
  • d deletes lines according to the conditions above

So, when we run this command, sed reads each line from exampleFile.txt, and if a line doesn’t contain the pattern #aaa#, it’s deleted from the output.

However, lines that do contain this pattern are retained in the output. In other words, this is equivalent to the simpler form:

$ sed -n '/#aaa#/p' exampleFile.txt
xxx#aaa#iiiii

Here, we use -n to suppress the automatic printing of each line. After that, we use p to print only the lines that match #aaa#.

3.2. Matching Multiple Patterns

Additionally, we can use the sed editor to find multiple patterns in a file and exclude lines based on that.

For this example, we use two different patterns to find and omit the lines that don’t meet the required criteria:

$ sed -n '/^[A-Za-z]/p' exampleFile.txt | sed -n '/[a-z]$/p'
xxx#aaa#iiiii
xxxxxxxxx#aaa
xxx#x#xiia#xa

In the above command, we create a pipeline using the sed editor.

The first part searches for the lines that start with an uppercase or lowercase letter and only prints them, removing non-matches. After that, the output of the first command is piped to the second command as an input.

Next, the second part searches for the lines where the last character is a lowercase letter (a-z) and only prints them, deleting the rest. Lastly, the code prints the output on the terminal.

Hence, by chaining the two patterns and using -n with p, we leave only the lines that match both patterns: they start with an alphabetic character and end with a lowercase letter.
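The same filter can likely be expressed without a pipeline. The following sketch, which isn’t part of the original example, passes both conditions to a single sed invocation via two -e expressions and uses a shortened, hypothetical sample.txt:

```shell
workdir=$(mktemp -d)
printf '%s\n' 'xxx#aaa#iiiii' '8xix7#xx' 'x#@@ii#' 'xxxxxxxxx#aaa' '123456789' > "$workdir/sample.txt"

# First delete lines that don't start with a letter,
# then delete lines that don't end with a lowercase letter
result=$(sed -e '/^[A-Za-z]/!d' -e '/[a-z]$/!d' "$workdir/sample.txt")

printf '%s\n' "$result"
```

Here, a line has to survive both deletions to reach the output, so the effect matches that of the two-stage pipeline.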

4. Using the grep Command

An alternative way to find lines with specific patterns and exclude the rest is to use the grep command.

For example, to remove lines that don’t end with #, we’ll use the $ anchor, which matches the end of a line:

$ grep "#$" exampleFile.txt
x#@@ii#
xxxxxxxxxxxxxxxx#
#x#v#e#
#x#xi@xxi#

Here, grep prints only the lines that match the pattern, so every line that doesn’t end with # is left out.

Of course, we can use double negation as well:

$ grep -v "[^#]$" exampleFile.txt
x#@@ii#
xxxxxxxxxxxxxxxx#
#x#v#e#
#x#xi@xxi#

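In this code, we use the grep command with the -v option to invert the match, so grep leaves out every line that ends with a character other than #. Notably, this is a less readable way of achieving the same result on our sample file. Unlike the first command, it would also print empty lines, since they have no final character to match.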
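To see where the two grep forms can differ, here’s a small sketch on a hypothetical file, blank.txt, that contains an empty line. It uses -c to count the selected lines instead of printing them:

```shell
workdir=$(mktemp -d)
# Hypothetical input whose second line is empty
printf '%s\n' 'a#' '' 'b#' 'c' > "$workdir/blank.txt"

# Direct match: only the lines that end with # are counted
direct=$(grep -c '#$' "$workdir/blank.txt")

# Double negation: the empty line has no final character to reject, so it's kept too
inverted=$(grep -vc '[^#]$' "$workdir/blank.txt")

echo "direct=$direct inverted=$inverted"
```

On this input, the direct form selects two lines, while the inverted form selects three because it also keeps the empty line.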

5. Using the awk Command

In the same way, we can use the awk command to exclude the lines that don’t match a given pattern.

For instance, let’s leave in only lines that contain the #aaa# pattern by using the awk command:

$ awk '/#aaa#/' exampleFile.txt
xxx#aaa#iiiii

Notably, this syntax is almost identical to that of grep, apart from the forward slashes.

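So, when we put it all together, awk '/#aaa#/' omits lines that don’t contain the pattern #aaa#. Furthermore, we can write the changes back to the original file, for example, by redirecting the output to a temporary file and then moving it over the original. GNU awk also provides the inplace extension (gawk -i inplace) for this purpose.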
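Similar to the sed pipeline from section 3.2, awk conditions can be combined in one expression. As a hedged sketch on a shortened, hypothetical sample.txt, the && operator keeps only the lines that satisfy both regular expressions:

```shell
workdir=$(mktemp -d)
printf '%s\n' 'xxx#aaa#iiiii' '8xix7#xx' 'x#@@ii#' 'xxxxxxxxx#aaa' '123456789' > "$workdir/sample.txt"

# Keep only lines that start with a letter AND end with a lowercase letter;
# a pattern without an action prints the matching line by default
result=$(awk '/^[A-Za-z]/ && /[a-z]$/' "$workdir/sample.txt")

printf '%s\n' "$result"
```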

6. Using a Bash Script

Alternatively, shell scripts and constructs are also useful to remove lines that don’t match a specific pattern.

Let’s suppose we want to keep only the lines that contain #aaa#:

$ cat pattern.sh
#!/bin/bash
file="exampleFile.txt"
while IFS= read -r line; do
  if [[ $line == *#aaa#* ]]; then
    echo "$line"
  fi
done < "$file"

In the above script, we use input redirection to provide input from a file. Furthermore, we use a while loop with read to get each line from the file and assign it to the variable line. To preserve leading and trailing whitespace in each line, we set IFS to an empty string, while -r prevents read from interpreting backslashes.

Notably, we use if with the [[ ]] construct and the == operator, which performs wildcard (glob) matching rather than regex matching. Thus, *#aaa#* matches any line that contains #aaa#.

Let’s see the result of running our script:

$ bash pattern.sh
xxx#aaa#iiiii

In summary, this script searches for lines that contain #aaa# and only prints them, effectively removing lines that don’t match our pattern of choice.
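If we need regular expressions instead of wildcards, one possible variation swaps == for the Bash =~ operator. The sketch below isn’t the script from above: keep_matching is a hypothetical helper name, and the pattern and file are passed as arguments:

```shell
#!/bin/bash
# keep_matching is a hypothetical helper: print the lines of a file that match a regex
keep_matching() {
  local regex=$1 file=$2 line
  # The || clause also handles a final line without a trailing newline
  while IFS= read -r line || [[ -n $line ]]; do
    # Leaving $regex unquoted makes Bash treat it as an extended regular expression
    if [[ $line =~ $regex ]]; then
      printf '%s\n' "$line"
    fi
  done < "$file"
}

workdir=$(mktemp -d)
printf '%s\n' 'xxx#aaa#iiiii' '8xix7#xx' 'x#@@ii#' > "$workdir/sample.txt"

# Keep only the lines that end with #
result=$(keep_matching '#$' "$workdir/sample.txt")
printf '%s\n' "$result"
```

Using printf instead of echo avoids surprises with lines that start with a dash or contain backslashes.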

7. Conclusion

In this article, we learned how to remove lines that don’t match a given pattern.

First, we looked at the sed editor to omit non-matching lines. Next, we discussed the grep command to filter and exclude lines based on our pattern. After that, we saw how to use the awk command to exclude the lines that don’t match a given pattern.

Lastly, we discussed the Bash scripting approach to search the sample file and leave only the lines that contain a given string.