Finding Lines With Exactly One Instance of a Specific Character in a File

1. Introduction

Searching and pattern-matching hold significant importance in text processing. When it comes to processing text in Bash, the grep command usually plays a key role.

In this tutorial, we’ll explore the process of using grep to find file lines with exactly one instance of a specific character.

First, we’ll use Bash parameter expansion to count the occurrences of a character in each line. Next, we’ll explore the grep command with basic and extended regular expressions. Lastly, we’ll go through additional commands that we can combine with grep to achieve a similar outcome: awk, sed, and perl.

2. Sample File

Let’s use the cat command to take a look at the sample file we’ll work on:

$ cat patternFile.txt
xxx#aaa#iiiii
xxxxxxxxx#aaa
#xxx#bbb#111#yy
xxxxxxxxxxxxxxxx#
xxx#x
#x#v#e#
#x#

Here, we have a file whose lines mix digits, letters, and a varying number of # symbols.

3. Using Bash

One approach we can use for our purposes is the parameter expansion feature of Bash.

Let’s suppose we want to search for lines that have exactly one #:

$ cat searchPattern.sh
#!/bin/bash
while IFS= read -r line; do
    hashes=${line//[^#]}
    if [ ${#hashes} -eq 1 ]; then
        echo "Pattern with exactly one #: $line"
    fi
done < patternFile.txt

In the above script, we supply the input via a file using input redirection. The while loop reads the file line by line and assigns each line to the line variable. Setting IFS to an empty string for read preserves leading and trailing whitespace, while the -r option prevents read from interpreting backslashes as escape characters.

Notably, we use the ${line//[^#]} parameter expansion, which takes a glob pattern rather than a regular expression, to remove all characters from the line except for # and store the result in the variable hashes. Then, we use the if statement to check if the length of hashes, ${#hashes}, is equal to 1, indicating that there’s exactly one # in the line.

Lastly, we print the pattern if the condition is true.

Let’s make the script executable via the chmod command and run it:

$ chmod +x searchPattern.sh
$ ./searchPattern.sh
Pattern with exactly one #: xxxxxxxxx#aaa
Pattern with exactly one #: xxxxxxxxxxxxxxxx#
Pattern with exactly one #: xxx#x

To conclude, the script repeats this process for each line in the input file, identifying and printing lines that contain exactly one #.
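As a minimal sketch of the same counting idea, we can wrap the loop in a function that accepts the character as an argument. The lines_with_one name is hypothetical, and the sketch assumes a single-character argument. Instead of keeping only the # characters, this variant removes them and compares the line length before and after:

```shell
#!/bin/bash
# Hypothetical helper: read lines from standard input and print those that
# contain exactly one occurrence of the character passed as the first argument.
lines_with_one() {
    local char=$1 line stripped count
    while IFS= read -r line; do
        # Quoting "$char" makes Bash treat it literally, not as a glob pattern.
        stripped=${line//"$char"/}
        # Every removed character shortens the line by one.
        count=$(( ${#line} - ${#stripped} ))
        if [ "$count" -eq 1 ]; then
            printf '%s\n' "$line"
        fi
    done
}

# Usage with a few of the sample lines.
printf '%s\n' 'xxx#aaa#iiiii' 'xxxxxxxxx#aaa' 'xxx#x' '#x#' | lines_with_one '#'
```

Of the four lines piped in, only xxxxxxxxx#aaa and xxx#x contain a single #, so only those two are printed.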

4. Using grep With Regular Expressions

To begin with, let’s see the behavior of grep with different regular expression types.

4.1. Using Basic Regular Expressions (BRE)

A Basic Regular Expression (BRE) is a type of regular expression commonly used in Linux utilities since it’s one of the least feature-rich and thus usually easier to implement. By default, grep uses BRE.

Now, let’s search the file for lines containing exactly one occurrence of the # character:

$ grep '^[^#]*#[^#]*$' patternFile.txt
xxxxxxxxx#aaa
xxxxxxxxxxxxxxxx#
xxx#x

In the above command, we pass grep a pattern that returns all lines containing precisely one #. In particular, we constructed the ^[^#]*#[^#]*$ regular expression:

^ matches the beginning of a line
[^#]* matches zero or more characters other than #, as many as possible; it appears both before and after the literal #
# matches the # character literally
$ matches the end of a line

So, we use ^[^#]*#[^#]*$ to match lines in patternFile.txt that contain #, and have no # before or after it on the same line.
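To reuse the pattern for other characters, we could build it from a variable. The following is only a sketch: the grep_exactly_one name is hypothetical, and it assumes the argument is a single character other than ^ or ], since those have a special meaning inside a bracket expression:

```shell
# Hypothetical helper: build the BRE for a given character and run grep.
# Assumption: $1 is a single character other than ^ or ].
grep_exactly_one() {
    c=$1
    shift
    # [${c}] matches the character literally, even if it is . or *.
    grep "^[^${c}]*[${c}][^${c}]*\$" "$@"
}

# With no file argument, grep reads standard input.
printf '%s\n' 'xxx#aaa#iiiii' 'xxxxxxxxx#aaa' 'xxx#x' | grep_exactly_one '#'
```

Placing the character inside brackets avoids treating characters such as . as regex metacharacters. Any remaining arguments, such as patternFile.txt, are passed to grep unchanged.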

4.2. Using Extended Regular Expressions (ERE)

Extended Regular Expressions (ERE) include a broader set of metacharacters with more functions, such as +, ?, (), and |. The -E option in the grep command enables extended regular expressions.

For example, let’s search for a pattern that returns lines from patternFile.txt containing exactly one occurrence of x:

$ grep -E '^[^x]*x{1}[^x]*$' patternFile.txt
#x#v#e#
#x#

The main difference between BRE and ERE lies in how they treat certain metacharacters. In GNU grep, BRE treats the characters ?, +, {, }, |, (, and ) as literals unless we escape them with a backslash, whereas ERE treats them as special without any escaping. Otherwise, both syntaxes can express the same patterns.
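To illustrate the syntax difference, here’s a small sketch that runs the same interval expression in both flavors. The sample, ere_one_x, and bre_one_x function names are hypothetical, and the input is an inline subset of patternFile.txt:

```shell
# Inline sample lines (a subset of patternFile.txt).
sample() {
    printf '%s\n' 'xxx#aaa#iiiii' 'xxx#x' '#x#v#e#' '#x#'
}

# ERE: braces form an interval expression without escaping.
ere_one_x() { grep -E '^[^x]*x{1}[^x]*$'; }

# BRE: the same interval needs backslashes around the braces.
bre_one_x() { grep '^[^x]*x\{1\}[^x]*$'; }

sample | ere_one_x
sample | bre_one_x
```

Both commands select the same two lines, #x#v#e# and #x#, since the interval {1} simply means a single x in either syntax.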

5. Using grep With awk

Since the awk command is often used for manipulating text data, we’ll use it in combination with grep to filter the lines in a file that contain a specified pattern.

For example, to find the lines containing a single # in patternFile.txt, we’ll pipe the grep output to the awk command:

$ grep '#' patternFile.txt | awk -F'#' 'NF == 2'
xxxxxxxxx#aaa
xxxxxxxxxxxxxxxx#
xxx#x

In the above example, we use the grep command to search and match the lines that contain # characters irrespective of the number of their occurrences. After that, we pipe the output of grep to awk.

Next, we use the -F option with awk to specify the delimiter. Thus, the awk interpreter treats each line as a series of fields separated by #.

Notably, we add the awk condition NF == 2 which checks if the number of fields in each line is equal to 2. In other words, we check if a line has exactly two fields separated by a single #.

This effectively selects lines that contain only one # character.
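Since a line with two #-separated fields necessarily contains a #, awk can arguably do the filtering without grep. Here’s a sketch of two variants; the awk_one_hash and awk_one_hash_gsub function names are hypothetical:

```shell
# Variant 1: the field count alone; NF == 2 already implies one #.
awk_one_hash() { awk -F'#' 'NF == 2' "$@"; }

# Variant 2: gsub() returns the number of replacements it made.
# Replacing each # with "&" (the matched text) leaves the line unchanged.
awk_one_hash_gsub() { awk 'gsub(/#/, "&") == 1' "$@"; }

printf '%s\n' 'xxx#aaa#iiiii' 'xxxxxxxxx#aaa' 'xxx#x' | awk_one_hash
printf '%s\n' 'xxx#aaa#iiiii' 'xxxxxxxxx#aaa' 'xxx#x' | awk_one_hash_gsub
```

Both variants read standard input when no file is given and accept a filename such as patternFile.txt otherwise. The gsub() form counts the characters directly instead of relying on the field separator.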

6. Using grep With sed

An alternative method to find lines with exactly one instance of a character is to use the sed command.

The sed processor is a powerful tool for performing various operations on text data, such as searching and processing text line by line.

We’ll search for the occurrence of the same pattern we saw earlier, but this time using sed:

$ grep '#' patternFile.txt | sed -n '/^[^#]*#[^#]*$/p'
xxxxxxxxx#aaa
xxxxxxxxxxxxxxxx#
xxx#x

Similar to awk, we pipe the output of grep to sed. The output from grep consists of all lines having the # symbol.

We use the -n option with sed to suppress the default output behavior, which is to print each line. After that, we process the /^[^#]*#[^#]*$/p pattern:

^ matches the beginning of a line
[^#]* matches any sequence of characters that doesn’t contain a #
# matches the # character literally
[^#]* again matches any sequence of characters that doesn’t contain a #
$ matches the end of a line

Lastly, we use the p command after the pattern to print all lines that match it.
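Because the sed pattern already requires a #, the grep stage isn’t strictly necessary. As a sketch, the hypothetical sed_one_hash and sed_one_hash_delete functions below show sed working on its own in two equivalent ways:

```shell
# Print only matching lines: -n suppresses output, p prints matches.
sed_one_hash() { sed -n '/^[^#]*#[^#]*$/p' "$@"; }

# Equivalent: keep default printing and delete the lines that don't match.
sed_one_hash_delete() { sed '/^[^#]*#[^#]*$/!d' "$@"; }

printf '%s\n' 'xxx#aaa#iiiii' 'xxxxxxxxx#aaa' 'xxx#x' | sed_one_hash
printf '%s\n' 'xxx#aaa#iiiii' 'xxxxxxxxx#aaa' 'xxx#x' | sed_one_hash_delete
```

Each call prints xxxxxxxxx#aaa and xxx#x. The !d form inverts the address, so sed deletes every line that doesn’t match and prints the rest by default.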

7. Using grep With perl

An alternative way to find lines with exactly one instance of a specific character is to use grep and the perl interpreter.

Perl is a general-purpose language for text processing and manipulation.

Using perl and grep, we can create a pipeline to search for lines with exactly one occurrence of a particular character:

$ grep '#' patternFile.txt | perl -ne 'print if tr/#/#/ == 1'
xxxxxxxxx#aaa
xxxxxxxxxxxxxxxx#
xxx#x

Similar to our earlier example, we use grep to search for lines containing # in patternFile.txt. After that, we pipe the output to perl.

We add two options to the perl command. Firstly, the -e option supplies the script to run directly on the command line.

Secondly, the -n option wraps that script in a loop that reads the input line by line, without printing each line automatically.

In addition, the Perl script 'print if tr/#/#/ == 1' prints each line containing exactly one #:

print without arguments outputs the current line, stored in $_, if the condition that follows is met
if tr/#/#/ == 1 counts the # characters using the transliteration operator, which returns the number of characters it replaced, and compares that count to 1

To conclude, the perl script counts the # symbols in each line and prints only the lines where that count is exactly one.

8. Conclusion

In this article, we learned how to get lines that contain one instance of a specific character using various commands with grep.

First, we counted the occurrences with Bash parameter expansion. Then, we looked at the grep command with BRE and ERE. After that, we saw how to match a single occurrence of a character by piping grep to the awk command. Next, we discussed piping grep to the sed command to achieve a similar result.

Lastly, we learned how to pipe grep to perl to count the occurrences of a character in each line and keep only the lines with exactly one.
