Count Occurrences of Character per Line/Field on Linux

1. Overview

In an earlier tutorial, we saw how to count the occurrences of a character in a file.

In this tutorial, we’ll see how to count the occurrences of a character in each line or in a specific field. In the results, we’ll also print the line number along with the count.

We’ll use some of the most commonly used commands in Linux to achieve this goal.

2. The Problem

Let’s say we’ve got the below file:

$ cat items.txt
ID|Internal|Item|Type
1a|I|RAM|DDR4
2a|I|Processor|Intel
2b|E|Monitor|FHD
3b|E|Monitor|4K
4c|I|HDD|1TB
5c|I|SSD|512GB

We expect this result while counting for character ‘E’:

Line Count
1	0
2	0
3	0
4	1
5	1
6	0
7	0

The ‘E’ character appears once in line 4 and once in line 5, so the count for those lines is 1. Every other line gets a count of 0.

Another use case is to search for a character in a particular field. If we search for ‘E’ only in the second field, we’ll get the same result, since that is the only field containing it.

3. Using the awk Command

First, let’s check how we can use the awk command to count the characters. The awk command is a text-processing tool: it runs the AWK program we pass to it against each line of the input.

Using the awk command we can achieve this in different ways.

3.1. Using the NR and the NF Variable

One way is to use the built-in AWK variables to count the character in each line. The NR variable holds the number of lines read so far, and the NF variable holds the number of fields in the current line.

Let’s see that in action:

$ awk -F 'E' 'BEGIN{print "Line", "\tCount"}{print NR "\t" NF-1}' items.txt
Line 	Count
1	0
2	0
3	0
4	1
5	1
6	0
7	0

Here, we’ve used the search character as the field separator. A line containing n occurrences of the separator is split into n+1 fields, so NF-1 gives the count, while the NR variable gives the line number. We print both for each line to get our desired output.

And to print the header only once at the beginning, we’ve put it in the BEGIN block.
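One caveat with the NF approach: for an empty line, awk reports NF as 0, so NF-1 would print -1. The sample file has no empty lines, but a minimal sketch that guards against this case could look like the following. The function name count_by_nf is only an illustrative choice, and we assume the search character is a single ordinary character such as E:

```shell
#!/bin/bash
# count_by_nf CHAR FILE   (hypothetical helper name, for illustration only)
# Uses CHAR as the field separator: n occurrences split a line into n+1 fields.
count_by_nf() {
    awk -F "$1" '
        BEGIN { print "Line\tCount" }
        # An empty line has NF == 0, so fall back to 0 instead of printing -1
        { print NR "\t" (NF > 0 ? NF - 1 : 0) }
    ' "$2"
}
```

We could call it as count_by_nf E items.txt.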

3.2. Using the gsub Function

Another way is to use the gsub function. This function takes a regex pattern, a replacement string, and an optional target, which defaults to the current line ($0). It returns the number of substitutions made, and we use this number as the count of the search character.

Let’s see this in action:

$ awk 'BEGIN{print "Line", "\tCount"}{print NR "\t" gsub(/E/,"")}' items.txt
Line 	Count
1	0
2	0
3	0
4	1
5	1
6	0
7	0

Here again, we’ve used the BEGIN block to print the header.

Then we’ve used the gsub function to substitute the character with an empty string. Since we pass no target, gsub works on the whole line. Our intention is not the substitution itself. We’re interested only in the return value of gsub, which is the number of occurrences of the character. Printing this count along with the line number gives us the desired output.
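Because gsub edits its target in place, the current line no longer contains the character after the call. If we also want to print the original line next to its count, one option is to run gsub on a copy. This is a small sketch under that assumption, where count_and_show and the variable copy are names chosen for illustration:

```shell
#!/bin/bash
# count_and_show FILE: print line number, count of 'E', and the untouched line.
# (count_and_show and "copy" are illustrative names.)
count_and_show() {
    awk '{
        copy = $0                  # work on a copy so $0 stays intact
        n = gsub(/E/, "", copy)    # gsub returns the number of replacements
        print NR "\t" n "\t" $0
    }' "$1"
}
```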

3.3. Count Characters in a Field

We can also use the gsub function to count the character in a particular field. To do that, we pass the field (for example, $2) as the third parameter of gsub.

Let’s quickly check that:

$ awk -F'|' 'BEGIN{print "Line", "\tCount"}{print NR "\t" gsub(/E/,"", $2)}' items.txt
Line 	Count
1	0
2	0
3	0
4	1
5	1
6	0
7	0

From the above results, we can see we’ve printed the occurrences of the character E in the second field. The command is almost the same. The only differences are that we’ve set the pipe (|) character as the field separator using the -F option, and that we’ve passed $2 as the target of the gsub function so that only the second field is searched.
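If the character and the field number change often, we could pass them in as awk variables instead of editing the program. The following is only a sketch: count_in_field, ch, col, and field are names we made up, and the value of ch is interpreted by awk as a regular expression, so we assume an ordinary character such as E:

```shell
#!/bin/bash
# count_in_field CHAR FIELD_NUMBER FILE   (illustrative helper name)
count_in_field() {
    awk -F'|' -v ch="$1" -v col="$2" '
        BEGIN { print "Line\tCount" }
        {
            field = $col                       # copy, so the record is not modified
            print NR "\t" gsub(ch, "", field)  # ch is treated as a regex by awk
        }
    ' "$3"
}
```

For example, count_in_field E 2 items.txt searches the second field for E.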

4. Using the cat, cut, and tr Commands

Next, let’s see how we can use the cat, cut, and tr commands to count the character.

Let’s get straight into it:

$ cat items.txt | tr -c -d "E\n" | cat -n \
  | { echo -e "Line" "\tCount"; while read num data; do printf "%d\t%d\n" $num ${#data}; done; }
Line 	Count
1	0
2	0
3	0
4	1
5	1
6	0
7	0

Here, we’ve used the commands in a pipeline. We used the cat command to read the file content. Then we piped the result to the tr command. There, the -d option deletes characters, and the -c option complements the given set, so tr deletes everything except the search character and the newline. Thus, each line is reduced to just its occurrences of the search character. Then, using the cat command with the -n option, we add the line numbers.

Finally, in the last stage of the pipeline, we print the header and, for each line, the line number and the length of what is left, which is the number of occurrences.
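To make the role of tr clearer, we can look at the two key steps in isolation. The sketch below uses made-up sample input rather than items.txt, and keep_only is just an illustrative wrapper name. It assumes the search character has no special meaning to tr:

```shell
#!/bin/bash
# keep_only CHAR: delete every input character except CHAR and newline.
# (keep_only is an illustrative name.)
keep_only() {
    tr -c -d "$1\n"
}

# Sample input (made up): three lines containing 2, 0 and 1 'E'.
printf 'aEbE\nxyz\nE\n' | keep_only E | while IFS= read -r data; do
    echo "${#data}"    # length of what is left = number of 'E' in that line
done
# prints 2, 0 and 1, one per line
```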

4.1. Count Characters in a Field

Now, if we need to search for characters in a particular field, we can use the cut command at the start of the pipeline.

Let’s take a look at it:

$ cut -d '|' -f 2 items.txt | tr -c -d "E\n" | cat -n \
  | { echo -e "Line" "\tCount"; while read num data; do printf "%d\t%d\n" $num ${#data}; done; }
Line 	Count
1	0
2	0
3	0
4	1
5	1
6	0
7	0

This is the same as the above command. Instead of the cat command at the beginning, we used the cut command to extract only the second field.

Let’s look at the different options used for the cut command:

-d '|': to use | as the field delimiter
-f 2: to take the second field after splitting

Once we’ve extracted the second field it is passed to the pipeline. The rest of the command is the same as the one above. And we get the result we wanted.

5. Using a Bash Script

Bash scripts are handy whenever we need to automate our tasks. Let’s use a Bash script to count the characters. We’ve got the script ready in a file named count.sh.

Let’s take a look at it:

$ cat count.sh
#!/bin/bash
# Usage: ./count.sh CHARACTER FILE
line=1
echo -e "Line\tCount"
# IFS= and -r keep each line intact, including whitespace and backslashes
while IFS= read -r data
do
    new_data=${data//"$1"/}                 # the line without the search character
    count=$(( ${#data} - ${#new_data} ))    # length difference = occurrences
    printf "%d\t%d\n" $((line++)) "$count"
done < "$2"
$ ./count.sh E items.txt
Line	Count
1	0
2	0
3	0
4	1
5	1
6	0
7	0

As we can see, the Bash script accepts two parameters: the character to search for and the filename.

First, in the script, we read the file line by line using a while loop. Each line is stored in the variable data. Then we remove every occurrence of the character from the line and store the result in another variable, new_data. Now we’ve got the original line and the line without the search character. The difference between the lengths of these two strings is the number of search characters in the line. Then, we print that along with the line number. Finally, we repeat this for all lines.

From the above results, we can see it has counted the number of characters in each line.
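The length-difference idea used by the script can also be tried on a single string. Here is a minimal sketch, assuming Bash (the ${var//pattern/} expansion is not available in plain POSIX sh). The function name count_char is ours and not part of count.sh:

```shell
#!/bin/bash
# count_char STRING CHAR: print how many times CHAR occurs in STRING.
# (count_char is an illustrative name, not part of count.sh.)
count_char() {
    local text=$1 char=$2
    local stripped=${text//"$char"/}       # remove every occurrence of CHAR
    echo $(( ${#text} - ${#stripped} ))    # length difference = occurrences
}

count_char "2b|E|Monitor|FHD" E    # prints 1
count_char "EEE" E                 # prints 3
```

Quoting "$char" inside the expansion makes Bash treat it literally, so characters such as | or * are counted as themselves rather than as glob patterns.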

6. Using the grep Command

At times, we want to print only those lines that contain the character we’re looking for. For those cases, we can use the grep command.

Let’s see this in action:

$ echo "Count  Line"; grep -n -o "E" items.txt | sort -n | uniq -c | cut -d : -f 1
Count  Line
      1 4
      1 5

Here, we’ve used the grep command with the -n option to print the line number. And by using the -o option, it prints only the matching character. Then we sort results from the grep command. And we count the occurrences of the search character using the uniq command with the -c option. Finally, we use the cut command to remove the unwanted string from the final result.

By using the grep command we’re able to print only what is needed. This will be helpful when searching in a big file with plenty of lines.
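If we prefer the same "Line, Count" column order as in the earlier sections, we could reorder the columns with awk at the end. This is only a sketch: matching_line_counts is a made-up name, and it assumes a grep that supports the -o option (such as GNU grep). The -F option makes grep treat the character as a fixed string:

```shell
#!/bin/bash
# matching_line_counts CHAR FILE   (illustrative helper name)
# Prints only the lines that contain CHAR, as "line number<TAB>count".
matching_line_counts() {
    printf 'Line\tCount\n'
    grep -n -o -F -- "$1" "$2" |      # one "LINE:CHAR" row per occurrence
        cut -d : -f 1 |               # keep only the line number
        uniq -c |                     # collapse repeats into "COUNT LINE"
        awk '{ print $2 "\t" $1 }'    # reorder to "LINE<TAB>COUNT"
}
```

Since grep already emits matches in line order, this variant skips the sort step.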

7. Conclusion

In this tutorial, we’ve seen different ways of counting the occurrences of a character in a line or a field, using awk, a pipeline of cat, cut, and tr, a Bash script, and grep. We can pick whichever one fits our needs.
