Last updated: December 25, 2024
Filtering specific data from text files or command output is often essential in Linux. Locating numeric values manually can be labor-intensive, but the grep command offers an efficient way to do it, which is especially useful when searching for error codes or performance metrics.
In this tutorial, we’ll learn how to use grep to select only numeric values in Linux.
Before moving forward, let’s ensure we have a sample dataset to demonstrate different approaches for selecting only numeric values:
$ cat sample.txt
"The quick 2 brown 34 fox 54 jumps over 42 a lazy dog."
Sixty 45 zippers 76 were 6 quickly 5 picked from the woven jute bag.
"Brown jars 23" prevented the mixture 34 from 123 freezing too quickly.
In this basic dataset, we see three lines with different numbers of words, each containing several standalone numbers scattered among the words.
We save this sample dataset in a file named sample.txt. This helps us follow and test the commands in the next sections. For simplicity, let’s assume we only consider whole numbers, not floating point values.
grep offers a wide range of options for pattern matching, including the ability to search for numeric values. Further, it fully supports both basic and extended regular expressions.
We can use grep to select only the matching part via the --only-matching (-o) option:
$ grep -o '[0-9]\+' sample.txt
2
34
54
42
45
76
6
5
23
34
123
This command selects only the numeric values from the input file, i.e., sample.txt.
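Let’s break down the parts of this grep command: the -o option prints only the matched part of each line, with each match on its own output line; [0-9] matches any single digit; and \+ matches one or more occurrences of the preceding item.
In a basic regular expression, a character such as + is treated as a literal by default; thus, to use + as a quantifier, we need to escape it with a backslash (\). Notably, \+ is a GNU extension to basic regular expressions rather than part of POSIX. We can get each separate digit on a new line by removing the \+ from the basic regular expression used in the above command.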
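To illustrate the last point, here is a minimal sketch that recreates sample.txt and drops the quantifier. The variable name digits is only for illustration:

```shell
# Recreate the sample file from this tutorial so the snippet is self-contained.
cat > sample.txt <<'EOF'
"The quick 2 brown 34 fox 54 jumps over 42 a lazy dog."
Sixty 45 zippers 76 were 6 quickly 5 picked from the woven jute bag.
"Brown jars 23" prevented the mixture 34 from 123 freezing too quickly.
EOF

# Without a quantifier, [0-9] matches exactly one digit per match,
# so a number such as 34 is reported as two separate matches.
digits=$(grep -o '[0-9]' sample.txt)

# Show the first five single-digit matches, one per line.
printf '%s\n' "$digits" | head -n 5
```

The sample file contains 20 digits in total, so the full output has 20 lines instead of the 11 we get with the quantifier.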
Alternatively, we can use extended regular expressions (-E) without escaping the + character:
$ grep -oE '[0-9]+' sample.txt
This command produces the same output as before, because the -E option enables the extended regular expression syntax, in which + acts as a quantifier without escaping.
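Since the pattern only matches runs of digits, it's worth seeing why we limited ourselves to whole numbers. As a small sketch, using a hypothetical input line that isn't part of sample.txt, a decimal value is split at the dot:

```shell
# Hypothetical input, not taken from sample.txt.
# The dot is not a digit, so 3.14 yields two separate matches.
parts=$(printf 'pi is roughly 3.14\n' | grep -oE '[0-9]+')

# Prints 3 and 14 on separate lines.
printf '%s\n' "$parts"
```

Handling floating point values would need a different pattern, which is outside the scope of this tutorial.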
Furthermore, we can get line numbers of the matches by using the -n option:
$ grep -oEn '[0-9]+' sample.txt
1:2
1:34
1:54
1:42
2:45
2:76
2:6
2:5
3:23
3:34
3:123
This command displays the line numbers of each match, in addition to the matched part.
We can also print all numeric values together without any visible separator using the --null-data (-z) option:
$ grep -Eoz '[0-9]+' sample.txt
23454424576652334123
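With the -z option, grep treats input and output records as NUL-terminated rather than newline-terminated. Each match is therefore followed by a NUL byte, which most terminals don’t display, so the numbers appear to run together.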
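To confirm that the matches are still separated, we can translate the NUL bytes into something visible. The following is a minimal sketch, assuming GNU grep (the -z option isn't part of POSIX) and the standard tr utility. The variable name nums is only for illustration:

```shell
# Recreate the sample file from this tutorial so the snippet is self-contained.
cat > sample.txt <<'EOF'
"The quick 2 brown 34 fox 54 jumps over 42 a lazy dog."
Sixty 45 zippers 76 were 6 quickly 5 picked from the woven jute bag.
"Brown jars 23" prevented the mixture 34 from 123 freezing too quickly.
EOF

# Each match is followed by a NUL byte; tr turns every NUL into a newline
# so that the individual matches become visible again.
nums=$(grep -Eoz '[0-9]+' sample.txt | tr '\0' '\n')

printf '%s\n' "$nums"
```

NUL-terminated output is mostly useful for feeding other tools that accept it, such as xargs -0, rather than for reading on a terminal.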
We can use the awk command alongside grep to group all the matches by their respective line numbers:
$ grep -oEn '[0-9]+' sample.txt | awk -F: '{line[$1] = (line[$1] ? line[$1]" "$2 : $2)} END {for (key in line) print key":", line[key]}'
1: 2 34 54 42
2: 45 76 6 5
3: 23 34 123
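This command collects all the matches from each line and joins them into a single output line, making it easier to identify numbers from different lines. Notably, awk doesn’t guarantee the order in which for (key in line) visits the keys, so for larger inputs we may need to pipe the result through sort -n to keep the lines in order.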
First, the grep command outputs only numeric values with their corresponding line numbers. Then, the pipe (|) takes the output of the preceding command (grep) and passes it as input to the following command, i.e., awk.
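Let’s take a closer look at the awk part of this command: -F: sets the field separator to a colon, so $1 holds the line number and $2 holds the matched number; the assignment to line[$1] appends each number to the array entry for its line, using a space as the separator; and the END block loops over the array and prints each line number followed by its numbers.
Additionally, we can use different separators while creating the associative array, such as a tab (\t) or a hyphen (-). For example, (line[$1] ? line[$1]"\t"$2 : $2) adds a tab between numeric values from the same line.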
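As a sketch of the tab-separated variant, the full pipeline could look like the following. The trailing sort -n is our own addition to make the line order deterministic, and the variable name grouped is only for illustration:

```shell
# Recreate the sample file from this tutorial so the snippet is self-contained.
cat > sample.txt <<'EOF'
"The quick 2 brown 34 fox 54 jumps over 42 a lazy dog."
Sixty 45 zippers 76 were 6 quickly 5 picked from the woven jute bag.
"Brown jars 23" prevented the mixture 34 from 123 freezing too quickly.
EOF

# Group the matches by line number, joining them with a tab instead of a space.
# sort -n orders the result by line number, since awk's for-in order is unspecified.
grouped=$(grep -oEn '[0-9]+' sample.txt |
  awk -F: '{line[$1] = (line[$1] ? line[$1] "\t" $2 : $2)}
           END {for (key in line) print key ":", line[key]}' |
  sort -n)

printf '%s\n' "$grouped"
```

Each output line still starts with the line number, a colon, and a space, because print separates its arguments with a space by default. Only the numbers themselves are tab-separated.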
In this article, we learned how to use grep for selecting only numeric values.
Firstly, we created a dataset and used grep with both basic and extended regular expressions to select only numeric values from the dataset. Then, we explored the -n (--line-number) option to print the line numbers of the matching patterns. Next, we used the -z option to display all the numeric values together without any visible separator.
Lastly, we used awk along with the grep command for grouping the numeric values by their corresponding line numbers.