
Last updated: January 23, 2024
Often, we need to extract specific information from strings. One such scenario is extracting floating-point numbers from a given string. This task comes up in data processing and analysis, Web scraping, text mining, and parsing structured data.
In this tutorial, we’ll learn about the methods to extract floating-point numbers from a string in the shell using the grep, tr, awk, sed, and pcregrep commands, as well as parameter expansion.
In Linux-based operating systems, the grep utility is used for pattern matching within data streams or text files. However, its functionality isn’t limited to searching.
Given that, we can also use grep to extract specific patterns from complex strings. For instance, we’ll utilize it to extract floating-point numbers from a string in the shell.
In particular, we combine grep with the echo command:
$ echo "Temperature: 25.6C, Humidity: 60.05%, Pressure: 1012.5 hPa" |
grep -Eo '[0-9]+(\.[0-9]+)?'
25.6
60.05
1012.5
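Now, let’s discuss the inner workings of the above command in detail. First, echo prints the string, and the pipe passes it to grep. The -E option enables extended regular expressions, while -o prints only the matched parts of the input, each on its own line.
Specifically, [0-9]+(\.[0-9]+)? is a regex pattern that searches for numbers: [0-9]+ matches one or more digits, \. matches a literal dot, and the trailing ? makes the whole (\.[0-9]+) group optional. Because the fractional part is optional, the pattern matches plain integers as well as floating-point numbers.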
As a result, the output displays extracted floating-point numbers from the specified string.
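Because the fractional part of the pattern is optional, it’s worth checking how the pattern treats integers. The following is a minimal sketch that contrasts the optional group with a mandatory one. The function names all_numbers and decimals_only, as well as the sample string, are our own illustrative choices:

```shell
# Hypothetical helpers for illustration; both read a string from the first argument.

# Optional fractional part: integers match as well as decimals.
all_numbers() {
    printf '%s\n' "$1" | grep -Eo '[0-9]+(\.[0-9]+)?'
}

# Mandatory fractional part: only numbers that contain a dot match.
decimals_only() {
    printf '%s\n' "$1" | grep -Eo '[0-9]+\.[0-9]+'
}

line="Items: 3, Price: 19.99 USD, Discount: 5%"
all_numbers "$line"      # prints 3, 19.99, and 5 on separate lines
decimals_only "$line"    # prints only 19.99
```

If integers should be excluded from the result, the second form is likely the better fit.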
Alternatively, we can replace [0-9] with the POSIX bracket expression [[:digit:]] and write the literal dot as [.] to get the same result:
$ echo "Temperature: 25.6C, Humidity: 60.05%, Pressure: 1012.5 hPa" |
grep -Eo '[[:digit:]]+([.][[:digit:]]+)?'
25.6
60.05
1012.5
Consequently, we get the same floating-point numbers from earlier.
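The patterns so far ignore signs and exponents. As a hedged sketch, we could extend the same grep approach with an optional leading sign and an optional exponent. The function name extract_signed and the sample string are assumptions made for illustration, and the pattern treats any hyphen directly before a digit as a minus sign, which may not suit every input:

```shell
# Hypothetical helper: print every number in the first argument, one per line.
# [-+]?               optional sign
# [0-9]+(\.[0-9]+)?   integer part with an optional fractional part
# ([eE][-+]?[0-9]+)?  optional exponent such as e-3 or E+10
extract_signed() {
    printf '%s\n' "$1" | grep -Eo '[-+]?[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?'
}

extract_signed "Offset: -3.5 mm, Gain: 1.2e-3, Count: 42"
# prints -3.5, 1.2e-3, and 42 on separate lines
```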
tr, short for translate, replaces or deletes characters from standard input and writes the result to standard output. Unlike grep, it works on sets of characters rather than regular expressions, so it can’t match a number as a unit.
However, we can use it to delete every character that can’t be part of a floating-point number:
$ echo "Temperature: 25.6C, Humidity: 60.05%, Pressure: 1012.5 hPa" |
tr -dc '0-9. \n'
 25.6  60.05  1012.5
In the above command, we again pipe our string, this time to tr. The -d option deletes characters, and -c complements the set, so tr deletes everything except the listed characters.
Here, the set 0-9. \n contains the digits, the dot, the space, and the newline character. A regex such as (\.[0-9]+)?\b wouldn’t work as intended in this position, because tr would read it as a list of literal characters that includes (, ), +, and ?.
As a result, only the numbers remain, separated by the runs of spaces left over from the original string. However, tr also keeps any stray digit or dot, such as a sentence-ending period.
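Likewise, we can use the [:digit:] character class in place of the 0-9 range in tr and get the same numbers:
$ echo "Temperature: 25.6C, Humidity: 60.05%, Pressure: 1012.5 hPa" |
tr -dc '[:digit:]. \n'
 25.6  60.05  1012.5
Thus, we get the same extracted floating-point numbers as before, again separated by spaces. Notably, tr expects [:digit:] directly inside the set. An extra pair of brackets, as in [[:digit:]], would add the literal [ and ] characters to the set.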
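When we’d rather have one number per line, a possible variation is to translate every other character to a newline instead of deleting it. The sketch below assumes a helper name of our own, numbers_per_line, and adds a grep filter to drop empty lines and lone dots:

```shell
# Hypothetical helper: print the digit-and-dot runs of the first argument, one per line.
numbers_per_line() {
    # -c complements the set, so every character that is NOT a digit or a dot
    # is translated to a newline; -s squeezes repeated newlines into one.
    # grep '[0-9]' then drops the empty first line and any line without a digit.
    printf '%s\n' "$1" | tr -cs '0-9.' '\n' | grep '[0-9]'
}

numbers_per_line "Temperature: 25.6C, Humidity: 60.05%, Pressure: 1012.5 hPa."
# prints 25.6, 60.05, and 1012.5 on separate lines
```

This is still character-level filtering, so a run such as 1.2.3 would pass through unchanged.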
The awk command is used to define operations and patterns to extract and manipulate information from data streams and files. In addition, we can also use it to extract floating-point numbers from a string in the shell.
For this purpose, let’s consider an example:
$ echo "Temperature: 25.6C, Humidity: 60.05%, Pressure: 1012.5 hPa" |
awk '{gsub(/[^0-9.]/, " "); for(i=1;i<=NF;i++) print $i}'
25.6
60.05
1012.5
In the above awk command, gsub(), short for global substitute, replaces every character that isn’t a digit or a dot, [^0-9.], with a space, thereby removing the non-numeric characters. Since gsub() modifies the current record, awk splits it into fields again on whitespace.
Then, the for loop iterates through each field $i and prints it on its own line. Furthermore, NF is an awk variable that holds the number of fields in the record.
As a result, the output displays extracted floating-point numbers from the original string.
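The gsub() approach keeps any run of digits and dots, including a lone dot. As an alternative sketch, we could let awk search for the number pattern itself with match(), which sets the standard RSTART and RLENGTH variables. The function name extract_with_awk and the sample string are illustrative assumptions:

```shell
# Hypothetical helper: print each decimal number in the first argument, one per line.
extract_with_awk() {
    printf '%s\n' "$1" | awk '{
        s = $0
        # match() returns the position of the leftmost match, or 0 if there is none
        while (match(s, /[0-9]+\.[0-9]+/)) {
            print substr(s, RSTART, RLENGTH)    # the matched number
            s = substr(s, RSTART + RLENGTH)     # continue after the match
        }
    }'
}

extract_with_awk "Version 2. Temperature: 25.6C, ratio 0.75."
# prints 25.6 and 0.75; the "2." and the final period are ignored
```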
The sed or stream editor is used to perform basic text transformation on an input stream. Moreover, we can also use it to extract floating-point numbers from a string in the shell.
Now, let’s consider an example in detail:
$ echo "Temperature: 25.6C, Humidity: 60.05%, Pressure: 1012.5 hPa" |
sed -n -E 's/[^0-9]*([0-9]+\.[0-9]+)[^0-9]*/\1 /gp'
25.6 60.05 1012.5
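In the above sed command, the s/[^0-9]*([0-9]+\.[0-9]+)[^0-9]*/\1 /gp expression looks for sequences that match the pattern of a floating-point number and prints them.
Furthermore, the -n option suppresses the automatic printing of the pattern space, so only the p flag produces output. Meanwhile, -E enables extended regular expressions, which provide metacharacters such as + for one or more occurrences and unescaped parentheses for grouping.
Let’s break down the s/[^0-9]*([0-9]+\.[0-9]+)[^0-9]*/\1 /gp expression. The leading [^0-9]* consumes any non-digit characters before a number. Next, the ([0-9]+\.[0-9]+) group captures one or more digits, a literal dot, and one or more digits. Then, the trailing [^0-9]* consumes the non-digit characters that follow. In the replacement, \1 followed by a space keeps only the captured number. Lastly, the g flag repeats the substitution for every match on the line, and p prints the result. Unlike the earlier grep pattern, the fractional part is mandatory here, so sed leaves integers in place instead of extracting them.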
Finally, the output displays extracted floating-point numbers from a given string.
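Since sed substitutes rather than selects, it’s arguably most convenient when we want one specific value identified by its label. The following sketch assumes a hypothetical helper, value_of, and further assumes that the label contains no regex metacharacters:

```shell
# Hypothetical helper: print the number that follows "<label>: " in a string.
# $1 is the label, $2 is the string to search.
value_of() {
    # .*LABEL:  skips everything up to and including the label
    # ([0-9]+(\.[0-9]+)?) captures the number as group 1
    # .* consumes the rest, so the whole line is replaced by the number
    printf '%s\n' "$2" | sed -n -E "s/.*$1: ([0-9]+(\.[0-9]+)?).*/\1/p"
}

reading="Temperature: 25.6C, Humidity: 60.05%, Pressure: 1012.5 hPa"
value_of Humidity "$reading"    # prints 60.05
value_of Pressure "$reading"    # prints 1012.5
```

If the label doesn’t occur, the substitution fails and, because of -n, nothing is printed.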
The pcregrep tool is a grep-like utility that supports Perl Compatible Regular Expressions (PCRE), so we can also use it to extract floating-point numbers from a string in the shell.
First, let’s install pcregrep on our system:
$ sudo apt install pcregrep
Afterwards, we can use pcregrep to extract floating-point numbers from a given string:
$ echo "Temperature: 25.6C, Humidity: 60.05%, Pressure: 1012.5 hPa" |
pcregrep -o '\d+\.\d+'
25.6
60.05
1012.5
In the above command, pcregrep uses \d+\.\d+ PCRE pattern to match floating-point numbers in the string, where \d+ matches one or more digits at the start, \. matches a literal dot or period, and \d+ matches one or more digits at the end.
In addition, the -o option forces pcregrep to only display the matched parts of the string.
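PCRE also offers lookaround assertions, which can tie a number to its context without including that context in the match. The sketch below is an illustration under a few assumptions: the wrapper name pcre_only_matching is ours, and when pcregrep isn’t installed, it falls back to grep -P, which requires a GNU grep built with PCRE support:

```shell
# Hypothetical wrapper: prefer pcregrep, otherwise fall back to GNU grep -P
# (assumes grep was built with PCRE support).
pcre_only_matching() {
    if command -v pcregrep >/dev/null 2>&1; then
        pcregrep -o "$@"
    else
        grep -oP "$@"
    fi
}

reading="Temperature: 25.6C, Humidity: 60.05%, Pressure: 1012.5 hPa"

# (?=%) is a lookahead: the number must be followed by %, but % isn't part of the match.
printf '%s\n' "$reading" | pcre_only_matching '\d+\.\d+(?=%)'             # prints 60.05

# (?<=Pressure: ) is a lookbehind: the number must directly follow the label.
printf '%s\n' "$reading" | pcre_only_matching '(?<=Pressure: )\d+\.\d+'   # prints 1012.5
```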
Bash can also extract the required patterns from a string without any external commands. Although this approach is often filed under parameter expansion, the example below actually relies on word splitting, the =~ regex operator of the [[ ]] construct, and the BASH_REMATCH array.
To explain it further, let’s consider an example:
$ string="Temperature: 25.6C, Humidity: 60.05%, Pressure: 1012.5 hPa"
$ for word in $string; do
if [[ $word =~ [0-9]+(\.[0-9]+)? ]]; then
echo "${BASH_REMATCH[0]}"
fi
done
25.6
60.05
1012.5
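In the above script, the for loop splits the unquoted $string into words on whitespace. For each word, the =~ operator tests it against the regex. On success, Bash stores the matched text in BASH_REMATCH[0], which echo then prints.
Specifically, [0-9]+ matches one or more digits, and (\.[0-9]+)? represents an optional group that starts with a dot, followed by one or more digits.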
As a result, we get equivalent output from earlier.
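The loop above finds at most one number per whitespace-separated word. As a sketch of a variation that does use parameter expansion, we could match repeatedly against the remaining text and strip the consumed prefix after each match. The function name extract_all and the sample string are illustrative assumptions, and the code requires Bash:

```shell
# Hypothetical helper (Bash only): print every number in the first argument, one per line.
extract_all() {
    local rest=$1
    local re='[0-9]+(\.[0-9]+)?'
    while [[ $rest =~ $re ]]; do
        printf '%s\n' "${BASH_REMATCH[0]}"
        # Parameter expansion: remove the shortest prefix that ends with the match,
        # so the next iteration searches only the text after it.
        rest=${rest#*"${BASH_REMATCH[0]}"}
    done
}

extract_all "range=0.5..12.75,steps=3"
# prints 0.5, 12.75, and 3 on separate lines
```

Keeping the regex in a variable avoids quoting issues inside [[ ]], and quoting the match inside the expansion makes Bash treat it literally rather than as a glob pattern.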
In this article, we’ve explored several methods to extract floating-point numbers from a string in the shell.
In particular, grep and pcregrep offer quick, regex-based extraction of floating-point numbers from a string. Meanwhile, tr filters at the character level, which is efficient but can’t check the shape of a number. Moreover, we can use awk or sed in more complex scenarios. Finally, Bash’s built-in matching lets us control the extraction without external commands.