1. Overview

Often, we need to extract specific information from strings. One such scenario is extracting floating-point numbers from a given string. This task comes up in data processing and analysis, web scraping, text mining, and parsing structured data.

In this tutorial, we’ll learn about the methods to extract floating-point numbers from a string in the shell using the grep, tr, awk, sed, and pcregrep commands, as well as parameter expansion.

2. Using the grep Command

In Linux-based operating systems, the grep utility is used for pattern matching within data streams or text files. However, its functionality isn’t limited to searching.

Given that, we can also use grep to extract specific patterns from complex strings. For instance, we’ll utilize it to extract floating-point numbers from a string in the shell.

In particular, we combine grep with the echo command:

$ echo "Temperature: 25.6C, Humidity: 60.05%, Pressure: 1012.5 hPa" |
  grep -Eo '[0-9]+(\.[0-9]+)?'
25.6
60.05
1012.5

Now, let’s discuss how the above command works in detail:

  • the pipe (|) passes the output of echo to grep as its input
  • grep extracts the numbers from the string using the [0-9]+(\.[0-9]+)? pattern
  • with grep, the -E option enables extended regular expressions, and -o displays only the matched parts of each line

Specifically, [0-9]+(\.[0-9]+)? is a regex pattern that matches floating-point numbers. Because the fractional part is optional, it also matches plain integers:

  • [0-9]+ matches one or more digits (0-9)
  • ( represents the opening of a group
  • \. matches a literal period
  • [0-9]+ matches one or more digits after the decimal point
  • ) represents the ending of a group
  • ? makes the preceding group optional, i.e., it allows zero or one occurrence of the group

As a result, the output displays extracted floating-point numbers from the specified string.
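
Since the fractional group is optional, it may help to compare the pattern with a stricter variant. The following is only a minimal sketch: the sample line and the variable names loose and strict are hypothetical, and we assume a grep that supports the -o option (such as GNU or BSD grep):

```shell
# Hypothetical sample line that mixes an integer with floating-point numbers
sample="Count: 42, Temperature: 25.6C, Ratio: 0.75"

# Optional fractional part: plain integers such as 42 are matched as well
loose=$(printf '%s\n' "$sample" | grep -Eo '[0-9]+(\.[0-9]+)?')

# Mandatory fractional part: only numbers with a decimal point are matched
strict=$(printf '%s\n' "$sample" | grep -Eo '[0-9]+\.[0-9]+')

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

Here, loose should contain 42, 25.6, and 0.75, while strict should contain only 25.6 and 0.75. Which variant fits depends on whether integers count as valid values in the input.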

Alternatively, we can replace [0-9] with the equivalent POSIX bracket expression [[:digit:]] and write the literal period as [.] instead of \.:

$ echo "Temperature: 25.6C, Humidity: 60.05%, Pressure: 1012.5 hPa" |
  grep -Eo '[[:digit:]]+([.][[:digit:]]+)?'
25.6
60.05
1012.5

Consequently, we get the same floating-point numbers from earlier.

3. Using the tr Command

tr, short for translate, is another powerful command that is used to replace or delete characters from standard input and write the results to standard output.

Unlike grep, tr doesn’t understand regular expressions: it works on sets of individual characters. Still, we can use it to strip everything except the characters that make up floating-point numbers:

$ echo "Temperature: 25.6C, Humidity: 60.05%, Pressure: 1012.5 hPa" |
  tr -dc '0-9. \n'
 25.6  60.05  1012.5

In the above command, we again pipe our string, this time to tr:

  • 0-9. \n is a set of characters: the digits 0 to 9, the period, the space, and the newline (\n)
  • the -d option deletes characters, and -c complements the set, so together they tell tr to delete every character that isn’t in the set

Here, regex syntax such as (, ), +, and ? has no special meaning: tr would treat each of them as a literal character to keep. Similarly, \b in a tr set is the backspace character, not a word boundary.

As a result, the output displays only the digits, periods, and spaces of the original line, so the numbers appear on one line with the original spacing between them. Since tr filters single characters, it can’t tell a decimal point from a sentence-ending period, so this approach suits input in which periods only appear inside numbers.

Likewise, we can use the [:digit:] character class in tr instead of the 0-9 range and get the same numbers:

$ echo "Temperature: 25.6C, Humidity: 60.05%, Pressure: 1012.5 hPa" |
  tr -dc '[:digit:]. \n'
 25.6  60.05  1012.5

Thus, we get the same floating-point numbers as before, on a single line and separated by spaces.
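
If one number per line is preferred, a possible variation is to translate every other character into a newline instead of deleting it. This is only a sketch: the function name floats_per_line is hypothetical, and the extra grep merely drops lines without any digit, such as empty lines or a lone period:

```shell
# Hypothetical helper: print every run of digits and periods on its own line
floats_per_line() {
  # -c complements the set, so every character that isn't a digit or a period
  # is translated into a newline; -s squeezes repeated newlines into one
  tr -cs '0-9.' '\n' |
    # keep only lines that contain at least one digit
    grep '[0-9]'
}

echo "Temperature: 25.6C, Humidity: 60.05%, Pressure: 1012.5 hPa" | floats_per_line
```

Just like the commands above, this sketch also keeps integers and can’t validate that a run such as 1.2.3 is a well-formed number.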

4. Using the awk Command

The awk command is used to define operations and patterns to extract and manipulate information from data streams and files. In addition, we can also use it to extract floating-point numbers from a string in the shell.

For this purpose, let’s consider an example:

$ echo "Temperature: 25.6C, Humidity: 60.05%, Pressure: 1012.5 hPa" |
  awk '{gsub(/[^0-9.]/, " "); for(i=1;i<=NF;i++) print $i}'
25.6
60.05
1012.5

In the above awk command, the gsub() (global substitute) function replaces every character that isn’t a digit or a dot ([^0-9.]) with a space, so only the numeric tokens remain in the record.

Then, the for loop iterates over each field $i of the modified, space-separated record and prints it. Here, NF is a built-in awk variable that holds the number of fields in the current record.

As a result, the output displays extracted floating-point numbers from the original string.
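
The gsub() approach keeps any token made of digits and dots, so a stray period in the text would show up as a field of its own. As an alternative, we could search for the numbers directly with the standard match() function, which sets the RSTART and RLENGTH variables. The following is a minimal sketch; the function name extract_with_awk is hypothetical, and the pattern requires a decimal point:

```shell
# Hypothetical helper: print every number of the form digits.digits, one per line
extract_with_awk() {
  awk '{
    line = $0
    # match() returns 0 when nothing matches; otherwise it sets RSTART and RLENGTH
    while (match(line, /[0-9]+\.[0-9]+/)) {
      print substr(line, RSTART, RLENGTH)
      # continue searching in the text after the current match
      line = substr(line, RSTART + RLENGTH)
    }
  }'
}

echo "Temperature: 25.6C, Humidity: 60.05%, Pressure: 1012.5 hPa" | extract_with_awk
```

For the sample string, this should print the same three numbers, while a sentence-ending period no longer produces output of its own.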

5. Using the sed Command

The sed or stream editor is used to perform basic text transformation on an input stream. Moreover, we can also use it to extract floating-point numbers from a string in the shell.

Now, let’s consider an example in detail:

$ echo "Temperature: 25.6C, Humidity: 60.05%, Pressure: 1012.5 hPa" |
  sed -n -E 's/[^0-9]*([0-9]+\.[0-9]+)[^0-9]*/\1 /gp'
25.6 60.05 1012.5

In the above sed command, the s/[^0-9]*([0-9]+\.[0-9]+)[^0-9]*/\1 /gp substitution finds each floating-point number together with the non-digit text around it and replaces the whole match with just the number followed by a space.

Furthermore, with sed, the -n option suppresses the automatic printing of the pattern space, so only lines printed explicitly by the p flag appear. Meanwhile, -E enables extended regular expressions, so we can use metacharacters such as + and the grouping parentheses without backslashes.

Let’s break down the s/[^0-9]*([0-9]+\.[0-9]+)[^0-9]*/\1 /gp command:

  • s is the substitute command, and / is the delimiter that separates its parts
  • [^0-9]* matches any non-digit characters (zero or more occurrences) before the number
  • ([0-9]+\.[0-9]+) captures a floating-point number made of one or more digits, a dot, and one or more digits
  • [^0-9]* matches any non-digit characters (zero or more occurrences) after the number
  • \1 refers to the first capture group, i.e., the extracted floating-point number
  • in /gp, the g (global) flag replaces all occurrences on a line, and the p flag prints the line if a substitution was made

Finally, the output displays extracted floating-point numbers from a given string.
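
The substitution above leaves a trailing space after the last number. One way to tidy this up is to group several commands for lines that contain a number. This is only a sketch under the assumption that our sed supports -E; the function name extract_with_sed is hypothetical:

```shell
# Hypothetical helper: print the floating-point numbers of each line, space-separated
extract_with_sed() {
  sed -n -E '/[0-9]+\.[0-9]+/{
    # keep each number and drop the non-digit text around it
    s/[^0-9]*([0-9]+\.[0-9]+)[^0-9]*/\1 /g
    # remove the trailing space left by the last replacement
    s/ $//
    # print the result; lines without such a number are skipped entirely
    p
  }'
}

echo "Temperature: 25.6C, Humidity: 60.05%, Pressure: 1012.5 hPa" | extract_with_sed
```

Notably, the substitution only rewrites text around numbers with a decimal point, so other content, such as text before a standalone integer, may be left in the output.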

6. Using the pcregrep Command

The pcregrep tool supports Perl Compatible Regular Expressions (PCRE). Additionally, we can use this command to search for character patterns, so we can extract floating-point numbers from a string in the shell.

First, let’s install pcregrep, here with apt on a Debian-based system:

$ sudo apt install pcregrep

Afterwards, we can use pcregrep to extract floating-point numbers from a given string:

$ echo "Temperature: 25.6C, Humidity: 60.05%, Pressure: 1012.5 hPa" |
  pcregrep -o '\d+\.\d+'
25.6
60.05
1012.5

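In the above command, pcregrep uses the \d+\.\d+ PCRE pattern to match floating-point numbers in the string, where the first \d+ matches one or more digits before the decimal point, \. matches a literal dot or period, and the second \d+ matches one or more digits after it. Unlike the pattern we used with grep, the decimal part is mandatory here, so plain integers are skipped.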

In addition, the -o option forces pcregrep to only display the matched parts of the string.
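
Since pcregrep isn’t installed everywhere, a script might want a fallback. The following sketch uses pcregrep when it’s available and otherwise falls back to grep -Eo with a roughly equivalent POSIX pattern. The function name extract_pcre is hypothetical, and we assume the input contains ASCII digits only:

```shell
# Hypothetical helper: prefer pcregrep, fall back to grep -E when it is missing
extract_pcre() {
  if command -v pcregrep >/dev/null 2>&1; then
    # PCRE syntax: \d stands for a digit
    pcregrep -o '\d+\.\d+'
  else
    # POSIX ERE counterpart of the same pattern
    grep -Eo '[0-9]+\.[0-9]+'
  fi
}

echo "Temperature: 25.6C, Humidity: 60.05%, Pressure: 1012.5 hPa" | extract_pcre
```

Either branch should print the three numbers on separate lines.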

7. Using Parameter Expansion

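Bash can also extract patterns from a string without any external command. Strictly speaking, the example below relies on word splitting and the =~ regex operator of [[ ]] rather than on parameter expansion itself, but it serves the same purpose of extracting floating-point numbers from a string.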

To explain it further, let’s consider an example:

$ string="Temperature: 25.6C, Humidity: 60.05%, Pressure: 1012.5 hPa"
$ for word in $string; do
    if [[ $word =~ [0-9]+(\.[0-9]+)? ]]; then
      echo "${BASH_REMATCH[0]}"
    fi
  done
25.6
60.05
1012.5

In the above script:

  • the for loop iterates over each word of the string, since the unquoted $string expansion is split on whitespace
  • if [[ $word =~ [0-9]+(\.[0-9]+)? ]]; then checks whether the current word contains a match for the specified pattern
  • echo outputs the matched text, which Bash stores in element 0 of the BASH_REMATCH array

Specifically, [0-9]+ matches one or more digits, and (\.[0-9]+)? represents an optional group that starts with a dot, followed by one or more digits.

As a result, we get equivalent output from earlier.
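
The loop above prints at most one number per word. To collect every number, we could combine =~ with an actual parameter expansion that strips the part of the string we’ve already processed. This is a minimal Bash sketch; the function name extract_floats and the variable names rest and re are hypothetical, and the pattern requires a decimal point:

```shell
#!/usr/bin/env bash

# Hypothetical helper: print every digits.digits number found in the first argument
extract_floats() {
  local rest=$1
  # storing the regex in a variable avoids quoting issues inside [[ ]]
  local re='[0-9]+\.[0-9]+'

  while [[ $rest =~ $re ]]; do
    # BASH_REMATCH[0] holds the text matched by the whole pattern
    printf '%s\n' "${BASH_REMATCH[0]}"
    # parameter expansion: remove the shortest prefix that ends with the match
    rest=${rest#*"${BASH_REMATCH[0]}"}
  done
}

extract_floats "Temperature: 25.6C, Humidity: 60.05%, Pressure: 1012.5 hPa"
```

Because the string is consumed from left to right, this version should also handle several numbers inside a single word, such as range=1.5-2.25.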

8. Conclusion

In this article, we’ve explored several methods to extract floating-point numbers from a string in the shell.

Particularly, we can use grep and pcregrep for quick extraction of floating-point numbers from a string. Meanwhile, tr also offers efficient character-level extraction. Moreover, we can use awk or sed in complex scenarios. Finally, parameter expansion also provides a way to control the extraction of floating-point numbers.
