1. Overview

Extracting a portion of a line of text is a common operation when we work in the Linux command line.

In this tutorial, we’ll explore some methods for extracting text between two specific characters in the same input line.

2. Introduction to the Problem

An example can help us to understand the problem quickly. Let’s say we have one input line:

text (value) text

If we want to extract the text between the ‘(’ and ‘)’ characters, “value” is the expected result. We’ll use ‘(’ and ‘)’ as the example delimiters in this tutorial, but the delimiter characters aren’t limited to ‘(’ and ‘)’.

Of course, the input line can contain multiple values, for example:

text (value1) text (value2) text (value3) text

In this tutorial, we’ll discuss both cases and explore extracting the target values.

For simplicity, let’s focus on extracting values using various approaches and skip the input validation part. That is to say, we assume that the delimiter characters always appear in pairs in the input line.

3. Extracting One Single Value From the Input

We’ll look at three approaches to extracting a single value wrapped in a pair of delimiter characters from the input line.

Let’s use the following INPUT variable as the input example:

INPUT="text (value1) text"

3.1. Using the grep Command

grep is good at matching a pattern in the input text. Moreover, its -o option makes it print only the matched parts of the line instead of the whole line.

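Therefore, if the delimiter characters are ‘(’ and ‘)’, we can use the pattern ‘([^)]*)’ to match the delimiters together with the value. In grep’s default basic regular expressions, ‘(’ and ‘)’ are literal characters, and [^)]* matches any run of characters other than ‘)’: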

$ grep -o '([^)]*)' <<< "$INPUT"
(value1)

If we’d like to drop the delimiters and get only the value, we can use a positive lookbehind assertion. Since lookaround is a PCRE (Perl Compatible Regular Expressions) feature, we need to pass the -P option to grep:

$ grep -oP '(?<=[(])[^)]*' <<< "$INPUT"
value1

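In the pattern ‘(?<=[(])[^)]*’, the lookbehind (?<=[(]) requires the match to be preceded by ‘(’ without making ‘(’ part of the match, and [^)]* then matches every character up to, but not including, the next ‘)’. As a result, only the content inside the parentheses is printed.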
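Two variations on the same idea are sketched below, assuming GNU grep (built with PCRE support for the first command). PCRE’s \K escape discards everything matched so far from the reported match, so it can stand in for the lookbehind. When -P isn’t available, we can instead strip the delimiters from the plain -o output with tr; this assumes the value itself contains no parentheses, since tr -d removes all of them.

```shell
INPUT="text (value1) text"

# PCRE variant: match '(' and then use \K to drop it from the reported match,
# so only the characters up to the next ')' are printed
grep -oP '[(]\K[^)]*' <<< "$INPUT"

# Non-PCRE variant: keep the basic -o match, then delete the delimiters
grep -o '([^)]*)' <<< "$INPUT" | tr -d '()'
```

Both commands print value1 for this input.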

3.2. Using the sed Command

sed is a convenient command-line text processing tool. We can get the target value using sed’s ‘s’ (substitution) command.

If the input line contains a single value, it looks like this:

.... (value) ....

So if we replace ‘.... (’ and ‘) ....’ with an empty string, only the value is left.

So next, let’s see if this idea does the job with our example input:

$ sed 's/.*(//; s/).*//' <<< "$INPUT"
value1
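To make the two substitutions easier to follow, here’s a sketch that runs them one at a time and prints the intermediate result (the step1 variable is just an illustrative name):

```shell
INPUT="text (value1) text"

# Step 1: '.*(' matches everything up to and including '(' and deletes it
step1=$(sed 's/.*(//' <<< "$INPUT")
echo "$step1"                 # value1) text

# Step 2: ').*' matches ')' and everything after it and deletes it
sed 's/).*//' <<< "$step1"    # value1
```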

3.3. Using the awk Command

awk is another powerful command-line text processing utility. We can implement the same substitution idea in awk with its sub() function:

$ awk '{sub(/.*[(]/,""); sub(/[)].*/,""); print $0}' <<< "$INPUT"
value1
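Unlike sed’s basic regular expressions, awk uses extended regular expressions, where a bare ‘(’ opens a group. That’s why the command above writes the delimiters as the bracket expressions [(] and [)]. As a sketch, escaping them with a backslash should work equally well:

```shell
INPUT="text (value1) text"

# In awk's ERE, \( and \) match literal parentheses, just like [(] and [)].
# sub() edits $0 in place; print with no arguments prints $0.
awk '{ sub(/.*\(/, ""); sub(/\).*/, ""); print }' <<< "$INPUT"
```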

awk is good at working with field-based input. If we set both ‘(’ and ‘)’ as awk’s FS (Field Separator), the input line is split into three fields:

  $1 = "text "  |  $2 = "value1"  |  $3 = " text"

Therefore, if we take the second field, we solve the problem:

$ awk -F'[()]' '{print $2}' <<< "$INPUT"
value1
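The field-splitting approach isn’t tied to parentheses. A few sketches with other delimiter pairs follow; the inputs are made-up examples:

```shell
# Angle brackets: the bracket expression [<>] makes both characters separators
awk -F'[<>]' '{print $2}' <<< "text <value1> text"

# Curly braces are literal inside a bracket expression as well
awk -F'[{}]' '{print $2}' <<< "text {value1} text"

# Identical opening and closing characters work too: a single '"' separator
awk -F'"' '{print $2}' <<< 'text "value1" text'
```

Each command prints value1.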

4. Extracting Multiple Values From the Input

Now let’s take the following INPUT variable as the input example:

INPUT="text (value1) text (value2) text (value3) text"

The goal is to print value1, value2, and value3 in the output. 

sed can’t solve this problem as straightforwardly, so in this section we’ll focus on the grep and awk approaches.
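To see why the single-value sed command falls short here, we can run it against the multi-value input. Because ‘.*(’ is greedy, it deletes everything up to the last ‘(’ on the line:

```shell
INPUT="text (value1) text (value2) text (value3) text"

# After 's/.*(//' the line is "value3) text", so only value3 survives
sed 's/.*(//; s/).*//' <<< "$INPUT"
```

This prints only value3.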

4.1. Using the grep Command

The grep commands that we’ve used for extracting the single value still work for the input containing multiple values:

$ grep -o '([^)]*)' <<< "$INPUT"

$ grep -oP '(?<=[(])[^)]*' <<< "$INPUT"
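These patterns work for multiple values because [^)]* can’t cross a ‘)’. As a quick comparison, a greedy ‘.*’ in its place produces a single, oversized match:

```shell
INPUT="text (value1) text (value2) text (value3) text"

# Greedy '.*' runs to the last ')' on the line, so one match is printed:
# (value1) text (value2) text (value3)
grep -o '(.*)' <<< "$INPUT"

# '[^)]*' stops at the first ')', so each pair is a separate match
grep -o '([^)]*)' <<< "$INPUT"
```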

4.2. Using the awk Command

In the single-value scenario, we’ve seen that the second field is the target value if we set FS='[()]'. Following the same idea, if the input contains multiple values, the second, fourth, sixth, and other even-numbered fields should contain the values we expect:

$ awk -F'[()]' '{for(i=2;i<=NF;i+=2) print $i}' <<< "$INPUT"
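If we need this for different delimiter pairs, the loop can be wrapped in a small function. The sketch below uses a hypothetical helper named extract_all and assumes both delimiters are single characters that stay literal inside a bracket expression (so not ‘]’, ‘^’, ‘-’, or ‘\’):

```shell
# extract_all OPEN CLOSE (hypothetical helper): prints every value between
# OPEN and CLOSE on each input line, one value per line.
# Assumes OPEN and CLOSE are single characters that are literal inside a
# bracket expression, i.e. not ']', '^', '-' or '\'.
extract_all() {
  awk -F"[$1$2]" '{ for (i = 2; i <= NF; i += 2) print $i }'
}

extract_all '(' ')' <<< "text (value1) text (value2) text (value3) text"
extract_all '{' '}' <<< "a {x} b {y} c"
```

The first call prints value1, value2, and value3 on separate lines; the second prints x and y.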

5. Conclusion

In this article, we’ve explored how to extract text between two specific characters from the input. Through examples, we’ve learned to solve the problem using grep, sed, and awk.
