Last updated: July 6, 2024
When working on a Linux command line, we often need to split strings on a separator, for example when post-processing logs. Common separators include ‘-’ in a date string and ‘/’ in a file path. Other examples are ‘:’, ‘|’, and ‘@’.
In this tutorial, we’ll discuss several techniques to split a string and extract parts of it. We’ll use the date command to generate example input strings, and then use cut, awk, sed, Bash, Python, and grep to extract the part we’re interested in.
To begin with, let’s explore the date command:
$ date "+%Y-%m-%d"
2023-12-07
$ date "+%Y/%m/%d"
2023/12/07
$ date "+%Y-%m-%d_%H:%M:%S"
2023-12-07_22:07:14
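Here, we use the date command with different format strings to print the date and time in a specified format. In these format strings, %Y is the four-digit year, %m is the two-digit month, %d is the two-digit day of the month, and %H, %M, and %S are the hour (00-23), minute, and second. Any other character, such as ‘-’, ‘/’, ‘_’, or ‘:’, is printed as is and acts as our separator.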
We’ll use the above command to generate the string and then extract various parts of the string.
We’ll extract the day of the month from today’s date using the output of the date command.
We split the string using the cut command:
$ date "+%Y-%m-%d" | cut -f3 -d '-'
07
As shown above, the cut command takes two options: -d '-' sets the field delimiter to a hyphen, and -f3 selects the third field.
In summary, we used the cut command to split the string and extract its last part.
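As a small variation on the example above, the following sketch runs cut on fixed sample strings instead of live date output, so the results are reproducible. The helper name field_of and the sample values are assumptions made for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print field(s) $3 of string $1, split on delimiter $2.
field_of() {
    printf '%s\n' "$1" | cut -d "$2" -f "$3"
}

field_of "2023-12-07" "-" 1      # year
field_of "2023-12-07" "-" 3      # day of the month
field_of "2023-12-07" "-" 1,2    # year and month, still joined by '-'
```

Note that cut needs the field number up front. It has no selector for "the last field", so this approach fits best when the number of fields is known.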
We’ll extract the interesting part of the string using the awk command:
$ date "+%Y-%m-%d" | awk -F- '{print $NF}'
07
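Here, the awk command uses two features: the -F- option sets the field separator to a hyphen, and the built-in variable NF holds the number of fields, so $NF refers to the last field.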
To emphasize, we piped the output of the date command to the awk script. The awk script prints the last field.
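Because awk treats a multi-character -F value as a regular expression, the same idea extends to strings with several separators. The sketch below assumes a fixed sample timestamp in the shape produced by date "+%Y-%m-%d_%H:%M:%S"; the helper name stamp_field is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print field number $2 of timestamp $1,
# splitting on any of the characters '-', '_' or ':'.
stamp_field() {
    printf '%s\n' "$1" | awk -F'[-_:]' -v n="$2" '{ print $n }'
}

stamp_field "2023-12-07_22:07:14" 3   # day of the month
stamp_field "2023-12-07_22:07:14" 4   # hour
```

Passing the field number with -v keeps the awk program itself constant, which avoids building awk code from shell variables.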
We use a regular expression to extract the useful part of the string using the sed command:
$ date "+%Y_%m_%d" | sed 's/.*_//'
07
In the above example, the sed command runs the substitution s/.*_//. The regular expression .*_ is greedy, so it matches everything up to and including the last ‘_’ separator, and the empty replacement deletes that match. Only the suffix after the last ‘_’ remains.
The key idea is to use a regular expression to select the part of the string we want to remove or keep.
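The same regular-expression approach can pick out a part other than the suffix. The following sketch is one possible way to do this with a capture group, assuming input in the YYYY_MM_DD shape used above; the helper names year_of and month_of are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the year by deleting everything from the first '_' onward.
year_of() {
    printf '%s\n' "$1" | sed 's/_.*//'
}

# Hypothetical helper: print the month by capturing the middle group of digits.
# -E enables extended regular expressions, so ( ) and + need no backslashes.
month_of() {
    printf '%s\n' "$1" | sed -E 's/^[0-9]+_([0-9]+)_[0-9]+$/\1/'
}

year_of "2023_12_07"
month_of "2023_12_07"
```

If the input doesn't match the pattern, sed prints the line unchanged, so a caller may want to validate the format first.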
We use native parameter expansion support in Bash:
$ DATE=$(date "+%Y-%m-%d")
$ echo $DATE
2023-12-07
$ printf "%s\n" "${DATE##*-}"
07
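Here, we use the printf command with the format string "%s\n", which prints its argument as a string followed by a newline. The argument is the parameter expansion ${DATE##*-}, which removes the longest prefix matching the pattern *- from the value of DATE and so leaves only the text after the last hyphen.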
In summary, we used the parameter expansion feature in Bash to extract the part of the string.
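For comparison, here is a short sketch of the four related prefix and suffix removal forms applied to a fixed sample value. The value assigned to DATE is an assumption chosen to match the earlier output:

```shell
#!/usr/bin/env bash
DATE="2023-12-07"   # fixed sample value instead of live date output

printf '%s\n' "${DATE##*-}"   # remove longest prefix matching '*-'   (day)
printf '%s\n' "${DATE#*-}"    # remove shortest prefix matching '*-'  (month-day)
printf '%s\n' "${DATE%%-*}"   # remove longest suffix matching '-*'   (year)
printf '%s\n' "${DATE%-*}"    # remove shortest suffix matching '-*'  (year-month)
```

These expansions run inside the shell itself, so no external process is started.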
Let’s use a Python script to split a string:
$ DATE=$(date "+%Y-%m-%d")
$ echo $DATE
2023-12-07
$ cat split.py
#!/usr/bin/env python
import sys
print(sys.argv[1].split("-")[-1])
$ ./split.py "$DATE"
07
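As illustrated above, the Python script reads the first command-line argument from sys.argv[1], splits it on the hyphen with split("-"), and prints the last element of the resulting list using the index [-1].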
Python, being a general-purpose scripting language, is more flexible for string processing.
We use a native substring operator in Bash to extract the day of the month:
$ DATE=$(date "+%Y-%m-%d")
$ echo $DATE
2023-12-07
First, we find the index of the last hyphen ‘-‘ character:
$ date "+%Y-%m-%d" |
grep -ob "-"
4:-
7:-
Next, we extract the byte offset of the last hyphen from the output of the grep command:
$ date "+%Y-%m-%d" |
grep -ob "-" |
grep -oE "[0-9]+" |
tail -1
7
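In the above example, the first grep uses -o to print only the matched text and -b to prefix each match with its byte offset. The second grep uses -o with -E (extended regular expressions) to keep only the numeric offsets, and tail -1 keeps the last line, which is the offset of the last hyphen.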
Then, we calculate the index of the character after the last hyphen and extract the suffix:
$ DATE=$(date "+%Y-%m-%d")
$ idx=$(date "+%Y-%m-%d" | grep -ob "-" | grep -oE "[0-9]+" | tail -1)
$ let "idx = $idx + 1"
$ printf "%s\n" "${DATE:$idx}"
07
As we see, the let command calculates the index of the character after the hyphen. Finally, we print the suffix of the string using substring expansion in Bash.
In summary, we used a sequence of commands in Bash to extract the suffix of a string.
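The steps above can be collected into a single function. This is only a sketch under a few assumptions: the helper name suffix_after_last is hypothetical, the input is single-byte (ASCII) text because grep -b reports byte offsets, and -F is added so the separator is matched literally:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the part of string $1 after the last occurrence of separator $2.
suffix_after_last() {
    local str=$1 sep=$2 idx
    # -o: print only the match, -b: prefix it with its byte offset, -F: literal separator
    idx=$(printf '%s\n' "$str" | grep -obF -e "$sep" | tail -n 1 | cut -d: -f1)
    if [ -z "$idx" ]; then
        printf '%s\n' "$str"      # separator not found: print the string unchanged
        return
    fi
    idx=$((idx + ${#sep}))        # index of the first character after the separator
    printf '%s\n' "${str:$idx}"   # Bash substring expansion
}

suffix_after_last "2023-12-07" "-"
suffix_after_last "/var/log/syslog" "/"
```

Working on the string passed in, rather than calling date a second time, keeps the offset and the string consistent with each other.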
In this article, we discussed multiple ways to extract the last part of a string after a hyphen. We used cut, awk, sed, and grep, along with Bash parameter expansion and substring expansion, to split the string and extract its parts. We also used Python, a general-purpose scripting language.
The examples shown above can be adapted to use any other separator in place of the hyphen.
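As a brief illustration of switching separators, the sketch below applies the parameter expansion technique to a few sample strings. The helper name last_part and the sample inputs are assumptions for demonstration only:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the part of $1 after the last occurrence of separator $2.
# Quoting "$2" inside the pattern makes the separator match literally.
last_part() {
    printf '%s\n' "${1##*"$2"}"
}

last_part "/var/log/syslog" "/"     # file name from a path
last_part "user@example.com" "@"    # domain from an e-mail address
last_part "22:07:14" ":"            # seconds from a time string
last_part "a|b|c" "|"               # last column of a pipe-separated record
```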