Compare Strings in Dot-Separated Version Format

1. Overview

Version comparison is a common task when upgrading a software product. When the upgrade is driven by a Bash script, the script needs a reliable way to compare strings in dot-separated version format.

In this article, we’ll discuss algorithms that compare two strings in dot-separated version format to determine which one is the later version (that is, which version number is greater). We’ll also look at external utilities available on Linux that can do the version comparison for us.

2. What Is a Version Number?

A version number is a string that identifies a unique state of a software product. A version number of the form A.B.C.D is a sequence of fields, usually separated by dots, where each of A, B, C, and D is a number or a string literal. By convention, the first field is the major version (A) and the second is the minor version (B). The third field is the patch number (C); semantic versioning, for instance, defines exactly these three fields as MAJOR.MINOR.PATCH. When a fourth field (D) is present, it’s usually called the revision, the build, or the build number.

Now, the question is: is there a direct way to compare strings in dot-separated version format, for example, 2.4.5, 2.8, and 2.4.5.1, in Bash? The answer is no. Bash’s string comparison works character by character, so it puts 2.10 before 2.9, and its integer comparison operators, such as -lt, reject values that contain dots. However, we can compare the individual fields to find the latest version.
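A quick check in Bash shows both problems. This is only an illustrative sketch: [[ ]] orders 2.10 before 2.9 because the character 1 sorts before 9, and [ ] refuses to treat 2.10 as an integer, printing an error and returning a non-zero status (2 in Bash):

```shell
#!/usr/bin/env bash
# String comparison goes character by character, so "2.10" sorts before "2.9"
if [[ 2.10 < 2.9 ]]; then
    echo "[[ 2.10 < 2.9 ]] is true, which is wrong for versions"
fi

# Integer comparison only accepts integers; [ reports an error (status 2)
status=0
[ 2.10 -lt 2.9 ] 2>/dev/null || status=$?
echo "[ 2.10 -lt 2.9 ] exit status: $status"
```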

3. Version Comparison Without Using Any External Utilities

Based on the nature of the fields in version strings, we’ll look at three scenarios and solve each one mainly with Bash built-ins.

3.1. Dot-Separated Sequence of Numeric Fields

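We can use the printf built-in (together with tr, which replaces the dots with spaces) to turn a version string with at most four numeric fields into a single zero-padded number, and then compare the results with -lt. This works as long as each field has at most three digits and no leading zeros, since printf reads a field such as 08 as an invalid octal number: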

$ function ver { printf "%03d%03d%03d%03d" $(echo "$1" | tr '.' ' '); }
$ [ $(ver 10.9) -lt $(ver 10.10) ] && echo 1
1
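Because ver produces fixed-width strings of digits, we can wrap the comparison in a reusable helper. The sketch below is a variant, not part of the original method: version_lt is a hypothetical name, and ver here uses Bash’s ${1//./ } substitution instead of tr so that everything stays inside the shell. It keeps the same limits of at most four fields of up to three digits each:

```shell
#!/usr/bin/env bash
# ver: pad up to four dot-separated numeric fields to three digits each
# (${1//./ } replaces every dot with a space, like tr '.' ' ')
ver() { printf "%03d%03d%03d%03d" ${1//./ }; }

# version_lt (hypothetical helper): succeeds if $1 is older than $2
version_lt() { [ "$(ver "$1")" -lt "$(ver "$2")" ]; }

ver 10.9; echo     # 010009000000
ver 10.10; echo    # 010010000000

if version_lt 10.9 10.10; then
    echo "10.9 is older than 10.10"
fi
```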

Recursion gives us another approach. The following function takes the first field of each version and compares the two numerically. If they’re equal, it strips them off and calls itself on the rest of the strings. A missing field counts as 0, so the versions don’t need the same number of fields:

compare_versions() {
     # take the first field of each version (everything before the first dot);
     # 10# forces base 10, so a field such as 08 isn't read as octal
     local a=${1%%.*} b=${2%%.*}
     [[ "10#${a:-0}" -gt "10#${b:-0}" ]] && return 1
     [[ "10#${a:-0}" -lt "10#${b:-0}" ]] && return 2
     # first fields are equal: drop them (and their dots) and keep the rest
     a=${1:${#a} + 1}
     b=${2:${#b} + 1}
     # stop when both versions are used up (they're equal); otherwise recurse
     [[ -z $a && -z $b ]] || compare_versions "$a" "$b"
}

Usage: compare_versions <ver_1> <ver_2>

The function compare_versions returns 0 if the versions are equal, 1 if version 1 is greater than version 2, and 2 if version 1 is less than version 2.
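Since compare_versions reports its result through the exit status, a caller has to capture it and map it to an operator. In this sketch, compare_versions is repeated so the block runs on its own, and show_comparison is a hypothetical helper. The last two calls show that versions may differ in field count and carry leading zeros, because a missing field defaults to 0 and the 10# prefix forces base 10:

```shell
#!/usr/bin/env bash
compare_versions() {
    local a=${1%%.*} b=${2%%.*}
    [[ "10#${a:-0}" -gt "10#${b:-0}" ]] && return 1
    [[ "10#${a:-0}" -lt "10#${b:-0}" ]] && return 2
    a=${1:${#a} + 1}
    b=${2:${#b} + 1}
    [[ -z $a && -z $b ]] || compare_versions "$a" "$b"
}

# show_comparison (hypothetical helper): print "<v1> <op> <v2>"
show_comparison() {
    local rc=0 op
    compare_versions "$1" "$2" || rc=$?
    case $rc in
        0) op='=' ;;
        1) op='>' ;;
        2) op='<' ;;
    esac
    echo "$1 $op $2"
}

show_comparison 1.2 1.10     # 1.2 < 1.10
show_comparison 2.0 1.9.9    # 2.0 > 1.9.9
show_comparison 1.0 1.0.0    # 1.0 = 1.0.0
show_comparison 1.08 1.8     # 1.08 = 1.8
```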

Now, let’s look at an iterative method that also handles versions with different numbers of fields, such as 3.2 and 3.2.1.9.8144:

vercomp() {
    if [[ $1 == "$2" ]]
    then
        return 0
    fi
    local IFS=.
    local i ver1=($1) ver2=($2)
    # fill empty fields in ver1 with zeros
    for ((i=${#ver1[@]}; i<${#ver2[@]}; i++))
    do
        ver1[i]=0
    done
    for ((i=0; i<${#ver1[@]}; i++))
    do
        if [[ -z ${ver2[i]} ]]
        then
            # fill empty fields in ver2 with zeros
            ver2[i]=0
        fi
        if ((10#${ver1[i]} > 10#${ver2[i]}))
        then
            return 1
        fi
        if ((10#${ver1[i]} < 10#${ver2[i]}))
        then
            return 2
        fi
    done
    return 0
}

The algorithm pads the shorter version with zero fields and then compares the two versions field by field. The function vercomp returns 0 if the versions are equal, 1 if version 1 is greater than version 2, and 2 if version 1 is less than version 2.

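The line local IFS=. sets the Internal Field Separator (IFS), a special shell variable used for word splitting after expansion, to a dot. As a result, the unquoted $1 and $2 in the array assignments split into one element per field. To check vercomp, we can use a small test harness that reads test cases from standard input: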

testvercomp() {
    vercomp "$1" "$2"
    case $? in
        0) op='=';;
        1) op='>';;
        2) op='<';;
    esac
    if [[ $op != "$3" ]]
    then
        echo "Fail: Expected '$3', Actual '$op', Arg1 '$1', Arg2 '$2'"
    else
        echo "Pass: '$1 $op $2'"
    fi
}
 
# Run tests
# argument table format:
# testarg1   testarg2     expected_relationship
echo "The following tests should pass"
while read -r test
do
    testvercomp $test
done

The following tests should pass
1 1 =
Pass: '1 = 1'
2.1 2.2 <
Pass: '2.1 < 2.2'
3.0.4.10 3.0.4.2 >
Pass: '3.0.4.10 > 3.0.4.2'
3.2 3.2.1.9.8144 <
Pass: '3.2 < 3.2.1.9.8144'

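The function testvercomp maps the exit status of vercomp to a relational operator, compares it with the expected relationship, and prints “Pass” or “Fail”. In the run above, we typed each test case (for example, 1 1 =) on standard input, and the harness printed its verdict right after it.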
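In a script, we usually want a yes-or-no answer, for example, to check for a minimum required version. As a sketch, a hypothetical version_ge helper can treat every vercomp result except 2 (less than) as success. vercomp is repeated, with $2 quoted in the equality test, so that the block runs on its own:

```shell
#!/usr/bin/env bash
vercomp() {
    if [[ $1 == "$2" ]]; then
        return 0
    fi
    local IFS=.
    local i ver1=($1) ver2=($2)
    # fill empty fields in ver1 with zeros
    for ((i=${#ver1[@]}; i<${#ver2[@]}; i++)); do
        ver1[i]=0
    done
    for ((i=0; i<${#ver1[@]}; i++)); do
        # fill empty fields in ver2 with zeros
        if [[ -z ${ver2[i]} ]]; then
            ver2[i]=0
        fi
        if ((10#${ver1[i]} > 10#${ver2[i]})); then
            return 1
        fi
        if ((10#${ver1[i]} < 10#${ver2[i]})); then
            return 2
        fi
    done
    return 0
}

# version_ge (hypothetical helper): succeeds if $1 >= $2,
# that is, vercomp did not report "less than" (status 2)
version_ge() {
    local rc=0
    vercomp "$1" "$2" || rc=$?
    ((rc != 2))
}

required=3.2.1
for installed in 3.2 3.2.1 3.10; do
    if version_ge "$installed" "$required"; then
        echo "$installed satisfies >= $required"
    else
        echo "$installed is too old (need $required)"
    fi
done
```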

3.2. Last Field Optionally Ending With Letters

So far, we’ve compared version strings that are dot-separated sequences of numeric fields. Next, let’s handle a special case where the last field of a version may end with letters. In other words, we now want to compare versions such as 2.5 and 2.5a:

# Usage: V <version_a> <op> <version_b>, where <op> is =, ==, !=, <, <=, >, or >=
V()
{ 
    local a=$1 op=$2 b=$3 al=${1##*.} bl=${3##*.}
    # Left-trim digits from the tail items so only letters are left
    while [[ $al =~ ^[[:digit:]] ]]; do
        al=${al:1};
    done
    while [[ $bl =~ ^[[:digit:]] ]]; do
        bl=${bl:1}
    done
    # Right trim letters from a and b to leave just the sequence of numeric items
    local ai=${a%$al} bi=${b%$bl}
    local ap=${ai//[[:digit:]]} bp=${bi//[[:digit:]]}
    # one '.0' per dot: a is padded below with b's zeros and b with a's,
    # so both end up with the same number of items
    ap=${ap//./.0} bp=${bp//./.0}
    local w=1 fmt=$a.$b x IFS=.
    for x in $fmt; do
        [ ${#x} -gt $w ] && w=${#x};
    done
    fmt=${*//[^.]} fmt=${fmt//./%${w}s}
    printf -v a $fmt $ai$bp
    printf -v a "%s-%${w}s" $a $al
    printf -v b $fmt $bi$ap
    printf -v b "%s-%${w}s" $b $bl
    case $op in
        '<='|'>=' ) [ "$a" ${op:0:1} "$b" ] || [ "$a" = "$b" ] ;;
                * ) [ "$a" $op "$b" ] ;;
    esac
}
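# Test helpers: P prints V's actual result, EXPECT prints the expected one,
# and CODE prints the arguments from the script line that called it ($0 must be the script file)
P() { printf "$@"; }
EXPECT() { printf "$@"; }
CODE() { awk $BASH_LINENO'==NR{print " "$2,$3,$4}' "$0"; }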

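Let’s put a few test lines in a script file, after the definitions above. CODE reads the calling line back from the script through $0, so these lines must run from a file rather than interactively:

V 3.5 '>' 3.5b && P + || P _; EXPECT _; CODE
V 3.0 '<' 3.0.3 && P + || P _; EXPECT +; CODE
V 3.0002 '>' 3.0003.3 && P + || P _; EXPECT _; CODE
V 3.0003 '>' 3.0000004 && P + || P _; EXPECT _; CODE

Running the script prints one line per test. The first character is V’s actual result (+ for true, _ for false), and the second is the expected result, so ++ or __ means that V behaved correctly:

__ 3.5 '>' 3.5b
++ 3.0 '<' 3.0.3
__ 3.0002 '>' 3.0003.3
__ 3.0003 '>' 3.0000004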

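This algorithm also handles fields with any number of digits and versions with any number of fields. Like vercomp, it pads the shorter version with zeros so that both have the same number of fields: 1.0 < 1.0.1 means 1.0.0 < 1.0.1. To compare fields of different lengths, it right-aligns them to a common width and compares the resulting strings, so the last test correctly reports that 3.0003 isn’t greater than 3.0000004. Because leading zeros are kept as characters, though, a field with extra leading zeros compares as greater: V treats 1.04 as greater than 1.5.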
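Outside the test script, V fits directly into an if statement. The sketch below copies V without changes to its logic and uses it for a hypothetical minimum-version check; the installed value is made up for illustration. Note that the operator has to be quoted, otherwise the shell would treat < or > as a redirection:

```shell
#!/usr/bin/env bash
# V <a> <op> <b>: copied from above
V() {
    local a=$1 op=$2 b=$3 al=${1##*.} bl=${3##*.}
    # Left-trim digits from the tail items so only letters are left
    while [[ $al =~ ^[[:digit:]] ]]; do al=${al:1}; done
    while [[ $bl =~ ^[[:digit:]] ]]; do bl=${bl:1}; done
    # Right-trim letters to leave just the sequence of numeric items
    local ai=${a%$al} bi=${b%$bl}
    local ap=${ai//[[:digit:]]} bp=${bi//[[:digit:]]}
    # one '.0' per dot, used to pad the other version
    ap=${ap//./.0} bp=${bp//./.0}
    local w=1 fmt=$a.$b x IFS=.
    for x in $fmt; do [ ${#x} -gt $w ] && w=${#x}; done
    fmt=${*//[^.]} fmt=${fmt//./%${w}s}
    printf -v a $fmt $ai$bp
    printf -v a "%s-%${w}s" $a $al
    printf -v b $fmt $bi$ap
    printf -v b "%s-%${w}s" $b $bl
    case $op in
        '<='|'>=' ) [ "$a" ${op:0:1} "$b" ] || [ "$a" = "$b" ] ;;
        * ) [ "$a" $op "$b" ] ;;
    esac
}

installed=2.5a   # hypothetical installed version
if V "$installed" '>=' 2.5; then
    echo "$installed is at least 2.5"
fi
if V 1.0 '<' 1.0.1; then
    echo "1.0 is older than 1.0.1"
fi
```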

3.3. Tuple-Based, Dot-Separated Version Numbers

In some cases, version numbers also contain letters or other symbols, for example, 10.c.3, 4.0-RC1, and 4.0-RC2.

Let’s see how we can compare such versions in Bash. The following function compares the leading numeric part (digits and dots) field by field, and then compares whatever remains as a string, using ASCII ordering:

compare-versions()
{
    if [[ $1 == "$2" ]]; then
        return 0
    fi
    local IFS=.
    # Split off the leading run of digits and dots; everything from the first
    # other character onward is compared as a plain string
    local i a=(${1%%[^0-9.]*}) b=(${2%%[^0-9.]*})
    local arem=${1#${1%%[^0-9.]*}} brem=${2#${2%%[^0-9.]*}}
    for ((i=0; i<${#a[@]} || i<${#b[@]}; i++)); do
        if ((10#${a[i]:-0} < 10#${b[i]:-0})); then
            return 2
        elif ((10#${a[i]:-0} > 10#${b[i]:-0})); then
            return 1
        fi
    done
    if [ "$arem" '<' "$brem" ]; then
        return 2
    elif [ "$arem" '>' "$brem" ]; then
        return 1
    fi
    return 0
}

To test compare-versions, we can reuse the testvercomp harness from Section 3.1 after replacing its vercomp call with compare-versions:

The following tests should pass
1.0rc1 1.0rc2 <
Pass: '1.0rc1 < 1.0rc2'
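Here’s a standalone sketch that maps the exit status of compare-versions to an operator for a few of the version styles mentioned above; show is a hypothetical helper. One consequence of comparing the remainder as a string is worth knowing: any suffix sorts after no suffix, so 4.0-RC1 comes out greater than 4.0, which is the opposite of the usual meaning of a release candidate:

```shell
#!/usr/bin/env bash
compare-versions() {
    if [[ $1 == "$2" ]]; then
        return 0
    fi
    local IFS=.
    # Split off the leading run of digits and dots; compare the rest as a string
    local i a=(${1%%[^0-9.]*}) b=(${2%%[^0-9.]*})
    local arem=${1#${1%%[^0-9.]*}} brem=${2#${2%%[^0-9.]*}}
    for ((i=0; i<${#a[@]} || i<${#b[@]}; i++)); do
        if ((10#${a[i]:-0} < 10#${b[i]:-0})); then
            return 2
        elif ((10#${a[i]:-0} > 10#${b[i]:-0})); then
            return 1
        fi
    done
    if [ "$arem" '<' "$brem" ]; then
        return 2
    elif [ "$arem" '>' "$brem" ]; then
        return 1
    fi
    return 0
}

# show (hypothetical helper): print "<v1> <op> <v2>"
show() {
    local rc=0 op
    compare-versions "$1" "$2" || rc=$?
    case $rc in 0) op='=' ;; 1) op='>' ;; 2) op='<' ;; esac
    echo "$1 $op $2"
}

show 1.0rc1 1.0rc2    # 1.0rc1 < 1.0rc2
show 10.c.3 10.b.9    # 10.c.3 > 10.b.9
show 4.0-RC1 4.0      # 4.0-RC1 > 4.0
```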

4. Version Comparison Using External Utilities

Next, let’s discuss a few commands or utilities that aren’t built into the shell. They give us a rather simple approach to comparing dot-separated version strings in Bash.

4.1. Using GNU sort (coreutils 7.0 or Later)

Since GNU coreutils 7.0, the sort command has a -V (--version-sort) option that orders lines as version numbers:

$ printf '2.4.5\n2.8\n2.4.5.1\n' | sort -V
2.4.5
2.4.5.1
2.8

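With GNU sort, the -C (--check=silent) option makes sort exit with status 0 if its input is already sorted and 1 otherwise, without printing anything. Combined with -V, it gives us a less-than-or-equal test for versions: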

$ verlte() { 
>     printf '%s\n%s' "$1" "$2" | sort -C -V
> }
$ verlte 2.5.7 2.5.6 && echo "yes" || echo "no"
no
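verlte gives us less than or equal. As a sketch, we can derive a strict less-than test from it by ruling out identical strings first; verlt is a hypothetical name, and both functions need a sort that supports -V, such as GNU sort:

```shell
#!/usr/bin/env bash
# verlte: succeeds if $1 <= $2, i.e., the two lines are already in version order
verlte() {
    printf '%s\n%s\n' "$1" "$2" | sort -C -V
}

# verlt (hypothetical helper): succeeds if $1 < $2 (less than or equal, but not identical)
verlt() {
    [ "$1" = "$2" ] && return 1
    verlte "$1" "$2"
}

verlte 2.5.6 2.5.7 && echo "2.5.6 <= 2.5.7"
verlt 2.5.7 2.5.7 || echo "2.5.7 is not < 2.5.7"
```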

Another approach sorts numerically, field by field, using the dot as the field separator (-t '.') and one key per field (-k n,n). It works for versions such as 1.2, 2.3.4, 1.0, and 1.10.1, but since each field needs its own key, the maximum number of fields (here, three) has to be known in advance:

$ expr $(printf "1.10.1\n1.7" | sort -t '.' -k 1,1 -k 2,2 -k 3,3 -g | sed -n 2p) != "1.7"
1

This command prints 1 (true): after sorting, the second line, which holds the greater version, is 1.10.1 rather than 1.7, so 1.10.1 is greater than 1.7.

Likewise, if we know the number of fields, one -k n,n key per field is enough to sort a whole list of versions:

$ printf '2.4.5\n2.8\n2.4.5.1\n2.10.2\n' | sort -t '.' -k 1,1 -k 2,2 -k 3,3 -k 4,4 -g
2.4.5
2.4.5.1
2.8
2.10.2
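Since the list comes out in ascending order, the last line is the newest version. A hypothetical latest_version helper can pick it with tail. Like the command above, this sketch assumes at most four numeric fields:

```shell
#!/usr/bin/env bash
# latest_version (hypothetical helper): print the greatest of the given versions,
# assuming each has at most four numeric, dot-separated fields
latest_version() {
    printf '%s\n' "$@" | sort -t '.' -k 1,1 -k 2,2 -k 3,3 -k 4,4 -g | tail -n 1
}

latest_version 2.4.5 2.8 2.4.5.1 2.10.2    # 2.10.2
```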

4.2. Using dpkg --compare-versions

On Debian-based systems, we can also use dpkg, the Debian package manager, to compare two version strings:

Usage: dpkg --compare-versions <ver1> <op> <ver2>

Here, <op> is one of lt, le, eq, ne, ge, or gt. If the relation holds, dpkg exits with status zero (success), so we can use the command directly as the condition of an if statement:

$ if dpkg --compare-versions "2.11" "lt" "3"; then echo true; else echo false; fi
true
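dpkg isn’t installed on every distribution, so a portable script may want to check for it first. In this sketch, deb_version_lt is a hypothetical wrapper that returns 2 when dpkg is missing, so that callers can tell an error apart from a false result:

```shell
#!/usr/bin/env bash
# deb_version_lt (hypothetical helper): succeeds if $1 is lower than $2 under
# Debian's version rules; returns 2 if dpkg isn't available
deb_version_lt() {
    if ! command -v dpkg >/dev/null 2>&1; then
        echo "dpkg not found" >&2
        return 2
    fi
    dpkg --compare-versions "$1" lt "$2"
}

if deb_version_lt 2.11 3; then
    echo "2.11 < 3"
fi
```

Because dpkg follows Debian’s version rules rather than plain numeric ones, a tilde sorts before everything, even the end of the version, so deb_version_lt 1.0~rc1 1.0 succeeds on systems with dpkg.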

5. Normalization Technique to Compare Versions

Some of the algorithms we’ve discussed so far, such as the printf-based ver function, convert each version into a single value that can be compared directly. We can call such values normalized version numbers. Next, we’ll build a solution that first normalizes two versions and then compares them.

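First, we extract the numeric fields and convert each one to decimal, treating any non-digit character as a separator. This removes leading zeros, which Bash arithmetic would otherwise read as marking an octal number (so a field such as 08 would be an error). For example: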

1.08 → 1 8, 1.0030 → 1 30, 2021-02-03 → 2021 2 3…

The conversion code is:

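v() {
    # print each field in decimal (10# drops leading zeros), one per word,
    # then pad up to five fields to four digits each
    printf "%04d%04d%04d%04d%04d" $(for i in ${1//[^0-9]/ }; do printf "%d " $((10#$i)); done)
}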

Then, we compare them as:

while read -r test; do
    set -- $test
    printf '%s ' "$test"
    eval "if [[ $(v $1) $3 $(v $2) ]] ; then
              echo true
          else
              echo false
          fi"
done

When we type a test case such as 1.08 1.0030 < on standard input, the loop prints the case followed by the result:

1.08 1.0030 <
1.08 1.0030 < true
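The eval is needed only because the operator comes from the input. When the operator is fixed, we can compare normalized values directly. In this sketch, norm_lt is a hypothetical helper, and v is the conversion function from above, written with a space after %d so that each field reaches the outer printf as a separate argument. Both normalized strings have the same length and contain only digits, so string order matches numeric order, as long as there are at most five fields and each field is below 10000:

```shell
#!/usr/bin/env bash
# v: normalize up to five numeric fields into a fixed-width, 20-digit string
v() {
    printf "%04d%04d%04d%04d%04d" $(for i in ${1//[^0-9]/ }; do printf "%d " $((10#$i)); done)
}

# norm_lt (hypothetical helper): succeeds if $1 < $2 after normalization
norm_lt() {
    [[ $(v "$1") < $(v "$2") ]]
}

v 1.08; echo      # 00010008000000000000
v 1.0030; echo    # 00010030000000000000
norm_lt 1.08 1.0030 && echo "1.08 < 1.0030"
norm_lt 2.1 1.10 || echo "2.1 is not < 1.10"
```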

6. Which One to Use?

Shell built-ins run without spawning extra processes, so the pure-Bash methods are usually faster, which matters when a script compares many versions. However, they’re more complex to write and maintain.

If simplicity is the main goal, Linux offers utilities that greatly reduce the effort, such as sort from GNU coreutils and dpkg on Debian-based systems, both of which we discussed.

Ultimately, the format of the version strings we need to compare should drive the choice: purely numeric fields, a last field with a letter suffix, or arbitrary alphanumeric suffixes each call for a different method.

7. Conclusion

In this article, we discussed how to compare two strings in dot-separated version format in Bash.

In the first part of the article, we compared versions mainly with Bash built-ins. Then, we used external utilities such as sort and dpkg, which order or compare versions directly, and turned their output or exit status into the comparison we needed.

Finally, we learned about normalized version numbers and the associated algorithms that evaluate and compare them.
