String Manipulation in Bash | Baeldung on Linux

1. Overview

Bash is a sh-compatible shell and command processor, and string manipulation is one of the most common tasks in a shell environment.

In this tutorial, we’ll learn how to operate on strings using Bash.

2. String Variable Declaration and Assignment

Bash doesn't have a real type system, so variables essentially hold strings. However, variables can have attributes that change or constrain their behavior, which can be useful even when dealing with strings.

2.1. Declaration

A simple declaration and value assignment looks like this:

$ VAR1='Hello World'
$ VAR2=Hello

Note that there are no spaces before or after the equals sign. If we want to assign attributes to a variable, we can use the declare command. For example, the -r flag makes it read-only:

$ declare -r VAR1='Hello world'

Now, if we try to assign another value to that variable, we get an error:

$ VAR1='Good morning Vietnam'
-bash: VAR1: readonly variable
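The -r flag is only one of the attributes declare accepts. The following minimal sketch assumes Bash 4 or newer and uses illustrative variable names. It shows -u and -l, which make Bash convert a value to upper or lower case every time it is assigned:

```shell
# -u: the value is converted to upper case on every assignment
declare -u CITY='lisbon'
echo "$CITY"      # LISBON

# -l: the value is converted to lower case on every assignment
declare -l EMAIL='John.Doe@Example.COM'
echo "$EMAIL"     # john.doe@example.com

# the attribute stays attached, so later assignments are converted too
CITY='porto'
echo "$CITY"      # PORTO
```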

2.2. Reading

We can ask the user for input using the read command:

$ read -p 'Type your name and press enter: ' NAME
Type your name and press enter: Baeldung
$ echo "Hello $NAME"
Hello Baeldung

The -p flag lets us specify the prompt text without an additional echo command. The last argument of the command is the name of the variable that receives the input. If we omit it, Bash stores the input in the REPLY variable.
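The sketch below feeds read from a here-string rather than the keyboard, so we can see the REPLY default without typing anything. Bash prints a -p prompt only when input comes from a terminal, so the sketch omits one. It also shows two common companions: -r, which keeps backslashes literal, and an empty IFS, which preserves leading and trailing spaces. The sample strings are arbitrary.

```shell
# no variable name given, so the line lands in REPLY
read -r <<< 'Baeldung'
echo "Hello $REPLY"        # Hello Baeldung

# -r keeps backslashes literal; IFS= keeps the surrounding spaces
IFS= read -r LINE <<< '  C:\Users\baeldung  '
printf '[%s]\n' "$LINE"    # [  C:\Users\baeldung  ]
```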

3. Pattern Matching and Substitution

3.1. Length

We can access the length of a string using the hash (#) operator inside parameter expansion before the variable name:

$ NAME=Baeldung
$ echo ${#NAME}
8
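A typical use of the length operator is input validation. In this sketch, the PASSWORD variable and the 8-character minimum are only illustrative. It also shows that an unset variable has length 0, as long as set -u is not active:

```shell
PASSWORD='s3cret'
# ${#PASSWORD} expands to the number of characters in the value
if (( ${#PASSWORD} < 8 )); then
  echo "Password too short: ${#PASSWORD} characters, 8 required"
fi

unset MISSING
echo "${#MISSING}"   # 0, an unset variable counts as empty
```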

3.2. Substrings

We can extract a substring using the colon (:) operator inside the parameter expansion. We provide the zero-based starting position of the substring and, optionally, its length:

$ NAME=Baeldung
$ echo ${NAME:6}
ng
$ echo ${NAME:0:4}
Bael
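The offset and length can also be negative, which counts from the end of the string. A small sketch reusing the NAME value from above (negative lengths need Bash 4.2 or newer):

```shell
NAME=Baeldung
# a negative offset counts from the end; the space (or parentheses) stops
# Bash from reading ":-" as the default-value expansion
echo "${NAME: -4}"      # dung
echo "${NAME:(-4):2}"   # du
# a negative length stops that many characters before the end
echo "${NAME:1:-1}"     # aeldun
```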

3.3. Pattern Matching

Bash has a simple built-in pattern matching system. It consists of a few wildcards:

* – matches any number of characters, including none
? – matches exactly one character
[abc] – matches any one of the given characters

For example, we can check whether a filename has a .jpg extension using a conditional statement:

$ if [[ "file.jpg" = *.jpg ]]; then echo "is jpg"; fi
is jpg
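The same wildcards also work in a case statement, which is convenient when a string has to be checked against several patterns. The classify function and its labels below are hypothetical and serve only to exercise each wildcard:

```shell
# case patterns use the same wildcards as [[ ... = pattern ]]
classify() {
  case "$1" in
    *.jpg)    echo "jpg" ;;       # *     any number of characters
    img?.png) echo "img-png" ;;   # ?     exactly one character
    [abc]*)   echo "abc" ;;       # [abc] one of a, b or c, then anything
    *)        echo "other" ;;
  esac
}

classify photo.jpg    # jpg
classify img1.png     # img-png
classify img22.png    # other, because ? matches only one character
classify boat.txt     # abc
```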

There’s also an extended matching system called “extended globbing”. It lets us constrain wildcards to specific patterns:

*(pattern) – matches zero or more occurrences of pattern
?(pattern) – matches zero or one occurrence of pattern
+(pattern) – matches one or more occurrences of pattern
!(pattern) – negates the pattern, matches anything that doesn’t match the pattern

Extended globbing must be turned on with the shopt command. We can improve the last snippet to also match the .jpeg extension:

$ shopt -s extglob
$ if [[ "file.jpg" = *.jp?(e)g ]]; then echo "is jpg"; fi
is jpg
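Two more extended patterns in action. The sketch uses +(…) to accept strings made only of digits and !(…) to select files that are not JPEGs. The is_number helper and the file names are illustrative, and shopt is kept on its own line so the option is active before the patterns are parsed:

```shell
shopt -s extglob

# +([0-9]): one or more digits and nothing else
is_number() { [[ $1 = +([0-9]) ]]; }
is_number 2024 && echo "2024 is a number"
is_number 20x4 || echo "20x4 is not a number"

# !(...): anything that matches neither *.jpg nor *.jpeg
for FILE in a.jpg b.jpeg c.png; do
  if [[ $FILE = !(*.jpg|*.jpeg) ]]; then echo "$FILE is not a JPEG"; fi
done
```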

If we need a more expressive pattern language, we can also use regular expressions with the regex-match (=~) operator:

$ if [[ "file.jpg" =~ \.jpe?g$ ]]; then echo "is jpg"; fi
is jpg

We can use Extended Regular Expressions here, as when calling grep with the -E flag. If the pattern contains capture groups, the matched substrings are stored in the BASH_REMATCH array variable, where we can access them later.
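Here is a minimal sketch of capture groups. It splits a version tag into its parts. The tag format is an assumption made for the example. Index 0 of BASH_REMATCH holds the whole match, and indexes 1 and up hold the groups. Keeping the regex in a variable avoids quoting trouble, but that variable must stay unquoted after =~. If it were quoted, Bash would compare it as a literal string.

```shell
TAG='release-2.14.3'
RE='^release-([0-9]+)\.([0-9]+)\.([0-9]+)$'

if [[ $TAG =~ $RE ]]; then
  echo "full match: ${BASH_REMATCH[0]}"   # release-2.14.3
  echo "major:      ${BASH_REMATCH[1]}"   # 2
  echo "minor:      ${BASH_REMATCH[2]}"   # 14
  echo "patch:      ${BASH_REMATCH[3]}"   # 3
fi
```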

3.4. Removing Matched Substring

Bash provides a mechanism to remove a substring from a given string using parameter expansion. It always removes only one matched substring. Depending on the operator, it matches either the longest or the shortest substring, starting from either the beginning or the end of the string.

It’s important to note that this doesn’t modify the variable; it only returns the modified value. To make this fact explicit, we’ll use read-only variables in the examples.

So, let’s remove an extension from a filename. To do this, we match from the end of the string using the percent (%) operator. A single percent sign matches the shortest substring, and a double one matches the longest:

$ declare -r FILENAME="index.component.js"
$ echo ${FILENAME%.*}
index.component

Because we used a single percent sign, we matched only the .js substring. To strip all the extensions, we use a double percent sign:

$ declare -r FILENAME="index.component.js"
$ echo ${FILENAME%%.*}
index

We can also remove the base name, leaving only the extensions. In that case, we match from the beginning of the string using the hash (#) operator:

$ declare -r FILENAME="index.component.js"
$ echo ${FILENAME#*.}
component.js

Analogously to the previous example, if we want to keep only the last extension, we use a double hash:

$ declare -r FILENAME="index.component.js"
$ echo ${FILENAME##*.}
js
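The same operators work on paths, with / as the separator. This is a sketch with an arbitrary sample path, and it roughly mirrors what dirname and basename do. Unlike those tools, the expansions don't handle edge cases such as a path without any slash or with a trailing slash. Because the expansion only returns a new value, we reassign it to keep the result:

```shell
FILEPATH='/var/log/nginx/access.log.1'
echo "${FILEPATH%/*}"    # /var/log/nginx   (shortest "/..." suffix removed)
echo "${FILEPATH##*/}"   # access.log.1     (longest ".../" prefix removed)

# the variable itself is unchanged until we assign the result back
FILEPATH="${FILEPATH%.*}"
echo "$FILEPATH"         # /var/log/nginx/access.log
```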

3.5. Substituting Matched Substring

Instead of just removing a substring, we can substitute it using the slash (/) operator. A single slash replaces the first match, and a double slash replaces all matches. Both match the longest possible substring.

Let’s write code that changes the file name while leaving the extension intact:

$ declare -r FILENAME="index.component.js"
$ echo ${FILENAME/*./index.}
index.js
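A few more substitution variants, sketched with arbitrary sample values. The first two contrast a single slash with a double one. An empty replacement deletes the matches. /# and /% anchor the pattern at the start or the end of the string:

```shell
TEXT='one two three'
echo "${TEXT/ /_}"    # one_two three   (first match only)
echo "${TEXT// /_}"   # one_two_three   (every match)
echo "${TEXT// }"     # onetwothree     (empty replacement deletes matches)

FILE='report.txt'
echo "${FILE/#report/summary}"   # summary.txt   (match must start the string)
echo "${FILE/%.txt/.md}"         # report.md     (match must end the string)
```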

4. Case Study

Let’s put the features described above to good use. We’ll write a script that replaces every occurrence of an old version string with a new one (for example, 1.0.1 with 1.1.0) inside a given file. It also keeps a backup of the original file, with the old version appended to its name.

We’ll pass the filename and the version strings as arguments to the script, but we’ll redeclare them to make the rest of the code more readable. Finally, we’ll use output redirection to save the modified content.

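#!/bin/bash
# declare, $(< file) and ${var//...} are Bash features, so this needs bash, not sh
declare -r FILENAME="$1"
declare -r OLD_VERSION="$2"
declare -r NEW_VERSION="$3"
# e.g. app.conf and 1.0.1 give app_1.0.1.conf
declare -r BACKUP_FILENAME="${FILENAME%.*}_${OLD_VERSION}.${FILENAME##*.}"
# $(< file) reads the whole file without starting cat
declare -r CONTENT="$(< "$FILENAME")"

cp "$FILENAME" "$BACKUP_FILENAME"
# quoting OLD_VERSION makes it match literally instead of as a glob pattern
printf '%s\n' "${CONTENT//"$OLD_VERSION"/$NEW_VERSION}" > "$FILENAME"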
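To try the case study without touching real files, we can use a sketch that wraps the same steps in a function and runs it against a throwaway file in a temporary directory. The function name update_version and the sample app.conf content are hypothetical. Every expansion is quoted, so paths containing spaces also work, and the sketch must run under Bash:

```shell
# hypothetical helper: the case-study steps wrapped in a function
update_version() {
  local -r file="$1" old="$2" new="$3"
  local -r backup="${file%.*}_${old}.${file##*.}"
  local -r content="$(< "$file")"
  cp "$file" "$backup"
  printf '%s\n' "${content//"$old"/$new}" > "$file"
}

# a throwaway file in a temporary directory
WORKDIR="$(mktemp -d)"
printf 'name=demo\nversion=1.0.1\n' > "$WORKDIR/app.conf"
update_version "$WORKDIR/app.conf" 1.0.1 1.1.0

cat "$WORKDIR/app.conf"          # the version line now reads version=1.1.0
cat "$WORKDIR/app_1.0.1.conf"    # the backup still reads version=1.0.1
```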

5. Summary

In this tutorial, we learned how to manipulate strings in pure Bash, without the help of external tools, from declaration to substitution.
