1. Introduction

Source code, command lines, and most computer interaction at its most basic level consist of characters. On the other hand, most characters are not represented by keys on a regular keyboard, many are not printable at all, and yet another group are complex control characters.

In this tutorial, we’ll discuss character escaping in Bash. First, we briefly describe how machines represent characters. After that, we explore types of strings in Bash. Next, we discuss character escaping in pure Bash in detail. Finally, we look at specific cases where escaping is involved.

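We tested the code in this tutorial on Debian 11 (Bullseye) with GNU Bash 5.1.4. Several examples rely on Bash-specific features, so they should work in any environment with a comparable Bash version, but not necessarily in other POSIX shells.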

2. Characters

Usually, apart from a pointer, users have one other method to enter data – text. To represent text, machines use a succession of bytes. They encode characters based on a predefined code table, usually ASCII or Unicode.
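As a minimal sketch of this mapping, we can ask printf for the code of a character and then dump the byte that represents it. The variable names code and bytes are only illustrative, and the od and tr tools are assumed to be available:

```shell
# A leading quote in the argument makes printf return the character's code.
printf -v code '%d' "'a"
echo "decimal code of a: ${code}"

# Dump the byte that actually represents the character, in hexadecimal.
bytes="$(printf 'a' | od -An -tx1 | tr -d ' \n')"
echo "byte of a in hex: ${bytes}"
```

Here, <a> has the decimal code 97, which is the single byte 61 in hexadecimal.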

Types of characters roughly include:

Since we’re dealing a lot with characters, in this article, we use the angle bracket notation to represent them with names from these tables.

Importantly, we must have a way to write out any character we need, be it ASCII, Unicode, or a custom encoding. Unfortunately, we have a minimal set of keyboard keys to represent many different text symbols.

3. Bash Strings

Writing and storing characters are two separate actions because no keyboard has keys for every possible symbol.

In Bash, text is stored as strings. In fact, all Bash variables are just strings of characters. We usually write them as unquoted, single-quoted, or double-quoted sequences.

Importantly, the difference between these methods is that we interpret or interpolate certain combinations of characters in one context and take them literally in another.

3.1. Single Quotes

Within single quotes, we don’t interpolate anything:

$ text='a $(echo b) c'
$ echo "${text}"
a $(echo b) c

Note how all text within the single quotes is preserved. No interpolation is done, but this means we also can’t, under any circumstances, have a single quote directly within single quotes.
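A common workaround, sketched below, is to close the single-quoted part, add an escaped single quote, and reopen the quotes. The variable names are only for illustration:

```shell
# Close the quoted part, add an escaped quote, then reopen the quotes.
text='it'\''s'
printf '%s\n' "${text}"

# Alternative: place the single quote inside double quotes instead.
same="it's"
printf '%s\n' "${same}"
```

Both assignments store the same four characters, since Bash concatenates adjacent quoted and unquoted parts into one word.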

3.2. Double Quotes

When using double quotes, we preserve the literal value of most characters:

$ text="a"
$ text="${text} $(echo "b") c"
$ echo "${text}"
a b c

First, we assign <a> to text directly, with just the [a] key. After that, we acquire its value via variable expansion with ${text}. Here, the $ <dollar-sign> combination gets interpreted. Finally, we concatenate this value with an expression and yet another character, assigning back to text.

3.3. No Quotes

As long as the string adheres to certain rules, we can skip the quotes:

$ text=a$(echo b)c
$ echo ${text}
abc

We discuss some of the rules in the next section.

3.4. Special Quoting

Double-quoting text with a <dollar-sign> prefix causes a string to be translated based on the current locale. Thus, the string’s final translation is double-quoted. Importantly, this article does not deal with locales and assumes C as the default one for all examples.

Single-quoting text with the same prefix is treated differently. In this case, escaped characters are replaced.

In the next section, we’ll clarify what escaping means.

4. Bash Character Escaping

Except within single quotes, characters with special meanings in Bash have to be escaped to preserve their literal values. In practice, this is mainly done with the escape character \ <backslash>. In some cases, we may have to employ other methods.

Let’s see when and how we use which method.

4.1. Double Quotes

We escape text inside double-quoted strings by prefixing a character with <backslash>:

$ text1="a $(echo b) c"
$ text2="a \$(echo b) c"
$ echo "${text1}"
a b c
$ echo "${text2}"
a $(echo b) c

Note how, in the case of text2, the <dollar-sign> is escaped, losing its special functions and preserving its literal meaning.

These are all special characters, which may have to be escaped to preserve their literal meaning within double quotes:

  • $ <dollar-sign>, e.g. $() and ${}
  • ` <grave-accent>, also known as the backquote operator
  • " <quotation-mark>, when we need a double quote within double quotes
  • newline <newline>, which is equivalent to <LF> under Linux
  • \ <backslash>, when prefixing a character in this list except <exclamation-mark>
  • ! <exclamation-mark>, when history expansion is enabled outside POSIX mode, usually the case
  • ~ <tilde>, when beginning an unquoted string, to avoid tilde expansion and confusion with the $HOME directory (within double quotes, it is already literal)

Furthermore, the <backslash> prefix is not stored in the string when preceding all but one (<exclamation-mark>) of the characters above:

$ text="!event"
bash: !event: event not found
$ text="\a \$ \` \!event \\"
$ echo ${text}
\a $ ` \!event \
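As a further sketch, the following line combines several of these escapes in one double-quoted string, including the <quotation-mark> case. The sample text itself is made up for illustration:

```shell
# Escape ", $, ` and \ so that each keeps its literal value.
text="say \"hi\" to \$USER, not \`date\`, in C:\\temp"
printf '%s\n' "${text}"
```

None of the backslashes used for escaping end up in the stored string; only the single backslash produced by \\ remains.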

Importantly, the <exclamation-mark> is an exceptional character, the special meaning of which can be ignored by:

  • prefixing it with a backslash (which remains, same as with a normal character like <a>)
  • using it at the end of a string or before whitespace characters
  • enclosing it in single quotes
  • disabling history expansion via set +o histexpand
  • being in POSIX mode
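A short sketch of two of these options follows. It assumes a Bash build with history support, as is typical, and the variable names are only illustrative:

```shell
# Turn history expansion off for the current shell (equivalent to: set +H).
set +o histexpand

# With history expansion off, the exclamation mark is an ordinary character.
text="!event"
printf '%s\n' "${text}"

# Single quotes protect the exclamation mark regardless of that setting.
quoted='!event'
printf '%s\n' "${quoted}"
```

Scripts usually don't need this, since history expansion is normally only active in interactive shells.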

Finally, the combination <backslash><newline> is ignored and removed from double-quoted strings. This simply means that we can spread a string over several lines without adding newline characters to it:

$ text="a \
> b"
$ echo "${text}"
a b
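For contrast, here is a small sketch of the same line break in both quoting styles. Within single quotes, nothing is interpreted, so both the <backslash> and the <newline> stay in the string:

```shell
# Double quotes: the backslash-newline pair is removed.
joined="a \
b"

# Single quotes: the backslash and the newline are both kept.
kept='a \
b'

printf '%s\n' "${joined}"
printf '%s\n' "${kept}"
printf 'lengths: %d and %d\n' "${#joined}" "${#kept}"
```

The first string holds three characters, while the second holds five.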

Let’s now explore how Bash treats sequences without any quotes.

4.2. No Quotes

As we already showed, we can forgo the quotes altogether, but there is a price.

Namely, an unquoted sequence is only treated as a single string if we escape every character that is not alphanumeric or part of the following group: <comma>, <period>, <underscore>, <plus-sign>, <colon>, <commercial-at>, <percent-sign>, <slash>, <hyphen>:

$ text=a\ \&\ b\ \&\ c
$ echo "${text}"
a & b & c

It’s rarely, if ever, preferable to not use quotes.
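One reason is that skipping quotes also affects expansions. As a sketch, assuming the default IFS value, the following counts how many words Bash produces from the same variable with and without quotes:

```shell
text='a   b'

# Unquoted expansion: Bash splits the value on whitespace into two words.
set -- ${text}
unquoted_count=$#

# Quoted expansion: the value stays one word, inner spaces included.
set -- "${text}"
quoted_count=$#

printf 'unquoted: %d, quoted: %d\n' "${unquoted_count}" "${quoted_count}"
```

Note that set -- replaces the positional parameters of the current shell, so this is only meant for experimenting.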

4.3. ANSI-C Combinations

When using $'STRING_TEXT', the sequence within the single quotes expands to a string, with escaped characters replaced according to the ANSI-C quoting:

$ echo $'\u0061'
a

The \u escape sequence interprets up to four hexadecimal digits directly following it as a character code in the Unicode (ISO/IEC 10646) table.

Importantly, where they are recognized, we can use the \u, \U, \x, and similar sequences to place any character without further escaping. Note that, in this case, the escape turns special meanings of characters on, not off. Either way, escaping is how we work around the “shortage of keys” on a keyboard.
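A few more ANSI-C sequences are sketched below. The variable names are only illustrative, and the examples stick to ASCII so that the result doesn't depend on the locale:

```shell
tab=$'a\tb'              # \t is a horizontal tab
lines=$'line1\nline2'    # \n is a newline
hex=$'\x41'              # hexadecimal code 41 is A
octal=$'\101'            # octal code 101 is also A
quote=$'it\'s'           # \' places a single quote inside $'...'

printf '%s\n' "${tab}" "${lines}" "${hex}" "${octal}" "${quote}"
```

Unlike regular single quotes, the $'...' form allows an escaped single quote within the string.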

Moreover, many other tools use the ANSI-C standard.

5. Special Cases

Bash is a shell that has built-in commands and capabilities. Many use the ANSI standards, but some functionalities also use their own special control characters within strings.

Keep in mind that any string, which we pass through Bash, first gets interpreted by Bash. This means all rules from the previous section apply, but we may build on top of them in this one.

5.1. Bash Prompt

The first thing we see when using Bash is the prompt. It normally shows some useful information about the machine, user, current directory, etc. All of these are stored as defaults in the variables PS0, PS1, PS2, and PS4.

However, we can modify these variables. Furthermore, we can use terminal control characters to customize our prompt:

$ echo "Current prompt: ${PS1}"
Current prompt: $
$ PS1='\t> '
00:00:10> echo "Current prompt: ${PS1}"
Current prompt: \t>

These sequences start with <backslash> like most we already looked at. In addition, there are the \[ and \] escape sequences, which add another layer of encoding.
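As a sketch, the prompt below wraps color codes in \[ and \] so that Bash can exclude them when calculating the visible prompt width. The chosen colors and layout are just an example. In addition, assuming Bash 4.4 or later, the ${var@P} transformation previews how a string would expand as a prompt:

```shell
# Hypothetical prompt: bold green user@host, then the working directory.
# \[ and \] mark the non-printing color codes.
PS1='\[\e[1;32m\]\u@\h\[\e[0m\]:\w\$ '

# Preview prompt expansion without changing the prompt (Bash 4.4+).
sample='line1\nline2'
expanded="${sample@P}"
printf '%s\n' "${expanded}"
```

In the preview, the \n prompt escape turns into an actual newline.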

5.2. ANSI Escape Sequences

Within many terminals, we can also use other escape codes like the standard ANSI escape sequences. For example, there are ways to change the color of terminal text, cursor location, fonts, and other options. These sequences start with <ESC>, hence the name:

$ PS1="TESTING\033[1K> "
> echo "Current prompt: ${PS1}"
Current prompt: TESTING\033[1K>

In this example, the so-called control sequence introducer <ESC><left-square-bracket> starts the K command with an argument of 1. Thus, it clears all characters from the beginning of the current line up to the cursor. Because of this, the word TESTING does not show up in the prompt.

As already mentioned, ANSI and ANSI-C escape sequences are used throughout the Linux ecosystem. For example, both echo and printf recognize them. We need the -e parameter to echo, but printf works with ANSI by default.
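The difference can be sketched as follows, assuming the default Bash settings, where the echo built-in leaves escapes alone unless we pass -e. The variable names are only illustrative:

```shell
# echo needs -e to interpret backslash escapes.
with_echo="$(echo -e 'a\tb')"

# printf interprets them in its format string by default.
with_printf="$(printf 'a\tb')"

# Without -e, echo prints the backslash and the t as they are.
plain_echo="$(echo 'a\tb')"

# An ANSI escape sequence for red text, followed by a reset.
printf '\033[31m%s\033[0m\n' 'red text'
```

The first two commands produce the same string with a real tab, while the third one keeps the two characters \t.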

5.3. printf

The standard built-in printf (print formatted) command also has its own special character.

Recall our discussion of writing strings without quotes. The characters we would need to escape in that instance are in the output of the following script:

$ for code in {0..127}; do
>   printf -v chr '\\%o' "${code}"
>   printf -v chr "${chr}"
>   printf -v echr "%q" "${chr}"
>   if [[ "${chr}" != "${echr}" ]]; then
>     printf "%02X %-7s\n" "${code}" "${echr}"
>   fi
> done
00 ''
01 $'\001'
07 $'\a'
08 $'\b'
09 $'\t'
0A $'\n'

The snippet above goes through the first 128 characters in the ASCII table. For each, it uses printf to extract and compare each character with its escaped form.

First, %o returns the octal form of the character’s code. Next, this value is reused in printf with a <backslash> prefix to get the resulting character. After that, with the help of the %q format modifier, we get an escaped version of the character. Finally, we compare the normal and escaped version to determine whether we need to escape this character and output the result if we do.
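The same comparison can be applied to whole strings. As a sketch, the hypothetical helper needs_escaping below reports whether %q would change its argument:

```shell
# Hypothetical helper: succeeds if %q has to alter the string.
needs_escaping() {
  local raw="$1" quoted
  printf -v quoted '%q' "${raw}"
  [[ "${raw}" != "${quoted}" ]]
}

# Print only the samples that require escaping, next to their escaped form.
for sample in 'abc' 'a_b' 'a b' 'a&b'; do
  if needs_escaping "${sample}"; then
    printf '%s -> %q\n' "${sample}" "${sample}"
  fi
done
```

Of these four samples, only the ones with <space> and <ampersand> get reported.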

Note the % <percent-sign> character within the printf argument. We can ignore its special meaning by escaping it with another <percent-sign>: %%. This preserves the literal value.
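A minimal sketch of both options follows. We can either double the <percent-sign> in the format string or keep the text out of the format string altogether. The variable names are only illustrative:

```shell
# %% in the format string yields a single literal percent sign.
printf -v msg '%d%% done' 50
printf '%s\n' "${msg}"

# Text passed as an argument to %s is never interpreted as a format.
label='100% sure'
printf -v safe '%s' "${label}"
printf '%s\n' "${safe}"
```

The second form is usually the safer choice when the text comes from a variable.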

5.4. Parameter Transformation

As of version 4.4, Bash supports parameter transformation. This functionality allows us to perform many of the operations that printf and other built-ins have, but directly within Bash.

For example, we can use echo ${VAR@Q} as a replacement for printf with %q:

$ text='\'
$ printf '%q\n' "${text}"
\\
$ echo "${text@Q}"
'\'

As we already learned, \\ and '\' are equivalent.

Both of the approaches above are very useful when it comes to multilevel escaping:

$ text="6*6*6 equals 216"
$ text="$(printf '%q' "${text}")"
$ text="$(printf '%q' "${text}")"
$ echo "${text}"
6\\\*6\\\*6\\\ equals\\\ 216

Indeed, without a way to perform this operation automatically, manual escaping of long lines often leads to many errors.
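Each level of escaping is undone by one level of interpretation. As a sketch, we can check this by letting eval parse the escaped forms again. The names restored and restored_q are only illustrative, and eval should only ever see strings that we quoted ourselves:

```shell
text='6*6*6 equals 216'

# One level of escaping via printf %q.
printf -v quoted '%q' "${text}"

# One level of interpretation brings back the original value.
eval "restored=${quoted}"

# The same round trip with parameter transformation (Bash 4.4+).
eval "restored_q=${text@Q}"

printf '%s\n' "${quoted}" "${restored}" "${restored_q}"
```

Both round trips end with the original text, even though %q and @Q choose different escaped forms.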

5.5. Command-Line Arguments

Many standard Bash built-ins use the <hyphen> argument name prefix. In addition, they often accept the -- argument, after which the special interpretation of <hyphen> following whitespace stops. Without using --, we have no means to escape the character to prevent this behavior.
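As a sketch, the commands below pass text starting with <hyphen> after a -- argument, first to the printf built-in and then to the external grep tool, which is assumed to be available. The variable names are only illustrative:

```shell
# Without --, printf would try to read -v as its option.
out="$(printf -- '-v is not an option here')"
printf '%s\n' "${out}"

# After --, grep treats -e as a pattern instead of an option.
count="$(printf '%s\n' '-e' 'x' | grep -c -- '-e')"
printf 'matching lines: %s\n' "${count}"
```

In both cases, everything after -- is handled as a regular argument.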

6. Summary

In this tutorial, we discussed character escaping in Bash. We first learned that characters have different encoding tables. In addition, we saw that some characters are not printable but only serve as markers or commands within text. To use such characters literally, we need the means to escape them. We explored pure Bash, as well as some common Bash built-in character escaping cases.

In conclusion, character escaping is only partially standardized, so many obscure scenarios and tools exist where escaping a character is non-trivial.
