Sometimes, when we write shell scripts, we have to deal with special characters like spaces, symbols, and non-ASCII characters. The shell and other tools may misinterpret these characters instead of processing them literally. Therefore, we’ll have to take some measures in order to handle them correctly.
In this tutorial, we’ll go through the most common use-cases regarding handling special characters in shell scripts. First of all, we’ll discuss wrapping command and variable substitutions in shell scripts.
Finally, we’ll see the ShellCheck utility in action and how we can use it to make sure our scripts are free of common pitfalls.
2. Wrapping Substitutions in Double Quotes
In a shell, when we specify filenames to a command like mv, the shell treats the whitespaces between the filenames as a delimiter. So, each filename will correspond to a separate file or directory on the disk.
But what happens when we have a filename that contains spaces? Well, the shell will treat the filename as a list of files.
We can demonstrate this in the terminal by trying to process a filename with spaces:
$ mv file with spaces /tmp
mv: cannot stat 'file': No such file or directory
mv: cannot stat 'with': No such file or directory
mv: cannot stat 'spaces': No such file or directory
This happens because the shell thinks it’s a list of files separated by spaces. To overcome this, we’ll need to surround the filename with double quotes:
$ mv "file with spaces" /tmp
Now, the shell will treat this filename as a whole entity.
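To see the difference without touching any real files, here is a minimal sketch. The count_args function is a hypothetical helper introduced only for this illustration; it prints how many arguments the shell passed to it:

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints the number of arguments received.
count_args() {
  echo "$#"
}

count_args file with spaces      # unquoted: the shell passes three words
count_args "file with spaces"    # quoted: the shell passes a single word
```

The first call prints 3 and the second prints 1, which mirrors what mv sees in the two commands above.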
2.1. Variable Substitution Inside Double Quotes
This is somewhat the same for variables inside the shell as well. Let’s say we have a variable $HOME. Depending on whether we wrap it in double quotes, expanding this variable can mean three things:
- Take the value of the HOME variable as a whole
- Split the string into fields using whitespaces as the delimiter
- Treat each whitespace-separated field as a glob that can be expanded by the shell
In our case, we’re interested in the first meaning, the string context: double quotes around the variable yield a single string. Therefore, any amount of whitespace and other special characters (?, [, \) inside the string will be a part of the string:
#!/bin/sh
doc="Reference Manual.pdf"
doc_path="$XDG_DOCUMENTS_DIR/$doc"
echo "$doc_path"
$ sh script.sh
/home/user/Documents/Reference Manual.pdf
On the other hand, the other two use-cases will yield the output in a list context – each word in the list is a field separated by whitespace.
For instance, “$@” is a special case: even inside double quotes, it yields the positional arguments as a list of separate words, $1, $2, and so on, up to the last of the $# arguments:
#!/bin/sh
# Count lines in each file
for f in "$@"; do
  wc -l "$f"
done
$ sh script.sh /etc/fstab /etc/hostname
13 /etc/fstab
1 /etc/hostname
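The following sketch contrasts the quoted and unquoted forms when forwarding arguments. The function names (count_args, forward_quoted, forward_unquoted) are hypothetical and exist only for this illustration:

```shell
#!/bin/sh
# All three functions are hypothetical helpers for this illustration.
count_args() {
  echo "$#"
}

# Forwards every argument exactly as received.
forward_quoted() {
  count_args "$@"
}

# Re-splits each argument on whitespace (and would also glob-expand it).
forward_unquoted() {
  count_args $@
}

forward_quoted "a b" c      # two arguments arrive
forward_unquoted "a b" c    # three arguments arrive
```

The quoted form keeps "a b" together as one argument, while the unquoted form breaks it in two, which is why a filename with spaces would be mishandled.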
2.2. Command Substitution Inside Double Quotes
The same concept holds for command substitution as well. Usually, we substitute commands with the $() notation or backticks. Both forms are defined by POSIX, but backticks are the legacy syntax: they’re awkward to nest and they treat backslashes differently, so $() is the better choice:
#!/bin/sh
# Prefer this
result="$(lsblk | grep sda)"
# Not this
result="`lsblk | grep sda`"
In the example above, the output of the command will yield a string because we’re using the double quotes in the string context. The format of the output will be preserved, including the newlines.
However, if we omit the quotes, the format will not be preserved because the shell will yield the result in the list context:
$ echo "$(lsblk | grep sda)"
sda      8:0    0 119.2G  0 disk
|-sda1   8:1    0   128M  0 part /boot/efi
|-sda2   8:2    0     8G  0 part [SWAP]
`-sda3   8:3    0 111.1G  0 part /
$ echo $(lsblk | grep sda)
sda 8:0 0 119.2G 0 disk |-sda1 8:1 0 128M 0 part /boot/efi |-sda2 8:2 0 8G 0 part [SWAP] `-sda3 8:3 0 111.1G 0 part /
In this output, the resulting string is actually a list of fields that are separated by whitespace.
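The same effect can be reproduced on any machine with a small sketch. Here, list_disks is a hypothetical stand-in for lsblk that prints two fixed lines, so the result doesn’t depend on the local hardware:

```shell
#!/bin/sh
# list_disks is a hypothetical stand-in for a command with multi-line output.
list_disks() {
  printf '%s\n' 'sda   disk' 'sda1  part'
}

quoted="$(list_disks)"
echo "$quoted"        # string context: newlines and column spacing survive
echo $(list_disks)    # list context: echo re-joins the fields with single spaces
```

The first echo prints two lines with the original spacing, while the second prints everything on one line as sda disk sda1 part.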
3. Handling Filenames with “-” and “+” Prefixes
Filenames can contain a leading dash (-) or a plus sign (+). As we know, the dash (-) prefix in the command line denotes an option for most commands. Therefore, our script will produce an error when processing these filenames.
Fortunately, we can resolve this issue by using a double dash (--) before the filenames that contain the dash or plus prefix. It indicates the end of options to the command so that the succeeding arguments will be treated as filenames:
#!/bin/sh
wc -l -- "$@"
$ sh script.sh -text text_file
2 -text
1 text_file
3 total
In the above script, we placed the double dash before “$@”, so each filename with a leading dash is used as it is. In this case, wc recognizes the “-text” file. Additionally, it doesn’t affect the other filenames that don’t contain a leading dash or plus sign.
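As a self-contained sketch of the same idea, the script below creates a file named -text in a temporary directory and reads it back with cat. The temporary directory and the variable names are assumptions made for this illustration:

```shell
#!/bin/sh
# Work inside a throwaway directory so no real files are touched.
workdir=$(mktemp -d) || exit 1

content=$(
  cd "$workdir" || exit 1
  printf 'one\ntwo\n' > -text   # a redirection target is never parsed as an option
  cat -- -text                  # without --, cat would try to parse -text as options
)

echo "$content"
rm -rf "$workdir"
```

Running the cd inside the command substitution keeps the directory change local to the subshell, so the rest of the script stays where it started.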
3.1. Handling Filenames Named “-”
It’s possible that we might come across files whose filenames consist of just a single dash. However, certain commands will treat this as standard input or standard output. In those scenarios, we can use the redirect operators (<, >) for files with the name “-”:
$ echo "Hello, World!" > -
$ cat < -
Hello, World!
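The sketch below repeats the redirection approach in a temporary directory. It also tries an alternative that the text above doesn’t cover and that we add here as an assumption: prefixing the name with ./ so that the argument no longer consists of a lone dash:

```shell
#!/bin/sh
# Work inside a throwaway directory so no real files are touched.
workdir=$(mktemp -d) || exit 1

via_redirect=$(
  cd "$workdir" || exit 1
  echo "Hello, World!" > -
  cat < -       # the shell opens the file; cat only sees its standard input
)

via_path=$(
  cd "$workdir" || exit 1
  cat ./-       # assumption: an explicit ./ prefix also avoids the special meaning
)

echo "$via_redirect"
echo "$via_path"
rm -rf "$workdir"
```

Both variables should end up holding the same text, since each approach reads the file named “-” rather than standard input.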
4. read and IFS
4.1. read Without Options
The read command reads a line from standard input and stores it in one or more variables. When we use the read command in a shell script without any options, it will carry out some operations on special characters like whitespace, backslashes, and continuation lines.
For example, let’s write a simple script that reads a string and then prints its lines:
#!/bin/sh
kiss='  Keep \
  It Simple\Stupid'
printf "%s\n" "$kiss" | while read line; do
  printf "%s\n" "$line"
done
$ sh script.sh
Keep   It SimpleStupid
In the kiss variable, we have a continuation line (a backslash at the end of the first line), two leading spaces on each line, and a backslash in the second line. However, when we feed this string to the read command, it treats the backslash-newline pair as a line continuation and joins the two lines, drops the remaining backslash, and strips the leading spaces.
4.2. The -r Option
What if we want to override this default behavior of read and retain the backslashes? Well, in that case, we’ll need to use the -r option:
...
printf "%s\n" "$kiss" | while read -r line; do
  printf "%s\n" "->$line"
...
$ sh script.sh
->Keep \
->It Simple\Stupid
Now, the text is printed in two lines, just as we want. The backslashes are also retained.
4.3. The IFS Variable
One thing missing in the above output is the leading double spaces. The read command will eat up the leading spaces, and there’s no suitable option for us to specify.
Therefore, we’ll need to empty the IFS (Internal Field Separator) shell variable. The IFS variable, by default, contains a space, a tab, and a newline, which the shell uses as delimiters when it splits a string into fields.
By emptying the IFS variable, we can read the lines literally as-is because there will be no delimiter to use for the splitting of the string:
...
printf "%s\n" "$kiss" | while IFS= read -r line; do
...
$ sh script.sh
->  Keep \
->  It Simple\Stupid
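To compare the three variants side by side, here is a minimal sketch that feeds the same line to each of them. The input string and the variable names (plain, raw, exact) are made up for this illustration:

```shell
#!/bin/sh
# A sample line with leading and trailing spaces and a literal backslash.
input='  a\tb  '

# Each pipeline reads the line once and prints it between brackets.
plain=$(printf '%s\n' "$input" | { read line; printf '[%s]' "$line"; })
raw=$(printf '%s\n' "$input" | { read -r line; printf '[%s]' "$line"; })
exact=$(printf '%s\n' "$input" | { IFS= read -r line; printf '[%s]' "$line"; })

printf '%s\n' "$plain"    # backslash dropped, surrounding spaces trimmed
printf '%s\n' "$raw"      # backslash kept, surrounding spaces trimmed
printf '%s\n' "$exact"    # backslash and surrounding spaces kept
```

Only the last form, IFS= read -r, hands the line back unchanged, which is why it’s the usual choice for reading arbitrary text line by line.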
5. Escaping Special Characters with Backslash
In a shell, the most common way to escape special characters is to use a backslash before the characters. These special characters include characters like ?, *, $, !, and [.
Let’s experiment with printing these characters in the terminal:
$ echo \
>
When we echo a single backslash, the shell thinks of it as a continuation line. So, in order to print the backslash, we’d need to add another backslash:
$ echo \\
\
The $ character introduces an expansion, such as reading the value of a shell variable or a special parameter:
$ echo $0
/usr/bin/zsh
$ echo $$
2609
$ echo \$0
$0
$ echo \$$
$$
Other characters like ?, !, and [ have special meaning in the shell as well. Therefore, let’s keep in mind that whenever we come across these characters in a string, we’ll need to add a backslash before them to get the literal character.
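As a quick sketch, the script below builds the same literal string twice. The first assignment uses backslashes as described above. The second uses single quotes, which is an alternative we bring in here for comparison and isn’t part of the discussion above. The variable names are hypothetical:

```shell
#!/bin/sh
# Backslash before every special character, including the spaces.
escaped=\$HOME\ \?\ \[x\]

# Assumption for comparison: inside single quotes, nothing is special.
single='$HOME ? [x]'

printf '%s\n' "$escaped" "$single"
```

Both lines of output should be identical, because neither form lets the shell expand $HOME or treat ? and [x] as glob patterns.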
6. Writing Robust Scripts with Shellcheck
ShellCheck is a static analysis utility that we run against our shell scripts. It checks for errors, warnings, and potential security holes in our scripts. It supports a variety of shells like dash, bash, and ksh.
ShellCheck usually isn’t installed by default on the major distributions. But no worries, because it’s available in most official package repositories.
We can use a package manager like yum or apt to install the shellcheck package. After the installation, let’s verify it:
$ shellcheck --version
ShellCheck - shell script analysis tool
version: 0.8.0
We’ll write a simple shell script that stores our public IP address in a variable and prints it to the screen:
#!/bin/sh

greeting="Hello!
ip_addr=$(curl -s icanhazip.com 2> /dev/null)

echo "$greeting. Your IP is $ip_addr"
Now, let’s run shellcheck against this script:
$ shellcheck script.sh

In script.sh line 3:
greeting="Hello!
^-- SC1009 (info): The mentioned syntax error was in this simple command.
         ^-- SC1078 (warning): Did you forget to close this double quoted string?

In script.sh line 6:
echo "$greeting. Your IP is $ip_addr"
     ^-- SC1079 (info): This is actually an end quote, but due to next char it looks suspect.
                                    ^-- SC1073 (error): Couldn't parse this double quoted string. Fix to allow more checks.
After running shellcheck, we can see that it prints a lot of useful information. In this case, we left out the closing quote for the greeting variable. As a result, the quote that was meant to open the string on line 6 is parsed as the closing quote for “Hello!, which is what the tool points out.
Let’s fix these errors and run shellcheck again:
...
greeting="Hello!"
ip_addr=$(curl -s icanhazip.com 2> /dev/null)
echo "$greeting. Your IP is $ip_addr"
...
$ shellcheck script.sh
$
Since we’ve fixed the error, we don’t have any warnings.
Sometimes, shellcheck will detect very subtle errors that we might not even notice. Therefore, if we write a lot of scripts, shellcheck should be in our toolbox because it pushes us toward best practices, eventually making us better at writing shell scripts.
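One such subtle error is the unquoted variable from the earlier sections. The sketch below writes a small sample script with a deliberately unquoted $target and runs shellcheck on it when the tool is available. The sample script, its contents, and the variable names are made up for this illustration. We’d expect the report to include SC2086, the check about double quoting to prevent globbing and word splitting:

```shell
#!/bin/sh
# Write a small sample script with a deliberately unquoted variable.
sample=$(mktemp) || exit 1
cat > "$sample" <<'EOF'
#!/bin/sh
target="file with spaces"
rm -- $target
EOF

if command -v shellcheck >/dev/null 2>&1; then
  # A non-zero exit status is normal here, since the sample has a finding.
  shellcheck "$sample" || true
else
  echo "shellcheck is not installed; skipping"
fi

rm -f "$sample"
```

The sample script is only analyzed and never executed, so the rm inside it doesn’t delete anything.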
In this article, we discussed how we could handle special characters and whitespaces in the shell. We wrote various small shell scripts to demonstrate different methods for different use cases.
Finally, we covered the ShellCheck static analysis tool and how it can help us become better shell script developers.