1. Overview
In this tutorial, we’ll learn what URL encoding is. Then, we’ll go through a few different methods for decoding encoded URLs in Linux.
1.1. Understanding URL Encoding
URL encoding and decoding are standardized in RFC 3986. In short, some characters in a URL are represented using percent-encoding: a percent sign followed by two hexadecimal digits.
Percent-encoding is used to encode reserved characters:
Type | Characters |
---|---|
Reserved Characters | ! * ' ( ) ; : @ & = + $ , / ? # [ ] |
Unreserved Characters | a-z A-Z 0-9 - _ . ~ |
Reserved characters have a special purpose in a URL and should be percent-encoded when they appear as literal data. Unreserved characters never need encoding, although they can be represented using percent-encoding as well. Any other character, such as a space, %, { } < > or a non-ASCII character, must always be encoded. Lastly, in form-encoded URLs, spaces are commonly represented as plus signs rather than %20:
Encoded: example.org/5+Percent+Codes+%21%2A%27%28%2B
Decoded: example.org/5 Percent Codes !*'(+
As we can see in this example, we have an encoded string containing five percent codes and three pluses. The pluses get turned into spaces and the percent codes get decoded to ASCII.
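To check an encoding like this one, we can sketch a quick round trip with Python’s standard urllib.parse module. This is only an illustration: the sample text is a hypothetical string chosen to mirror the example, and quote_plus() is used because it follows the plus-for-space convention.

```python
from urllib.parse import quote_plus, unquote_plus

# Hypothetical sample text, chosen to mirror the example above.
text = "5 Percent Codes !*'(+"

# Spaces become "+", and the reserved characters become %XX codes.
encoded = quote_plus(text)
# Decoding reverses both substitutions.
decoded = unquote_plus(encoded)

print(encoded)
print(decoded)

# Unreserved characters pass through unchanged.
unreserved = quote_plus("a-z_A-Z.0-9~")
print(unreserved)
```

The first line printed is 5+Percent+Codes+%21%2A%27%28%2B, and decoding it returns the original text.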
2. Decoding URLs
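To decode a URL, we have to first replace any plus signs with spaces. Then, we replace each percent sign and the two hexadecimal digits that follow it with the corresponding ASCII character. The order matters: if we decoded the percent codes first, an encoded plus (%2B) would become a plus sign and then wrongly turn into a space.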
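The two steps can be sketched in a few lines of Python. The function name decode_percent is hypothetical, and the sketch assumes the encoded characters are all in the ASCII range:

```python
import re


def decode_percent(encoded: str) -> str:
    """Decode a plus-and-percent encoded string (ASCII range only)."""
    # Step 1: plus signs stand for spaces, so replace them first.
    # A literal plus arrives as %2B and is restored in step 2.
    with_spaces = encoded.replace("+", " ")
    # Step 2: turn each %XX into the character with that hexadecimal code.
    return re.sub(
        r"%([0-9A-Fa-f]{2})",
        lambda match: chr(int(match.group(1), 16)),
        with_spaces,
    )


print(decode_percent("example.org/end+sentence+.%3F%21"))
```

Each solution in the following sections is a variation on these same two steps.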
2.1. Using the Shell
Let’s begin with a simple Bash solution that uses only shell builtins:
$ (IFS="+"; read _z; echo -e ${_z//%/\\x}"") <<< 'example.org/end+sentence+.%3F%21'
example.org/end sentence .?!
The first part of this works through word splitting on the plus sign by use of IFS. To ensure word splitting occurs, we don’t quote the variable, and echo then joins the resulting words with spaces. However, we do put empty quotes after the variable so that a plus sign at the end of a URL doesn’t get cut off. We use a subshell so that IFS isn’t changed globally. One caveat is that an unquoted expansion is also subject to filename expansion, so a URL containing * or ? could be altered if it happens to match a file.
To take in input as a variable, we can use read. This allows us to use parameter expansion on this variable to replace all occurrences of percent signs with \x. Then we use echo -e to interpret these escapes.
It’s a bit more difficult and less efficient, but we can make a portable, POSIX-compliant shell script to accomplish this as well:
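#!/bin/sh
# Recursively decode a percent-encoded string passed as arguments.
posix_compliant() {
    strg="${*}"
    # Print everything before the first '%' or '+'.
    printf '%s' "${strg%%[%+]*}"
    # j is the rest, starting at that '%' or '+'; strg drops that character.
    j="${strg#"${strg%%[%+]*}"}"
    strg="${j#?}"
    case "${j}" in
        "%"* )
            # Convert the next two hex digits to octal and print that byte.
            printf '%b' "\\0$(printf '%o' "0x${strg%"${strg#??}"}")"
            strg="${strg#??}"
            ;;
        "+"* )
            printf ' '
            ;;
        * )
            return
            ;;
    esac
    if [ -n "${strg}" ] ; then posix_compliant "${strg}"; fi
}
posix_compliant "${*}"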
Here we use recursion along with POSIX-supported parameter expansion to decode our string. We convert the hexadecimal digits to octal first because POSIX printf supports octal escapes but not \x hexadecimal escapes.
After saving our shell script as decode.sh, we can run it from the command line:
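To see what the hex-to-octal step produces, here is a small Python sketch of the same arithmetic. The helper name hex_to_octal_escape is hypothetical and isn’t part of the script:

```python
def hex_to_octal_escape(hex_pair: str) -> str:
    """Return the backslash-zero octal escape for a two-digit hex code."""
    value = int(hex_pair, 16)          # "26" -> 38
    return "\\0" + format(value, "o")  # 38 -> backslash, "0", then "46"


escape = hex_to_octal_escape("26")
print(escape)

# Reading the octal digits back gives the original character.
print(chr(int(escape[2:], 8)))
```

For %26, this prints \046 and then &, which is the escape that the script passes to printf '%b'.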
$ chmod +x decode.sh
$ /path/of/script/decode.sh 'example.org/a%26b%40c'
example.org/a&b@c
We make our script executable using chmod and then execute it by using its full path. This script will work in almost any shell without an outside program.
2.2. Using perl and python
Creating a solution using perl is as simple as with a shell:
$ perl -pe 's/\+/\ /g;' -e 's/%(..)/chr(hex($1))/eg;' <<< 'example.org/%3C%2Fend%3E'
example.org/</end>
Here we use perl’s substitution operator to replace the plus signs in our string with spaces. Afterward, we replace each percent sign and the two characters that follow it with the corresponding ASCII character. We use perl’s e modifier to evaluate the expression chr(hex($1)), using hex to convert the hexadecimal digits to a number and then chr to convert that number to a character.
Finally, let’s create a solution using python:
$ python3 -c 'print(input().replace("+", " ").replace("%", "\\x").encode().decode("unicode_escape"))' <<< 'example.org/%7B1%2C2%7D'
example.org/{1,2}
This works the same way as the previous example, except that we wait until the very end to convert to ASCII. We replace each percent sign with a literal \x and convert our string into bytes using the str.encode() method. Then bytes.decode() with the unicode_escape codec interprets each \xNN sequence as the corresponding character.
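As a point of comparison, Python’s standard library also provides urllib.parse.unquote_plus(), which isn’t used in the one-liner above. The sketch below wraps the one-liner’s approach in a hypothetical helper, escape_decode, and runs both on an ASCII URL and on a made-up URL containing a UTF-8 encoded character:

```python
from urllib.parse import unquote_plus


def escape_decode(encoded: str) -> str:
    """The replace-and-unescape approach from the one-liner above."""
    swapped = encoded.replace("+", " ").replace("%", "\\x")
    return swapped.encode().decode("unicode_escape")


ascii_url = "example.org/%7B1%2C2%7D"
utf8_url = "example.org/caf%C3%A9+au+lait"  # hypothetical non-ASCII sample

# Both approaches agree on ASCII input.
print(unquote_plus(ascii_url))
print(escape_decode(ascii_url))

# unquote_plus() treats %C3%A9 as one UTF-8 character,
# while unicode_escape maps each byte to its own character.
print(unquote_plus(utf8_url))
print(escape_decode(utf8_url))
```

For URLs that stay within ASCII, the two give the same result. For multi-byte UTF-8 sequences, only unquote_plus() reassembles the original character.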
2.3. Defining Aliases
To allow for easy URL conversion on the command line, we can define an alias in our ~/.bashrc:
alias decode_url='perl -pe '\''s/\+/ /g;'\'' -e '\''s/%(..)/chr(hex($1))/eg;'\'' <<< '
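We convert our perl code to an alias by surrounding it with single quotes. Since a single-quoted string can’t contain a single quote, we write each inner quote as '\'': this closes the string, adds an escaped single quote, and reopens the string. The trailing <<< feeds whatever argument follows the alias to perl as standard input.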
Now let’s test out our alias:
$ decode_url 'example.org/E+%3D+mc%5E2'
example.org/E = mc^2
After defining our alias we can call it from the terminal and it will decode the URL we pass to it.
3. Conclusion
In this article, we learned what URL encoding is and what purpose it serves. Then we discussed a few ways to decode an encoded URL.