1. Introduction

The ubiquitous curl client provides a convenient way to send requests to a Uniform Resource Identifier (URI) over different protocols from the terminal. However, because both the shell and the protocol treat some characters specially, we might have to encode parts of the URI so they aren’t misinterpreted before or during submission.

In this tutorial, we’ll talk about URI encoding and ways to submit a complex curl request without issues. First, we go through a basic curl request. After that, we explore perhaps the most common method for encoding URI components. Next, we combine the two concepts to show an example. Finally, we look at an often easier and better way to automatically apply Uniform Resource Locator (URL) encoding with curl.

For simplicity, we use the terms URI and URL interchangeably throughout.

We tested the code in this tutorial on Debian 12 (Bookworm) with GNU Bash 5.2.15. It should work in most POSIX-compliant environments unless otherwise specified.

2. Basic curl Request

To begin with, let’s make a basic curl HTTP (HyperText Transfer Protocol) query and filter the output to only get the request and response lines without headers:

$ curl --verbose 'http://gerganov.com/script?arg1=value1&arg2=value2' 2>&1 | grep HTTP
> GET /script?arg1=value1&arg2=value2 HTTP/1.1
< HTTP/1.1 200 OK

In particular, we redirect the stderr (2) output of curl, which carries the --verbose (-v) trace, to stdout via 2>&1. Then, we employ grep to only show lines that contain HTTP. Consequently, we see two lines:

  • > outbound request
  • < inbound response

In this case, the request is readable and well-formed since it begins with the method (GET), continues with the request target surrounded by whitespace, and ends with the protocol version (HTTP/1.1). The URL itself contains several parts:

  • http://: protocol (HTTP)
  • gerganov.com: domain name
  • /script: path, one script file in this case
  • ?arg1=value1&arg2=value2: query parameters, prefixed by the ? question mark separator

Indeed, the response is 200 OK, meaning the web server also recognized the query as properly formed. However, getting a properly formed query sometimes necessitates encoding.
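To make the breakdown of the URL parts more concrete, here’s a minimal sketch that splits the same URL with POSIX parameter expansion. The split_url name is our own, and the sketch assumes a URL that contains both a path and a query string:

```shell
# split_url: hypothetical helper, assumes a URL of the form scheme://host/path?query
split_url() (
  url=$1
  scheme=${url%%://*}          # text before the first ://
  rest=${url#*://}             # everything after the first ://
  host=${rest%%/*}             # text up to the first /
  path_query=/${rest#*/}       # path and query, with the leading / restored
  path=${path_query%%\?*}      # text before the first ?
  query=${path_query#*\?}      # text after the first ?
  printf 'scheme: %s\n' "$scheme"
  printf 'host:   %s\n' "$host"
  printf 'path:   %s\n' "$path"
  printf 'query:  %s\n' "$query"
)

split_url 'http://gerganov.com/script?arg1=value1&arg2=value2'
```

For our example URL, this should report http, gerganov.com, /script, and arg1=value1&arg2=value2, respectively. URLs without a path or query would need extra checks that we leave out here.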

3. Percent Encoding

One of the most common ways to encode a URL is percent-encoding. Basically, this method represents a character with the % percent sign followed by the two-digit hexadecimal value of that character:

+-------------------------------+
| Char | PEChar | Char | PEChar |
|------+--------+------+--------|
| :    | %3A    | &    | %26    |
| /    | %2F    | '    | %27    |
| ?    | %3F    | (    | %28    |
| #    | %23    | )    | %29    |
| [    | %5B    | *    | %2A    |
| ]    | %5D    | +    | %2B    |
| @    | %40    | ,    | %2C    |
| !    | %21    | ;    | %3B    |
| $    | %24    | =    | %3D    |
| %    | %25    | ' '  | %20    |
+-------------------------------+

While we can encode any character, often only some are problematic when used in the terminal, within protocols or formats. Also, there are many ways to manually correct a problematic request URL by applying percent-encoding.
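Each percent-encoded entry in the table is just the ASCII code of the character written as two hexadecimal digits after the % sign. As a small illustration, we can reproduce single entries with printf. The pe_char name is hypothetical, and the sketch only targets single-byte ASCII characters:

```shell
# pe_char: hypothetical helper that prints the percent-encoded form of one ASCII character
pe_char() {
  # a leading quote makes printf use the numeric code of the character that follows
  printf '%%%02X\n' "'$1"
}

pe_char '&'   # prints %26
pe_char ' '   # prints %20
pe_char '?'   # prints %3F
```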

3.1. Using jq

For example, let’s leverage the jq command for our purposes:

$ printf 'http://gerganov.com/script?arg1=value1&arg2=value2' | jq --slurp --raw-input --raw-output @uri
http%3A%2F%2Fgerganov.com%2Fscript%3Farg1%3Dvalue1%26arg2%3Dvalue2

First, we output the URL via printf to a pipe that passes it to jq. With the help of the --raw-input (-R) and --raw-output (-r) flags, we read and write regular text instead of JavaScript Object Notation (JSON). In addition, --slurp (-s) reads all of stdin as a single string, to which we apply the @uri filter.

The final output encodes every reserved character within the URI, including the separators that define its structure, which is often an issue for clients:

$ curl 'http%3A%2F%2Fgerganov.com%2Fscript%3Farg1%3Dvalue1%26arg2%3Dvalue2'
curl: (6) Could not resolve host: http%3A%2F%2Fgerganov.com%2Fscript%3Farg1%3Dvalue1%26arg2%3Dvalue2

In this case, since it can’t even establish the protocol, curl assumes the string is a hostname but can’t resolve it.

So, let’s create a more refined solution.
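One possible refinement that stays with jq is to encode only a single value and attach it to the rest of the URL ourselves. The following is a minimal sketch under a few assumptions: the encode_value name is hypothetical, and jq has to be available on the PATH:

```shell
# encode_value: hypothetical helper that percent-encodes one string via the jq @uri filter
encode_value() {
  # --null-input: read no input, --arg: pass the string in as the jq variable $v
  jq --null-input --raw-output --arg v "$1" '$v | @uri'
}

# Usage idea: keep the URL structure as is and encode only the value of arg2
# url="http://gerganov.com/script?arg1=value1&arg2=$(encode_value 'value2 and space')"
```

This way, the scheme, host, path, and separators stay untouched, while only the part that might contain problematic characters goes through the filter.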

3.2. Using perl

We can also leverage perl for a solution similar to that with jq:

$ perl -MURI::Escape -e 'print uri_escape($ARGV[0]);' 'http://gerganov.com/script?arg1=value1&arg2=value2'
http%3A%2F%2Fgerganov.com%2Fscript%3Farg1%3Dvalue1%26arg2%3Dvalue2

We load the CPAN module URI::Escape via the -M switch and call its uri_escape() function on the first command-line argument.

Notably, again, even the parts before the actual argument list are encoded.

Since the rest is important for the URI structure, let’s only encode the arguments:

$ perl -MURI::Escape -e '
  @p=split(/\?/,$ARGV[0],2);              # get non-argument and argument parts
  if (scalar(@p) > 1) {                   # if there are any arguments
    print($p[0]."?");                     #  print the first part of the URI
    @a=split(/&/,$p[1]);                  #  get arguments in an array
    @kv=split(/=/,$a[0],2);               #  split the first argument assignment
    print(uri_escape($kv[0])."=".         #  print encoded the first argument
          uri_escape($kv[1]));            #   and value
    if (scalar(@a) > 1) {                 #  if there are more arguments
      shift(@a);                          #   remove the first argument from the array
      foreach (@a) {                      #   for each remaining argument
        @kv=split(/=/,$_,2);              #    split the argument assignment
        print("&".uri_escape($kv[0])."=". #    print & and encoded argument
                  uri_escape($kv[1]));    #     and value
      }
    }
  } else {                                # if there are no arguments
    print($ARGV[0]);                      #  print the URI as is
  }
' <URI>

This Perl script takes a URI and separates out the query parameter (argument) list, if any, working on each argument separately, so the non-argument part, as well as the & and = separators, are preserved.
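For environments without Perl or the URI::Escape module, roughly the same idea can be sketched in plain POSIX shell. Both function names below, enc and encode_query, are hypothetical. The sketch encodes every byte outside the unreserved set (letters, digits, and the - . _ ~ characters), and we only consider ASCII input here:

```shell
# enc: hypothetical helper, percent-encodes every byte outside the unreserved set
enc() (
  LC_ALL=C                          # work on bytes; the subshell keeps this local
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}                     # everything after the first character
    c=${s%"$rest"}                  # the first character itself
    s=$rest
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s' "$out"
)

# encode_query: hypothetical helper, encodes only the keys and values after the first ?
encode_query() (
  uri=$1
  case $uri in
    *\?*) ;;                            # there is a query string, continue below
    *) printf '%s\n' "$uri"; exit ;;    # no arguments: print the URI as is
  esac
  out=${uri%%\?*}'?'                    # non-argument part plus the ? separator
  query=${uri#*\?}                      # argument list
  sep=
  IFS='&'                               # split the argument list on &
  set -f                                # keep * and ? in the data from globbing
  for pair in $query; do
    case $pair in
      *=*) out=$out$sep$(enc "${pair%%=*}")=$(enc "${pair#*=}") ;;
      *)   out=$out$sep$(enc "$pair") ;;
    esac
    sep='&'
  done
  printf '%s\n' "$out"
)
```

Just like the Perl version, this sketch leaves the non-argument part and the & and = separators intact.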

Yet, why do we need to encode parts of a URI at all?

4. Manually Encoding curl Requests

After getting to know percent-encoding, let’s explore an example with curl.

We begin by introducing some problematic syntax within a basic query. After that, we encode the URL, so the other side can recognize it.

4.1. Problematic Complex Request

So, let’s introduce some spaces in our original request:

$ curl --verbose 'http://gerganov.com/script?arg1=value1&arg2=value2 and space' 2>&1 | grep HTTP
> GET /script?arg1=value1&arg2=value2 and space HTTP/1.1
< HTTP/1.1 400 Bad Request

Here, we see the extra whitespace within the request. Because of the unexpected spaces in the value of arg2, the server returns a 400 Bad Request response indicating an incorrect syntax. Depending on the server, the same might happen with other characters as well.
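To see why the spaces are a problem, recall that an HTTP/1.1 request line consists of exactly three space-separated fields: method, request target, and protocol version. As a rough illustration, and not a model of how any particular server parses requests, we can count the fields of both request lines. The count_fields name is hypothetical:

```shell
# count_fields: hypothetical helper that counts the whitespace-separated fields of a line
count_fields() (
  set -f          # prevent ? and * in the line from being expanded as globs
  set -- $1       # split the line on whitespace into positional parameters
  echo "$#"
)

count_fields 'GET /script?arg1=value1&arg2=value2 HTTP/1.1'             # prints 3
count_fields 'GET /script?arg1=value1&arg2=value2 and space HTTP/1.1'   # prints 5
```

With five fields instead of three, the server can no longer tell where the request target ends and where the protocol version begins.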

4.2. Corrected Complex Request

Since we already have a practical solution, let’s sanitize our request:

$ perl -MURI::Escape -e '
  @p=split(/\?/,$ARGV[0],2);              # get non-argument and argument parts
  if (scalar(@p) > 1) {                   # if there are any arguments
    print($p[0]."?");                     #  print the first part of the URI
    @a=split(/&/,$p[1]);                  #  get arguments in an array
    @kv=split(/=/,$a[0],2);               #  split the first argument assignment
    print(uri_escape($kv[0])."=".         #  print encoded the first argument
          uri_escape($kv[1]));            #   and value
    if (scalar(@a) > 1) {                 #  if there are more arguments
      shift(@a);                          #   remove the first argument from the array
      foreach (@a) {                      #   for each remaining argument
        @kv=split(/=/,$_,2);              #    split the argument assignment
        print("&".uri_escape($kv[0])."=". #    print & and encoded argument
                  uri_escape($kv[1]));    #     and value
      }
    }
  } else {                                # if there are no arguments
    print($ARGV[0]);                      #  print the URI as is
  }
' 'http://gerganov.com/script?arg1=value1&arg2=value2 and space'
http://gerganov.com/script?arg1=value1&arg2=value2%20and%20space

Next, we use the newly generated URL with curl:

$ curl --verbose 'http://gerganov.com/script?arg1=value1&arg2=value2%20and%20space' 2>&1 | grep HTTP
> GET /script?arg1=value1&arg2=value2%20and%20space HTTP/1.1
< HTTP/1.1 200 OK

Notably, the spaces in the value of arg2 are now converted to their encoded %20 version, and the result is 200 OK.

5. curl URI Encoding

Due to the prevalence of HTTP and other text-based protocols, as well as the many characters that might break certain systems during communication, terminal clients often provide their own features when it comes to encoding.

In particular, curl encodes URL data with the --data-urlencode option:

$ curl --verbose --get --data-urlencode arg1=value1 'http://gerganov.com/script' 2>&1 | grep HTTP
> GET /script?arg1=value1 HTTP/1.1
< HTTP/1.1 200 OK

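Notably, we forced a --get (-G) request in this case for comparability with our earlier examples. Without it, --data-urlencode sends the data in the body of a POST request, while --get makes curl append the encoded data to the URL as a query string. In general, we stick to GET instead of POST for clarity due to the more complex data types of POST.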

Now, let’s pass two arguments by adding another instance of --data-urlencode, one of them with special characters:

$ curl --verbose --get --data-urlencode 'arg1=value1' --data-urlencode 'arg2=value2 & data?' 'http://gerganov.com/script' 2>&1 | grep HTTP
> GET /script?arg1=value1&arg2=value2%20%26%20data%3F HTTP/1.1
< HTTP/1.1 200 OK

Even though the arg2 second argument contains spaces, a ? question mark, and an & ampersand within its value, curl properly encodes the correct characters to supply a well-formed URL.
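When the number of arguments varies, repeating the option by hand gets tedious. As a minimal sketch, we might wrap the call in a small function that adds one --data-urlencode option per key=value pair. The get_with_params name is hypothetical, and the --silent flag is our own choice to hide the progress meter:

```shell
# get_with_params URL [key=value...]: hypothetical wrapper around curl
get_with_params() (
  url=$1
  shift
  n=$#                                  # number of key=value pairs to convert
  while [ "$n" -gt 0 ]; do
    set -- "$@" --data-urlencode "$1"   # append the option and the pair at the end
    shift                               # drop the original pair from the front
    n=$((n - 1))
  done
  curl --silent --get "$@" "$url"
)
```

For instance, get_with_params 'http://gerganov.com/script' 'arg1=value1' 'arg2=value2 & data?' should result in the same curl options as the command above, minus --verbose.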

In fact, this option is similar to other --data* switches, such as --data-binary for passing binary data in curl.

6. Summary

In this article, we discussed one of the most common URL encoding methods as well as an option curl provides for automating that.

In conclusion, the built-in --data-urlencode option of curl is often an easier and more reliable alternative to manually percent-encoding URL components.
