1. Introduction

By definition, binary data is information encoded in the binary numeric system. So, in essence, everything in computers is binary data. However, from a usage standpoint, binary is produced and consumed differently from other types of data. Still, any data type is essentially processed and interpreted binary information.

In this tutorial, we’ll talk about passing binary data to a command without using a file with the example of the curl utility. First, we discuss the general difference between text and binary data in terms of generating and processing. After that, we delve into the universal way of passing such data to applications. Finally, we cover a particular command and its purpose-built switches for binary data handling.

We tested the code in this tutorial on Debian 11 (Bullseye) with GNU Bash 5.1.4. It should work in most POSIX-compliant environments unless otherwise specified.

2. Handling Text and Binary Data

In general, data and files are often split into binary and text. Knowing the format helps us parse the information.

2.1. Text Data Handling

Due to the ubiquity of ASCII, text data is often called ASCII data. Either way, we create text by typing or generating characters in ASCII or another encoding. For example, we can use the echo command:

$ echo 'This is some text.'
This is some text.

Further, we usually process text as such by splitting it on whitespace, word boundaries, and similar delimiters:

$ echo 'This is some text.' | sed 's/This is \b//g' | awk '{ print toupper(substr($2, 1, 1)) substr($2, 2); }'
Text.

Here, a pipeline of sed and awk first [s]ubstitutes parts of the text [g]lobally, then splits the result on the default separators, and finally makes the first letter of the second field uppercase. All of the tools involved understand and handle strings of text.
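To make the stages easier to follow, the same pipeline can be run one step at a time. This is only a sketch: the variable names are illustrative, and the sed pattern drops the GNU-specific \b so that it stays portable.

```shell
# Stage 1: sed removes the leading words (no \b, to stay POSIX-portable)
stage1=$(echo 'This is some text.' | sed 's/This is //')
echo "$stage1"    # some text.

# Stage 2: awk splits on whitespace, so $2 is the second field
stage2=$(echo "$stage1" | awk '{ print $2 }')
echo "$stage2"    # text.

# Stage 3: uppercase the first character and append the rest
stage3=$(echo "$stage2" | awk '{ print toupper(substr($0, 1, 1)) substr($0, 2) }')
echo "$stage3"    # Text.
```

Every stage receives and returns a line of characters, which is what makes these tools a natural fit for text.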

2.2. Binary Data Handling

On the other hand, we produce binary data with raw write() system calls or via ANSI-C escape sequences:

$ echo -e '\u666\u660'
٦٠
$ printf '\u666\u660'
٦٠

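First, we use the echo command with its -e switch to interpret escape sequences. After that, we output the same characters with printf, which interprets escape sequences in its format string by default. In both cases, we assume a UTF-8 locale, so each \u code point is written out as its UTF-8 byte sequence.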

While some of the sequence codes might produce readable text, raw binary data is distinguished more by the way we read it. In fact, binary data almost always requires metadata like format specifications, so we can get the proper numeric values and convert them to useful information.
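As a rough illustration of why the reading side matters, the sketch below dumps the same four bytes in two numeric forms with od. It uses octal printf escapes (an illustrative choice, not taken from the commands above) so that the bytes don't depend on the locale; the variable names are assumptions as well.

```shell
# The UTF-8 encoding of U+0666 and U+0660 as locale-independent octal escapes
bytes='\331\246\331\240'

# Hexadecimal view, one byte per column; tr removes od's padding
hex=$(printf "$bytes" | od -An -tx1 | tr -d ' \n')
echo "$hex"    # d9a6d9a0

# Unsigned decimal view of the very same bytes
dec=$(printf "$bytes" | od -An -tu1)
echo $dec      # 217 166 217 160

# Byte count: two characters on screen, but four bytes in the stream
count=$(printf "$bytes" | wc -c | tr -d ' ')
echo "$count"  # 4
```

Nothing in the bytes themselves says whether they are two characters, four small integers, or something else. That decision belongs to the reader of the data.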

Being numeric in nature, binary data usually isn't presentable on the screen the way text is, so it's harder to interpret by hand. For example, compression converts text to a binary form that is usually much smaller, but hard to process manually.
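A quick way to see this effect is to compress some text in a pipeline and look at what comes out. The sketch assumes gzip is installed; the make_text helper and the variable names are hypothetical and exist only for this illustration.

```shell
# Hypothetical helper: emit 1000 bytes of repetitive text without touching a file
make_text() { awk 'BEGIN { for (i = 0; i < 1000; i++) printf "a" }'; }

# Size before and after compression
plain_size=$(make_text | wc -c | tr -d ' ')
gzip_size=$(make_text | gzip -c | wc -c | tr -d ' ')
echo "$plain_size bytes of text, $gzip_size bytes compressed"

# The compressed stream isn't readable text: it starts with the gzip magic bytes 1f 8b
magic=$(make_text | gzip -c | od -An -tx1 -N2 | tr -d ' \n')
echo "$magic"            # 1f8b

# Decompressing through another pipe restores the original size
restored_size=$(make_text | gzip -c | gzip -dc | wc -c | tr -d ' ')
echo "$restored_size"    # 1000
```

The exact compressed size depends on the gzip version and settings, so the sketch only prints it rather than assuming a value.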

3. Pass Binary Data to Applications

Often, binary data conveys more information in fewer bytes than readable text. Because of this, many applications and machine interfaces use binary-encoded data instead of text.

One basic way to supply data to tools and commands is a pipe:

$ echo 'Text.' | cat
Text.

Here, we have a basic pipeline. In it, echo pipes Text. to cat, which just outputs it to the screen.

The only condition for passing information via pipes is for the receiving end at the right side of the pipe to expect data on stdin like cat does above.

So, combining this concept with the data we generated earlier, we can pass binary to cat:

$ echo -n -e '\u666\u660' | cat
٦٠

Notably, we added the echo -n switch to omit the trailing newline. Naturally, we could also use process substitution, but that still hands the command a file path to read from:

$ cat <(echo -n -e '\u666\u660')

Thus, we only consider pipes. So, we can also use printf before and other commands after the pipe:

$ printf '\u666\u660' | curl --data @- https://gerganov.com

Notably, prefixing the argument to the --data* switches with an @ at sign tells curl that what follows is a file path. However, when it comes to curl, placing a single dash after the @ at sign means we expect input over stdin.

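In this case, we use curl with its --data or -d switch. However, when --data reads from a file or stdin this way, it strips carriage returns and newlines from the input. So, applications sometimes have specific ways to receive binary data and not process it as text.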
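Before moving on, a small check can confirm that a pipe itself hands bytes over unchanged, including a NUL byte and a trailing newline. The sketch below uses cat as a stand-in for any command that reads stdin; the sample bytes and variable names are arbitrary.

```shell
# Dump the bytes directly: A, NUL, B, newline
direct=$(printf 'A\000B\n' | od -An -tx1 | tr -d ' \n')

# Dump them again after an extra trip through a pipe and cat
piped=$(printf 'A\000B\n' | cat | od -An -tx1 | tr -d ' \n')

echo "$direct"   # 4100420a
echo "$piped"    # 4100420a
```

Since both dumps match, any change to the data has to come from the receiving command, not from the pipe.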

4. Binary Data in curl

In fact, curl has designated switches for different data types, including binary.

4.1. Minimal Echo Server

Let’s use the netcat (nc) command to create a minimal server that simply echoes the data it receives:

$ while true; do nc -l 80; done

Here, we run an endless while loop, which spawns an nc instance to [-l]isten on HTTP port 80. Since ports below 1024 are privileged, binding to port 80 usually requires superuser rights.

In another terminal, we can send a probe via curl to test our setup:

$ curl http://localhost

This produces output in the previous terminal:

[$ while true; do nc -l 80; done]
GET / HTTP/1.1
Host: localhost
User-Agent: curl
Accept: */*

Thus, we can monitor what our requests produce.

4.2. curl --data* Switches

Now, we can use curl with its --data-binary switch to pass binary data to the command:

$ curl --data-binary 'Data.' https://gerganov.com

As with the --data (-d) switch, the default Content-Type is application/x-www-form-urlencoded. If we want the receiving side to treat the body as raw bytes instead of form data, we can add Content-Type: application/octet-stream as a [-H]eader:

$ curl --data-binary 'Data.' -H 'Content-Type: application/octet-stream' https://gerganov.com

At this point, we can replace the string with actual binary and send it to our local server:

$ curl --data-binary $'\u666\u660' -H 'Content-Type: application/octet-stream' http://localhost

In this case, we use Bash $'' ANSI-C quoting to interpret the escape sequences we saw earlier. Let's check the results on the server side:

[$ while true; do nc -l 80; done]
POST / HTTP/1.1
Host: localhost
User-Agent: curl
Accept: */*
Content-Type: application/octet-stream
Content-Length: 4

٦٠

As expected, we get the same characters in the request, so curl received and passed along the binary data correctly.
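The two approaches can also be combined. A command-line argument can't contain a NUL byte, so arbitrary binary is safer over stdin, and --data-binary accepts the same @- notation as --data. The function below is a hypothetical helper (its name and the fixed header are assumptions for illustration), meant as a sketch rather than a definitive recipe.

```shell
# Hypothetical helper: POST whatever arrives on stdin, byte for byte, to the URL in $1
post_binary_stdin() {
  curl --silent --data-binary @- \
    -H 'Content-Type: application/octet-stream' "$1"
}

# Example usage against the local nc server from above (not run here):
#   printf 'A\000B\n' | post_binary_stdin http://localhost
```

Unlike --data, the --data-binary switch doesn't strip newlines from the input it reads, so the body that reaches the server should match the bytes written into the pipe.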

5. Summary

In this article, we explored ways to supply binary data directly to commands with the primary example of curl.

In conclusion, although we have a fairly universal way of passing binary data to applications without using files, some tools provide their own means to prevent further interpretation as well.
