1. Introduction

We often download large files from the command line on Linux systems. These files could be SDKs, libraries, other programs, or ordinary files. In any case, knowing the size of a file before downloading it is helpful: it lets us make sure we have enough free space for the download to complete, or decide whether to proceed with the download at all.

In this tutorial, let’s look at how to retrieve just the file size before downloading a file, using cURL.

2. Getting the Content-Length Header

When we send an HTTP request to download a file, the corresponding response usually includes a “Content-Length” header. This header gives the size of the response body in bytes. To read it without downloading the file, we’ll use the curl command with the -I and -L options, followed by the URL. Let’s see the command and its output below:

$ curl -I -L https://dl-cdn.alpinelinux.org/alpine/v3.20/releases/x86_64/alpine-standard-3.20.2-x86_64.iso
HTTP/2 200 
content-security-policy: script-src 'self'
content-type: application/octet-stream
etag: "669e6f13-d100000"
last-modified: Mon, 22 Jul 2024 14:39:15 GMT
referrer-policy: origin-when-cross-origin
server: nginx/1.25.5
strict-transport-security: max-age=63072000; includeSubDomains; preload
x-content-type-options: nosniff
x-frame-options: DENY
via: 1.1 varnish, 1.1 varnish
accept-ranges: bytes
age: 758032
date: Thu, 29 Aug 2024 03:01:46 GMT
x-served-by: cache-ams21054-AMS, cache-hyd1100022-HYD
x-cache: HIT, HIT
x-cache-hits: 335, 0
x-timer: S1724900506.418957,VS0,VE1
vary: Origin
content-length: 219152384

In the above command, we requested only the headers for the Alpine Linux ISO file at the URL we specified. We can see in the last line of the output that the “content-length” header has a value of 219152384. This is the number of bytes we would receive if we downloaded the file. The -I option makes cURL send a HEAD request and print only the response headers, and the -L option makes it follow redirects.

We must also note that there could be scenarios where the “Content-Length” header is missing, for example when the server sends the response with chunked transfer encoding. In such cases, the only way to know the size of the file is to go ahead with the download.
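
One way to prepare for a missing header is to check for it before relying on the value. The following is a minimal sketch rather than part of the commands above: the function name has_content_length is hypothetical, the URL in the usage comment is a placeholder, and it assumes the headers printed by curl -s -I -L are piped into it.

```shell
# Hypothetical helper: succeed if the headers on stdin contain a
# Content-Length header, fail otherwise.
has_content_length() {
    # -q: print nothing, only set the exit status
    # -i: ignore case, since HTTP header names are case-insensitive
    grep -qi '^content-length:'
}

# Example use (the URL is a placeholder):
#   if curl -s -I -L "https://example.com/file.iso" | has_content_length; then
#       echo "size is known before the download"
#   else
#       echo "size is unknown until the download completes"
#   fi
```

Since the function only sets an exit status, it can be used directly in an if statement or combined with && and ||.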

3. Refining the Output

The above output contains a lot of information we don’t need. As it is, it’s suitable for manual inspection. However, we need to filter it further if we want other commands or scripts to use the output. Let’s see how to do this.

3.1. Filtering the Number of Bytes

We can refine the above output to print just the number of bytes using gawk. Let’s see how to do this:

$ curl -s -I -L https://dl-cdn.alpinelinux.org/alpine/v3.20/releases/x86_64/alpine-standard-3.20.2-x86_64.iso | gawk -v IGNORECASE=1 '/^Content-Length/ {print $2}'
219152384

In the above command, we added the -s option to our original curl command. This suppresses the progress meter, which cURL otherwise prints when its output is piped to another command. Next, we piped the output into a gawk command, with the -v option setting the variable IGNORECASE=1. This ensures that case is ignored while locating the Content-Length header, as HTTP header names are case-insensitive.

Finally, we pass the gawk program in single quotes ('/^Content-Length/ {print $2}'), which instructs gawk to match lines starting with “Content-Length” and print the second whitespace-separated field of each matching line. This gives us just the number of bytes in the output.
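
Because -L follows redirects, cURL prints the headers of every response in the redirect chain, so the filter can print more than one value when a redirect response also carries a Content-Length header. A possible refinement is sketched below. It is an illustration under stated assumptions, not the method used above: the function name last_content_length is hypothetical, and it uses plain awk with tolower() in place of gawk's IGNORECASE. It keeps only the value from the final response and strips the carriage return that ends each HTTP header line.

```shell
# Hypothetical helper: print the Content-Length of the LAST response found
# in the headers on stdin; exit with status 1 if that response has none.
last_content_length() {
    awk '
        # A status line marks the start of a new response: forget earlier values
        /^HTTP\// { size = "" }

        # Header names are case-insensitive, so compare in lower case
        tolower($1) == "content-length:" {
            size = $2
            sub(/\r$/, "", size)   # drop the trailing carriage return
        }

        END {
            if (size == "") exit 1
            print size
        }
    '
}

# Example use (the URL is a placeholder):
#   curl -s -I -L "https://example.com/file.iso" | last_content_length
```

Resetting the value at each status line avoids reporting the size of a redirect page when the final response has no Content-Length header.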

3.2. Printing in Human-Readable Format

To further refine the output, we can convert the number of bytes to a human-readable format. We can do this by piping the above output into numfmt. Let’s see how to do this:

$ curl -s -I -L https://dl-cdn.alpinelinux.org/alpine/v3.20/releases/x86_64/alpine-standard-3.20.2-x86_64.iso | gawk -v IGNORECASE=1 '/^Content-Length/ {sub("\r", "", $2); print $2}' | numfmt --to=iec
209M

In the above command, we piped the number of bytes into the numfmt command, whose --to=iec option scales the value using IEC binary units (powers of 1024). We also added one more statement to our gawk program to remove the carriage return (\r) at the end of the value, which is present because HTTP header lines end with \r\n. Unless removed, it causes the numfmt command to fail with an error. After these additions, we can now see a neat human-readable size of 209M.
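
The introduction mentioned checking for free space as one reason to know the size in advance. For that comparison, a script would use the raw byte count rather than the human-readable form. The following is a rough sketch under a few assumptions: the function name fits_in_free_space is hypothetical, the available space is read with df -Pk, and the byte count passed in is a clean number with the carriage return already removed.

```shell
# Hypothetical helper: succeed if <bytes> fits in the free space of <dir>.
# Usage: fits_in_free_space <bytes> [dir]   (dir defaults to the current one)
fits_in_free_space() {
    bytes=$1
    dir=${2:-.}

    # df -P prints one line per filesystem in the POSIX format, and -k
    # reports sizes in 1024-byte blocks. Field 4 of the second line is
    # the space available on the filesystem that holds <dir>.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')

    [ "$bytes" -le $((avail_kb * 1024)) ]
}

# Example use with the size obtained earlier:
#   if fits_in_free_space 219152384 "$HOME"; then
#       echo "enough free space"
#   else
#       echo "not enough free space" >&2
#   fi
```

The check is only a snapshot: free space can change between the check and the end of the download, so it reduces the chance of a failed download rather than ruling it out.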

4. Conclusion

In this article, we looked at how to get the size of a downloadable file using cURL. We also refined the output to print either just the number of bytes or a human-readable size. We can also use this output in our Bash scripts.

Finally, we must remember that the “Content-Length” header isn’t always present, so our scripts should be prepared to handle scenarios where it’s missing.