1. Overview

cURL is the de facto standard command-line utility on Linux for connecting to URLs and transferring data to and from them.

In this tutorial, we’ll learn how to connect to a site using cURL and retrieve the HTTP response status.

2. Using the --head Option

When working with the HTTP or HTTPS protocol, we get the HTTP response status as part of the response headers. So, the natural first choice for retrieving the status code is the --head option, which asks the server for the headers only.

Let’s use the curl command with the --head option to connect to example.com and analyze the output format:

$ curl --head https://www.example.com
HTTP/2 200
content-encoding: gzip
accept-ranges: bytes
age: 594470
cache-control: max-age=604800
content-type: text/html; charset=UTF-8
date: Fri, 17 Feb 2023 01:09:59 GMT
etag: "3147526947+gzip"
expires: Fri, 24 Feb 2023 01:09:59 GMT
last-modified: Thu, 17 Oct 2019 07:18:26 GMT
server: ECS (nyb/1D17)
x-cache: HIT
content-length: 648

We can notice that the first line contains the HTTP version and status code.

Next, let’s pipe this with a one-liner awk command for getting the status code:

$ curl --head https://www.example.com | awk '/^HTTP/{print $2}'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0   648    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
200

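Although we’ve retrieved the status code successfully, curl also displayed a progress meter because its output went to a pipe. The meter is written to standard error, so it doesn’t end up in the piped data, but it clutters the terminal and any logs that capture standard error.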

Next, let’s remove the progress meter by using the --silent option available with the curl command:

$ curl --silent --head https://www.example.com | awk '/^HTTP/{print $2}'
200

Perfect! The result looks correct now.
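
The awk filter can also be wrapped in a small helper so that it's easy to reuse in scripts. The following is a minimal sketch rather than a definitive implementation: the function names status_from_headers and sample_headers are hypothetical, and the headers are a canned sample so that the snippet runs without network access. The helper strips the trailing carriage return from each line and prints the code from the last status line, since curl prints one header block per response when it follows redirects:

```shell
#!/bin/sh
# Hypothetical helper: read response headers on stdin and print the status
# code found on the last status line.
status_from_headers() {
    awk '{ sub(/\r$/, "") }
         /^HTTP\// { code = $2 }
         END { if (code != "") print code }'
}

# Canned sample (an assumption) standing in for the output of:
#   curl --silent --head https://www.example.com
sample_headers() {
    printf 'HTTP/1.1 301 Moved Permanently\r\nLocation: /new\r\n\r\n'
    printf 'HTTP/2 200 \r\ncontent-type: text/html; charset=UTF-8\r\n\r\n'
}

status="$(sample_headers | status_from_headers)"
echo "$status"
```

In a real script, we'd replace sample_headers with the curl command itself and keep the rest of the pipeline unchanged.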

3. Using the --include Option

We limit the output to the response headers when we use curl with the --head option. So, we’ll have to reconnect to the site if we need the response body for further operation.

For such scenarios, we can avoid the --head option and instead use curl with the --include option to capture both response headers and body with a single connection request.

Let’s start by connecting to the site and saving the response header and body in the resp_header_body variable:

$ resp_header_body="$(curl --silent --include https://www.example.com)"

Further, let’s also do a sanity check that resp_header_body does indeed have both values:

$ echo "$resp_header_body" | awk 'NR==1{print $0} END {print $0}'
HTTP/2 200

We must note that the response body could contain a lot of data, so checking the first and last lines of the content with the awk command is sufficient.

Next, let’s find the line that separates the headers from the body, which is the first empty line in the output:

$ line_break=$(echo "$resp_header_body" | awk -v RS='\r\n' '/^$/{print NR; exit;}')
$ echo $line_break

Moving on, let’s pass the line_break variable as a parameter to awk and store the HTTP response header and body in the resp_header and resp_body variables, respectively:

$ resp_header="$(echo "$resp_header_body" | awk -v LINE_BREAK="$line_break" 'NR<LINE_BREAK{print $0}')"
$ resp_body="$(echo "$resp_header_body" | awk -v LINE_BREAK="$line_break" 'NR>LINE_BREAK{print $0}')"

Finally, we can retrieve the HTTP status from the resp_header variable:

$ echo "$resp_header" | awk '/^HTTP/{print $2}'
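
The same split can be sketched without computing a line number first, by letting awk stop at (or start after) the first empty line. This is only an illustration under stated assumptions: the response is a canned sample rather than a live curl call, the variable resp is a hypothetical stand-in for resp_header_body, and the sample contains a single header block:

```shell
#!/bin/sh
# Canned sample (an assumption) standing in for:
#   curl --silent --include https://www.example.com
resp="$(printf 'HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>\n<body>hi</body>\n</html>\n')"

# Headers: every line before the first empty line, carriage returns stripped.
resp_header="$(printf '%s\n' "$resp" |
    awk '{ sub(/\r$/, "") } /^$/ { exit } { print }')"

# Body: every line after that first empty line, printed unchanged.
resp_body="$(printf '%s\n' "$resp" |
    awk 'found { print; next } { sub(/\r$/, "") } /^$/ { found = 1 }')"

# The status code is the second field of the status line.
status="$(printf '%s\n' "$resp_header" | awk '/^HTTP\// { print $2; exit }')"
echo "$status"
```

Quoting the variables in printf keeps the line breaks intact, which the awk filters rely on.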

4. Using the --write-out Option

The curl command supports the --write-out option to specify a custom format for the output using a combination of format strings. Further, we can enclose the predefined variables such as http_code within %{} to specify the output format.

Let’s see this in action by connecting to the site using curl with the --write-out option:

$ curl --silent --output /dev/null --write-out "%{http_code}" https://www.example.com
200

We must note that the --write-out option specifies the format for any additional content we want to see in the output. Since we aren’t interested in seeing the response body, we redirected the content to /dev/null using the --output option.
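
Because this approach prints nothing but the status code, the value is easy to capture in a variable and branch on. Here's a minimal sketch of that idea: the helper name classify_status is hypothetical, and the status is hard-coded so that the snippet runs offline. The 000 case covers the value curl writes for http_code when it receives no HTTP response at all, such as on a failed connection:

```shell
#!/bin/sh
# Hypothetical helper: map a status code to a coarse category.
classify_status() {
    case "$1" in
        000) echo "no response" ;;
        1??) echo "informational" ;;
        2??) echo "success" ;;
        3??) echo "redirect" ;;
        4??) echo "client error" ;;
        5??) echo "server error" ;;
        *)   echo "unknown" ;;
    esac
}

# In a real script, the code would come from curl, for example:
#   status="$(curl --silent --output /dev/null --write-out '%{http_code}' "$url")"
# A fixed value (an assumption) keeps this sketch self-contained.
status=200
echo "$status: $(classify_status "$status")"
```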

5. Handling Redirection

Sometimes the original site could respond with the HTTP 3xx status code to redirect the users to a different site. In such scenarios, we’re interested in getting the status of the final site the user will visit.

Let’s try connecting to the mail.google.com site and get its status along with the redirect URL:

$ curl --silent --output /dev/null --write-out "%{http_code} %{redirect_url}"  https://mail.google.com
301 https://mail.google.com/mail/

We can notice that the site responded with the 301 status code and redirected to a different URL. Further, we captured this information by using the %{http_code} and %{redirect_url} variables.

Next, let’s follow the chain of redirects one by one until we visit the final site that doesn’t do a redirect:

$ curl --silent --output /dev/null --write-out "%{http_code} %{redirect_url}" https://mail.google.com/mail/
302 https://mail.google.com/mail/u/0/
$ curl --silent --output /dev/null --write-out "%{http_code} %{redirect_url}" https://mail.google.com/mail/u/0/
302 https://accounts.google.com/ServiceLogin?service=mail&passive=1209600&osid=1&continue=https://mail.google.com/mail/u/0/&followup=https://mail.google.com/mail/u/0/&emr=1
$ curl --silent --output /dev/null --write-out "%{http_code} %{redirect_url}" 'https://accounts.google.com/ServiceLogin?service=mail&passive=1209600&osid=1&continue=https://mail.google.com/mail/u/0/&followup=https://mail.google.com/mail/u/0/&emr=1'
302 https://accounts.google.com/v3/signin/identifier?dsh=S67422161%3A1676602258439305&continue=https%3A%2F%2Fmail.google.com%2Fmail%2Fu%2F0%2F&emr=1&followup=https%3A%2F%2Fmail.google.com%2Fmail%2Fu%2F0%2F&osid=1&passive=1209600&service=mail&flowName=WebLiteSignIn&flowEntry=ServiceLogin&ifkv=AWnogHendrSg9wPFPOD8ljqa1KB6WjR5HuK94WxyGXYstBX8krhrNKkzRiURjcVfSR5GentbusVduw
$ curl --silent --output /dev/null --write-out "%{http_code} %{redirect_url}" 'https://accounts.google.com/v3/signin/identifier?dsh=S67422161%3A1676602258439305&continue=https%3A%2F%2Fmail.google.com%2Fmail%2Fu%2F0%2F&emr=1&followup=https%3A%2F%2Fmail.google.com%2Fmail%2Fu%2F0%2F&osid=1&passive=1209600&service=mail&flowName=WebLiteSignIn&flowEntry=ServiceLogin&ifkv=AWnogHendrSg9wPFPOD8ljqa1KB6WjR5HuK94WxyGXYstBX8krhrNKkzRiURjcVfSR5GentbusVduw'

Although we could get the HTTP status for the target site, we had to make five connection requests explicitly using the redirect URL. It’s pretty error-prone.

Finally, let’s use the --location option to let curl manage the entire chain of URL redirects internally:

$ curl --silent --output /dev/null --write-out "%{http_code}" --location 'https://mail.google.com'

Perfect! This time, we got the same result, but more reliably.
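
When curl follows redirects for us, it can still be useful to know how many hops it took and where it ended up. The --write-out option offers the %{num_redirects} and %{url_effective} variables for this. The sketch below only shows how such output could be split into variables: the result string is made-up sample data, not real output from any site, and the variable names are hypothetical:

```shell
#!/bin/sh
# Illustrative sample (an assumption) standing in for the output of:
#   curl --silent --location --output /dev/null \
#        --write-out '%{http_code} %{num_redirects} %{url_effective}' "$url"
result='200 2 https://www.example.com/final'

# Split the three space-separated fields into separate variables.
read -r code hops final_url <<EOF
$result
EOF

echo "status=$code redirects=$hops final=$final_url"
```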

6. Conclusion

In this article, we learned how to get the HTTP status code of a site using cURL. Further, we learned the significance of different options available with the curl command, such as --head, --include, --write-out, and --location.
