1. Overview

While the primary purpose of wget is to download resources, it can also check the status of a URL without downloading its content. For example, we can use it to test links, automate link validation in scripts, and monitor the availability of a web server.

In this tutorial, we’ll explore how to use wget to check the status of a URL, interpret the results, and include these checks in our Linux workflow.

2. URL Status Checks With wget

Sometimes, we may need to verify whether a URL is accessible without downloading its content. For instance, we may want to answer the following questions:

  • Does the URL return a 200 OK response?
  • Does the server return an error, for instance, 404 Not Found or 500 Internal Server Error?
  • Are there any redirects, and where do they lead?

We can use wget to answer all of these questions.

3. Using the --spider Option

The --spider option makes wget behave like a web crawler: it sends HTTP requests to the server but doesn't save any files. As a result, we can easily check whether a URL is reachable or returns an error:

$ wget --spider [URL]

Above is the basic syntax. Now, let's work through a practical example:

$ wget --spider https://example.com
Spider mode enabled. Check if remote file exists.
--2024-12-30 00:51:46--  https://example.com/
Resolving example.com (example.com)... 93.184.215.14, 2606:2800:21f:cb07:6820:80da:af6b:8b2c
Connecting to example.com (example.com)|93.184.215.14|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1256 (1.2K) [text/html]
Remote file exists and could contain further links,
but recursion is disabled -- not retrieving.

The server responds with the status code 200 OK, indicating that the URL is accessible and the remote file exists. Additionally, the output displays the file’s length (1.2KB) and type (text/html). The notification “but recursion is disabled -- not retrieving” informs us that since recursion (following links within the remote file) is disabled, wget only confirms the file’s existence and doesn’t retrieve its content.

Let’s see what happens when we use a URL that points to a non-existent resource on the same domain:

$ wget --spider https://example.com/nonexistent
Spider mode enabled. Check if remote file exists.
--2024-12-30 20:11:32--  https://example.com/nonexistent
Resolving example.com (example.com)... 93.184.215.14, 2606:2800:21f:cb07:6820:80da:af6b:8b2c
Connecting to example.com (example.com)|93.184.215.14|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
Remote file does not exist -- broken link!!!

The server responds with the status code 404 Not Found, indicating that no resource exists at that URL. Meanwhile, the line “Remote file does not exist -- broken link!!!” tells us the link is broken or the file has been removed.
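
Beyond the printed output, wget also sets an exit status we can test in scripts. On most builds, a successful check exits with 0, while a server error response such as 404 Not Found typically exits with 8. As a quick sketch, we can suppress the regular output with -q and inspect the status directly:

$ wget -q --spider https://example.com; echo $?
0
$ wget -q --spider https://example.com/nonexistent; echo $?
8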

4. Focusing on HTTP Status Codes

The --spider option gives us fairly detailed output. However, we can modify the command to focus specifically on the HTTP status code with the help of the --server-response option:

$ wget --spider --server-response https://example.com 2>&1 | grep "HTTP/"
  HTTP/1.1 200 OK

Here’s the breakdown of the modification:

  • --server-response – outputs the HTTP headers from the server response
  • 2>&1 – redirects stderr to stdout, since wget writes its diagnostic output, including the server response headers, to stderr
  • grep "HTTP/" – filters and displays only the HTTP status line (HTTP/1.1 200 OK), regardless of whether it came from stdout or stderr
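
If we only need the numeric code, we can trim the status line a bit further, for instance with a small awk step appended to the same pipeline:

$ wget --spider --server-response https://example.com 2>&1 | grep "HTTP/" | awk '{print $2}'
200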

Next, let’s show an example of a redirect:

$ wget --spider --server-response https://httpstat.us/301 2>&1 | grep "HTTP/"
  HTTP/1.1 301 Moved Permanently
  HTTP/1.1 200 OK

The two results appear because the URL (https://httpstat.us/301) initiates an HTTP redirection, resulting in the initial response (301 Moved Permanently). Then, wget automatically follows the redirect to the new location, resulting in the second response (200 OK). However, we can capture the 301 response directly:

$ wget --spider --server-response --max-redirect=0 https://httpstat.us/301 2>&1 | grep "HTTP/"
  HTTP/1.1 301 Moved Permanently

Setting --max-redirect=0 prevents wget from following any HTTP redirects. In turn, the output contains only the initial response (301 Moved Permanently).
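
Since --server-response prints every response header, we can also see where a redirect leads by extending the grep filter to include the Location header (assuming the server sends one):

$ wget --spider --server-response --max-redirect=0 https://httpstat.us/301 2>&1 | grep -E "HTTP/|Location:"

The output then shows the 301 status line together with the target URL from the Location header.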

Further, let’s check a URL that returns a 500 Internal Server Error:

$ wget --spider --server-response https://httpstat.us/500 2>&1 | grep "HTTP/"
  HTTP/1.1 500 Internal Server Error
  HTTP/1.1 500 Internal Server Error

The two identical HTTP/1.1 500 Internal Server Error lines appear because wget, by default, retries a request several times after it encounters a failure. In this case, it encountered a server error (status code 500). However, we can limit the number of attempts with the --tries=1 option:

$ wget --spider --server-response --tries=1 https://httpstat.us/500 2>&1 | grep "HTTP/"
  HTTP/1.1 500 Internal Server Error

Now the error only appears once.
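
Relatedly, if a server hangs instead of answering, wget waits for its default network timeout on each attempt, which can take a while. For quick availability checks, we can combine --tries=1 with --timeout, here set to five seconds as an example, so the check fails fast:

$ wget --spider --tries=1 --timeout=5 https://example.com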

5. Automating URL Status Checks in Scripts

Let’s use what we’ve learned in shell scripts to automate URL status checks.

First, let’s create the urls.txt file to hold multiple URLs:

https://example.com
https://httpstat.us/404
https://httpstat.us/500

Next, let’s create the Bash script check_urls.sh with the following content:

#!/bin/bash

input_file="urls.txt"
log_file="status_log.txt"

while read -r url; do
  echo "Checking $url..."
  status=$(wget --spider --server-response "$url" 2>&1 | grep "HTTP/" | tail -1)
  echo "$url - $status" >> "$log_file"
done < "$input_file"

echo "Status check completed. Results saved to $log_file."

This is the breakdown:

  • input_file="urls.txt" – path to the file holding the list of URLs
  • log_file="status_log.txt" – path to the file that stores the results of the status checks
  • while read -r url; do – reads each line from input_file in order; -r ensures backslashes are treated literally rather than as escape characters
  • echo "Checking $url..." – notifies us that the script is currently checking the given URL
  • status=$(…) – captures the output of the command inside the parentheses and stores it in the status variable
  • echo "$url - $status" >> "$log_file" – appends each URL along with its status to the log file
  • done < "$input_file" – closes the loop and redirects input_file as its input
  • echo "Status check completed. Results saved to $log_file." – informs us that all URLs have been processed and the results can be found in the log file

The command inside status=$(…) sends a request to each URL and extracts the HTTP status lines from the server’s response using grep "HTTP/". It then uses tail -1 to keep only the last HTTP/ line in case redirects or retries produce multiple response lines.
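
To illustrate, running the same pipeline against the redirecting URL from earlier keeps only the final response:

$ wget --spider --server-response https://httpstat.us/301 2>&1 | grep "HTTP/" | tail -1
  HTTP/1.1 200 OK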

At this point, let’s see the results of the Bash script:

$ bash check_urls.sh
Checking https://example.com...
Checking https://httpstat.us/404...
Checking https://httpstat.us/500...
Status check completed. Results saved to status_log.txt.

Further, let’s display the contents of status_log.txt using the cat command:

$ cat status_log.txt
https://example.com -   HTTP/1.1 200 OK
https://httpstat.us/404 -   HTTP/1.1 404 Not Found
https://httpstat.us/500 -   HTTP/1.1 500 Internal Server Error

The file contains URLs and the results of their status checks.
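
If the list grows, we can also filter the log for anything other than a successful response, for example:

$ grep -v "200 OK" status_log.txt
https://httpstat.us/404 -   HTTP/1.1 404 Not Found
https://httpstat.us/500 -   HTTP/1.1 500 Internal Server Error

This leaves only the URLs that need attention.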

6. Conclusion

In this article, we learned how to use wget to check the status of a URL without downloading its content.

First, we explored basic status checks using the --spider option. Next, we discussed status checks by focusing on HTTP status codes with the help of the --server-response option. After that, we looked at automating URL status checks in scripts.

Thus, we can utilize the wget command to manage and monitor URLs, whether we’re validating links, monitoring server health, or automating tasks in our Linux workflow.