
Last updated: January 6, 2025
While the primary purpose of wget is to download resources, it can also check the status of a URL without downloading its content. For example, we can use wget to test links, automate validation of the links in scripts, and monitor the availability of a web server.
In this tutorial, we’ll explore how to use wget to check the status of a URL, interpret the results, and include these checks in our Linux workflow.
Sometimes, we may need to verify whether a URL is accessible without downloading its content. For instance, we can use such a check to answer the following questions:
- Is the URL reachable at all?
- Does the remote file still exist, or is the link broken?
- What HTTP status code does the server return?
We can use wget to answer questions such as those above.
The --spider option makes wget behave like a web crawler. To explain, the --spider option instructs wget to send HTTP requests to the server without saving any files. As a result, we can easily check whether a URL is reachable or returns an error:
$ wget --spider [URL]
Above, we see the basic syntax we need. Now, let’s work through some practical examples:
$ wget --spider https://example.com
Spider mode enabled. Check if remote file exists.
--2024-12-30 00:51:46-- https://example.com/
Resolving example.com (example.com)... 93.184.215.14, 2606:2800:21f:cb07:6820:80da:af6b:8b2c
Connecting to example.com (example.com)|93.184.215.14|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1256 (1.2K) [text/html]
Remote file exists and could contain further links,
but recursion is disabled -- not retrieving.
The server responds with the status code 200 OK, indicating that the URL is accessible and the remote file exists. Additionally, the output displays the file’s length (1.2 KB) and type (text/html). The message “but recursion is disabled -- not retrieving” informs us that since recursion (following links within the remote file) is disabled, wget only confirms the file’s existence and doesn’t retrieve its content.
Let’s see what happens when we use a URL that points to a non-existent resource on the same domain:
$ wget --spider https://example.com/nonexistent
Spider mode enabled. Check if remote file exists.
--2024-12-30 20:11:32-- https://example.com/nonexistent
Resolving example.com (example.com)... 93.184.215.14, 2606:2800:21f:cb07:6820:80da:af6b:8b2c
Connecting to example.com (example.com)|93.184.215.14|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
Remote file does not exist -- broken link!!!
The server responds with the status code 404 Not Found, indicating that the URL doesn’t exist. Meanwhile, the line “Remote file does not exist -- broken link!!!” tells us the link is broken, either because it’s invalid or because the file has been removed.
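As a side note, when we embed such checks in scripts, we don’t always have to parse wget’s output: the command also reports the result through its exit status, which is 0 on success and non-zero on failure (GNU Wget documents exit code 8 for a server error response). Here’s a minimal sketch, reusing the same example URL, that relies only on the exit status:
#!/bin/bash
# Minimal sketch: check a URL by relying on wget's exit status.
# --quiet suppresses wget's progress and response messages.
if wget --spider --quiet https://example.com/nonexistent; then
    echo "URL is reachable"
else
    echo "URL check failed with exit code $?"
fi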
The --spider option produces fairly detailed output. However, we can modify the command to focus specifically on the HTTP status codes with the help of the --server-response option:
$ wget --spider --server-response https://example.com 2>&1 | grep "HTTP/"
HTTP/1.1 200 OK
Here’s the breakdown of the modification:
- --server-response: instructs wget to print the headers it receives from the server
- 2>&1: redirects stderr to stdout, since wget writes its messages, including the server response, to stderr
- grep "HTTP/": filters the combined output so that only the HTTP status lines remain
Next, let’s show an example of a redirect:
$ wget --spider --server-response https://httpstat.us/301 2>&1 | grep "HTTP/"
HTTP/1.1 301 Moved Permanently
HTTP/1.1 200 OK
The two results appear because the URL (https://httpstat.us/301) initiates an HTTP redirection, resulting in the initial response (301 Moved Permanently). Then, wget automatically follows the redirect to the new location, resulting in the second response (200 OK). However, we can capture the 301 response directly:
$ wget --spider --server-response --max-redirect=0 https://httpstat.us/301 2>&1 | grep "HTTP/"
HTTP/1.1 301 Moved Permanently
Setting --max-redirect=0 prevents wget from following any HTTP redirects. In turn, the output contains only the initial response (301 Moved Permanently).
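Optionally, if we also want to see where the redirect points, we can extend the same command to match the Location header as well. This is a small variation on the example above; we use a case-insensitive match since header casing can vary by server:
$ wget --spider --server-response --max-redirect=0 https://httpstat.us/301 2>&1 | grep -iE "HTTP/|Location:"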
Further, let’s demonstrate the 500 Internal Server Error case:
$ wget --spider --server-response https://httpstat.us/500 2>&1 | grep "HTTP/"
HTTP/1.1 500 Internal Server Error
HTTP/1.1 500 Internal Server Error
The two identical HTTP/1.1 500 Internal Server Error lines appear because wget, by default, retries a request multiple times when it encounters a failure. In this case, it encountered a server error (status code 500). However, we can limit the number of attempts using the --tries=1 option:
$ wget --spider --server-response --tries=1 https://httpstat.us/500 2>&1 | grep "HTTP/"
HTTP/1.1 500 Internal Server Error
Now the error only appears once.
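Similarly, when we use these checks for monitoring, it often helps to fail fast on slow or unresponsive servers. As a simple variation, with an arbitrary 10-second limit chosen just for illustration, we can combine a single attempt with a timeout:
$ wget --spider --tries=1 --timeout=10 https://example.com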
Let’s now apply what we’ve learned in a shell script to automate URL status checks.
First, let’s create the urls.txt file to hold multiple URLs:
https://example.com
https://httpstat.us/404
https://httpstat.us/500
Next, let’s create the Bash script check_urls.sh and add the following:
#!/bin/bash

input_file="urls.txt"
log_file="status_log.txt"

while read -r url; do
    echo "Checking $url..."
    status=$(wget --spider --server-response "$url" 2>&1 | grep "HTTP/" | tail -1)
    echo "$url - $status" >> "$log_file"
done < "$input_file"

echo "Status check completed. Results saved to $log_file."
This is the breakdown:
- input_file="urls.txt": the file that contains the URLs to check
- log_file="status_log.txt": the file where we save the results
- while read -r url; do ... done < "$input_file": reads the input file line by line, one URL at a time
- echo "$url - $status" >> "$log_file": appends each URL and its status line to the log file
The command inside status=$(...) sends a request to each URL and extracts the HTTP status lines from the server’s response using grep "HTTP/". It then uses tail -1 to ensure only the last HTTP/ line is captured in case there are multiple redirects or responses.
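As a possible refinement, not part of the script above, we could keep only the numeric status code by adding awk to the pipeline. For instance, the following (with status_code as an illustrative variable name) would store 200 instead of HTTP/1.1 200 OK:
status_code=$(wget --spider --server-response "$url" 2>&1 | grep "HTTP/" | tail -1 | awk '{print $2}')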
At this point, let’s see the results of the Bash script:
$ bash check_urls.sh
Checking https://example.com...
Checking https://httpstat.us/404...
Checking https://httpstat.us/500...
Status check completed. Results saved to status_log.txt.
Further, let’s display the contents of status_log.txt using the cat command:
$ cat status_log.txt
https://example.com - HTTP/1.1 200 OK
https://httpstat.us/404 - HTTP/1.1 404 Not Found
https://httpstat.us/500 - HTTP/1.1 500 Internal Server Error
The file contains URLs and the results of their status checks.
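From here, a quick way to flag problems, assuming the log format produced above, is to filter out the healthy entries so that only non-200 results remain:
$ grep -v "200 OK" status_log.txt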
In this article, we learned how to use wget to check the status of a URL without downloading its content.
First, we explored basic status checks using the --spider option. Next, we discussed status checks by focusing on HTTP status codes with the help of the --server-response option. After that, we looked at automating URL status checks in scripts.
Thus, we can utilize the wget command to manage and monitor URLs, whether we’re validating links, monitoring server health, or automating tasks in our Linux workflow.