Last updated: May 4, 2024
When working in a shell environment, there are times when we need to verify whether a URL is valid before we proceed to use it for other operations.
In this tutorial, we’ll learn some simple and effective ways to check if a URL exists, directly from our shell.
Within the shell, we have two primary tools at our disposal to check URL existence. These tools are curl and wget.
curl is a command line tool that we can use to transfer data to or from servers using various protocols (including HTTP and HTTPS). Among its many features, we can use curl to check if a URL points to an actual, accessible resource.
Now, let’s explore a few different ways to use curl.
First, let’s see a simple script to check if a URL exists using curl:
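#!/bin/bash
if curl --head --silent --fail http://www.baeldung.com/ > /dev/null 2>&1; then
    echo "URL exists"
else
    echo "URL doesn't exist or isn't reachable"
fi
Let’s break down the key part of this script: --head makes curl send a HEAD request, so it fetches only the response headers and not the body; --silent hides the progress meter and error messages; --fail makes curl return a non-zero exit code when the server answers with an HTTP error such as 404; and > /dev/null 2>&1 discards whatever curl prints.
In addition, the if statement checks the exit code of the curl command. If it’s 0 (success), the URL exists and is reachable. But, if the exit code is non-zero (error), the URL may not exist or there may be a connection issue. Without --fail, curl returns 0 for any HTTP response it receives, including a 404, so the check would only tell us that the server is reachable.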
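The same exit-code check can be wrapped in a small reusable function. The following is a minimal sketch: the function name url_exists is hypothetical, and it relies on curl’s --fail option so that HTTP errors such as 404 produce a non-zero exit code instead of being reported as success.

```shell
#!/bin/bash

# url_exists is a hypothetical helper name, not part of curl.
# --head   : send a HEAD request, so only headers are transferred
# --silent : suppress the progress meter and error messages
# --fail   : exit non-zero when the server answers with an HTTP error (400 or above)
url_exists() {
    curl --head --silent --fail --output /dev/null "$1"
}

# Example usage (needs network access):
# if url_exists "https://www.baeldung.com/"; then
#     echo "URL exists"
# else
#     echo "URL doesn't exist or isn't reachable"
# fi
```

Because the function simply passes curl’s exit code through, we can use it directly in an if statement or combine it with && and ||.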
Next, let’s see how we can store the HTTP response code in a variable for further error handling:
#!/bin/bash
result=$(curl --head --silent --write-out "%{http_code}" --output /dev/null https://www.google.com/)
if [[ $result -eq 200 ]]; then
    echo "URL exists"
else
    echo "URL doesn't exist or is not reachable"
fi
In the code block above, the key addition is --write-out "%{http_code}". This tells curl to print the HTTP status code of the response once the transfer completes. Since --output /dev/null discards the headers, the status code is the only thing left on standard output, and we capture it in the result variable.
In addition, the if statement checks the value of the result variable. If it’s 200, the URL exists. However, any other value, such as a 301 redirect or a 404, is reported as a failure, so the URL may not exist or may have moved.
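Comparing against 200 alone is strict: a redirect (301, 302) or another success code (204) would count as a failure. As a rough sketch, assuming we want any 2xx or 3xx response to count as "exists", we could test the code with a helper like the one below. The name status_means_exists is hypothetical.

```shell
#!/bin/bash

# status_means_exists is a hypothetical helper: it succeeds (returns 0)
# for any 2xx or 3xx status code and fails for everything else,
# including curl's 000 placeholder.
status_means_exists() {
    local code=$1
    # Reject anything that is not exactly three digits.
    [[ $code =~ ^[0-9]{3}$ ]] || return 1
    # 10# forces base 10 so a leading zero is not read as octal.
    (( 10#$code >= 200 && 10#$code < 400 ))
}

# Example usage (needs network access):
# result=$(curl --head --silent --write-out "%{http_code}" --output /dev/null "https://www.google.com/")
# status_means_exists "$result" && echo "URL exists"
```

Alternatively, curl’s --location option follows redirects, in which case %{http_code} reports the status of the final response.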
wget is a tool that we can use to download files from the web. It also provides a convenient way to verify URL existence.
Now, let’s see how wget works:
#!/bin/bash
if wget --spider https://www.facebook.com/ > /dev/null 2>&1; then
    echo "URL exists"
else
    echo "URL does not exist or is not reachable"
fi
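In the above script, --spider tells wget to behave like a web spider: it checks that the resource is there without downloading it. The > /dev/null 2>&1 redirection discards everything wget prints.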
As with curl, wget uses exit codes to communicate the results of an operation. Generally, a 0 exit code means success (the URL exists), while other codes indicate an error condition or non-existence of the URL.
Also, we can completely mute the output from wget without a redirection, which keeps the script cleaner. Let’s find out how this option works:
#!/bin/bash
if wget --spider -q https://www.google.com; then
    echo "URL exists"
else
    echo "URL does not exist or is not reachable"
fi
In the above example, -q instructs wget to run in quiet mode without printing any output to the console. The rest of the script works the same way as the last example.
So, while wget is primarily for downloading files, its --spider mode is a quick and easy way to confirm if a website or resource exists.
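wget’s exit code can also hint at why a check failed. The sketch below maps a few of the exit statuses listed in the wget manual to short messages. The function name describe_wget_exit and the message wording are our own assumptions.

```shell
#!/bin/bash

# describe_wget_exit is a hypothetical helper that turns a wget exit
# status into a short message, based on the exit status list in the
# wget manual.
describe_wget_exit() {
    case $1 in
        0) echo "URL exists" ;;
        4) echo "Network failure" ;;
        5) echo "SSL verification failure" ;;
        6) echo "Authentication failure" ;;
        8) echo "Server issued an error response" ;;
        *) echo "wget failed with exit code $1" ;;
    esac
}

# Example usage (needs network access):
# wget --spider -q "https://www.google.com"
# describe_wget_exit $?
```

Here, an exit status of 8 usually means the server answered with an error such as 404, while 4 points to a problem reaching the server at all.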
Simply knowing whether a URL exists is great. But sometimes, we may need our script to act differently depending on the reason a URL check fails. We may also want our scripts to handle situations where websites are simply slow to respond.
Now, let’s learn about ways to make our scripts more adaptable to these situations.
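Web servers report the outcome of each request with an HTTP status code. So, by understanding these codes, we can tweak our scripts to make informed decisions about how to proceed.
Let’s learn about a few commonly encountered HTTP status codes: 200 (OK) means the request succeeded; 301 (Moved Permanently) means the resource now lives at a different URL; 403 (Forbidden) means the server refuses access; and 404 (Not Found) means the resource doesn’t exist at that URL.
In addition to these standard codes, we might encounter a 000 status code. This code isn’t a real HTTP status; tools like curl print it to indicate that no HTTP response was received.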
A 000 status code could occur when there’s a network timeout, DNS issue, or a connection drop before the server could respond.
For example, let’s create a script that handles some common HTTP codes and the 000 code:
#!/bin/bash
url="https://www.google.com"
status_code=$(curl --head --silent --output /dev/null --write-out '%{http_code}' "$url")
case $status_code in
    200)
        echo "URL exists"
        ;;
    404)
        echo "Error 404: Not found."
        ;;
    403)
        echo "Error 403: Forbidden."
        ;;
    301)
        # A 301 is a redirect rather than an error
        echo "Redirect 301: Moved permanently."
        ;;
    000)
        echo "No response received."
        ;;
    *)
        echo "Unexpected status code: $status_code. Further troubleshooting needed"
        ;;
esac
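Now, let’s consider the above code snippet: curl stores the status code in the status_code variable, the case statement compares it against each pattern in turn, and the *) branch catches every code that isn’t listed explicitly.
By handling HTTP status codes this way, we can turn simple URL existence checkers into smart scripts that act differently depending on the situation.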
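Since case patterns are shell globs, we can also match a whole class of status codes at once instead of listing each code. The following is a minimal sketch of that idea. The function name classify_status and the labels it prints are hypothetical.

```shell
#!/bin/bash

# classify_status is a hypothetical helper that groups a status code
# by its first digit instead of listing individual codes.
classify_status() {
    case $1 in
        000) echo "no response" ;;
        2??) echo "success" ;;
        3??) echo "redirect" ;;
        4??) echo "client error" ;;
        5??) echo "server error" ;;
        *)   echo "unexpected" ;;
    esac
}

# Example usage (needs network access):
# status_code=$(curl --head --silent --output /dev/null --write-out '%{http_code}' "https://www.google.com")
# classify_status "$status_code"
```

Each ? matches exactly one character, so 4?? covers 403 and 404 as well as any other three-character code starting with 4.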
When working with URLs, we may come across slow or unresponsive websites. However, by setting timeouts in our scripts, we can ensure the scripts don’t hang indefinitely.
Let’s see a simple example:
#!/bin/bash
url="https://www.google.com"
response=$(curl --connect-timeout 10 --max-time 15 --silent --head --write-out "%{http_code}" --output /dev/null "$url")
echo "HTTP status code: $response"
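In the above example, --connect-timeout 10 gives curl at most 10 seconds to establish the connection, while --max-time 15 caps the whole operation, including the transfer, at 15 seconds. If a limit is reached before any response arrives, curl gives up and the status code is reported as 000.
Hence, curl’s --connect-timeout and --max-time options keep our scripts from getting stuck on slow or unresponsive websites.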
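To tell a timeout apart from other failures, we can also look at curl’s own exit code, which is 28 when a timeout was reached. The sketch below reuses the options from the example above. The function name check_with_timeout and the messages are hypothetical.

```shell
#!/bin/bash

# check_with_timeout is a hypothetical helper. It reuses the curl options
# from the timeout example and also inspects curl's own exit code.
check_with_timeout() {
    local url=$1 code rc
    code=$(curl --connect-timeout 10 --max-time 15 --silent --head \
        --write-out "%{http_code}" --output /dev/null "$url")
    rc=$?
    if (( rc == 28 )); then
        # curl exits with 28 when a timeout was reached.
        echo "Timed out: $url"
        return 1
    elif (( rc != 0 )); then
        # Any other non-zero code, e.g. 6 (could not resolve host).
        echo "curl failed with exit code $rc"
        return 1
    fi
    echo "HTTP status code: $code"
}

# Example usage (needs network access):
# check_with_timeout "https://www.google.com"
```

Capturing $? right after the command substitution matters here, since any later command would overwrite it.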
In this article, we explored how to verify if URLs exist from the shell environment. Using tools like curl and wget, we created scripts that check website availability.
Also, we learned about HTTP status codes, which enable us to build smarter scripts that respond differently to conditions like missing pages (404) or access restrictions (403).
Finally, by setting timeouts, we ensured our script didn’t freeze up when websites were slow or unresponsive.
These skills are essential for automating tasks, building monitoring tools, or creating more efficient scripts to interact with web resources.