1. Overview

In this tutorial, we’ll see how to check the proper functioning of a website using a Bash script run manually or via cron or watch. If this check detects any of the fault conditions provided, we’ll automatically receive an email.

This is valuable when we administer websites. First, however, let’s clarify what we’re interested in monitoring.

2. What the Script Monitors

A website can be unreachable for several reasons:

  • the remote host isn’t responding because it’s offline, overloaded, under attack, misconfigured, blocked by the provider, under maintenance, or for other causes related to network configuration
  • the web server responds but returns an HTTP status code other than 2xx and 3xx
  • the web server responds with a correct HTTP status code (typically 200), but the output is different than expected
  • the web server responds with an expired SSL certificate

We’re going to write a complete script that reports all these issues.

3. Installing and Configuring the Required Software

Our Bash script needs external commands whose installation method depends on our Linux distribution. On a Debian-based system, this would be enough:

# apt install msmtp curl iputils-ping grep coreutils

Usually, most of these packages are already installed.

We’ll use msmtp to send emails, curl to transfer data and to check HTTP status codes and SSL certificate expiration dates, and ping (from iputils-ping) to check Internet connectivity. In addition, grep and cut (the latter from coreutils) help us filter the output of the other commands. Finally, date (also from coreutils) helps us compare dates.

Only msmtp requires manual configuration.

3.1. Choosing an SMTP Service for msmtp

We need a reliable SMTP server to send emails, because messages from unknown personal SMTP servers are often blacklisted or flagged as spam. However, some well-known free SMTP services, such as Gmail, have security policies that can block sending when the sender’s IP changes. In our tests, this problem doesn’t occur with alternative free services such as Zoho Mail.

So, let’s create a personal Zoho Mail account.

3.2. Configuring msmtp

Zoho Mail’s official documentation on SMTP configuration contains the basic information we need. Let’s create the file ~/.msmtprc and make it readable and writable only to the current user, as it’ll contain sensitive data:

$ touch ~/.msmtprc
$ chmod 600 ~/.msmtprc

It should contain the following code, in which we pretend that our username is [email protected] and the password is demopass1234:

# Set default values for all accounts
defaults
auth              on
tls               on
tls_trust_file    /etc/ssl/certs/ca-certificates.crt
logfile           /var/log/msmtp

# Zoho mail
account           zohomail
host              smtp.zoho.eu
port              587
from              [email protected]
user              [email protected]
password          demopass1234

# Set a default account
account default : zohomail

We can consult our tutorial to learn more about this configuration.

Let’s now create the log file, assigning it to the msmtp user and group. Both of them were created by the msmtp package installer:

# touch /var/log/msmtp
# chown msmtp:msmtp /var/log/msmtp
# chmod 660 /var/log/msmtp

It’s time to do a test by sending an email to ourselves, replacing [email protected] with our actual email:

$ echo -e "Subject: TEST EMAIL\n\nIf you received this email, msmtp is working!" | msmtp [email protected]

The log can help us to troubleshoot the msmtp configuration if we don’t receive this test email.
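
As a hedged aside, the test message can also be assembled with printf, whose handling of escape sequences is more predictable than that of echo -e. The sketch below only builds the text of the email. The function name build_alert_message is our own illustration, not part of msmtp, and the address in the comment is a placeholder:

```shell
# Hypothetical helper: prints a minimal email (header, blank line, body).
# $1 = subject, $2 = body text
build_alert_message() {
    printf 'Subject: %s\n\n%s\n' "$1" "$2"
}

# Possible usage (the recipient is a placeholder):
#   build_alert_message "TEST EMAIL" "msmtp is working!" | msmtp recipient@example.com
```

Keeping the message construction in one function also makes it easy to inspect the exact text before it’s piped to msmtp.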

4. Checking Internet Connection

If a website is unreachable, our script should distinguish whether the remote host is offline or the Internet connection is absent. A simple strategy is to try pinging a well-known public IP, for instance, Google’s primary DNS 8.8.8.8. We also need a timeout, in this case, provided by the -w flag:

$ if ping -q -w 1 -c 1 8.8.8.8 > /dev/null 2>&1; then
    echo "Internet connectivity OK" # stdout
else
    >&2 echo "Internet connectivity not available" #stderr
fi

This way, ping’s output is invisible because it goes to /dev/null, while the if condition evaluates only ping’s return value. The first echo sends the string to stdout and the second to stderr.
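
A single probe address can itself be temporarily unreachable. As a minimal sketch, assuming we’re willing to try more than one well-known address, the same ping call can be wrapped in a loop. The function name has_connectivity is hypothetical, and the caller chooses the addresses:

```shell
# Hypothetical helper: returns 0 if at least one of the given IP addresses
# answers a ping, and 1 otherwise. The addresses are passed by the caller.
has_connectivity() {
    for ip in "$@"; do
        if ping -q -w 1 -c 1 "$ip" > /dev/null 2>&1; then
            return 0
        fi
    done
    return 1
}

# Possible usage (8.8.8.8 is Google's DNS, 1.1.1.1 is Cloudflare's DNS):
#   has_connectivity 8.8.8.8 1.1.1.1 || >&2 echo "Internet connectivity not available"
```

The loop stops at the first successful ping, so in the normal case it costs no more than the single check.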

5. Checking HTTP Code and Response Content

Again, we need a timeout, which we set to five seconds with curl’s --max-time flag. This is the maximum time allowed for the entire operation, so it must be long enough for the web page to be completely downloaded.

The following script executes a single request to the server, whereby it stores in $HTTPCODE the HTTP Code returned by the server, and in $CONTENT the data received, which is usually HTML code.

A single curl command can produce both $HTTPCODE and $CONTENT thanks to two options: --write-out %{response_code} sends the HTTP Code to stdout, while --output "$STDOUTFILE" redirects the downloaded content into the temporary $STDOUTFILE file. The content of that file is then read into $CONTENT with Bash’s $(<file) construct:

$ WEBPAGE="https://www.google.com/" # web page to be loaded
$ STDOUTFILE=".tempCurlStdOut" # temp file to store stdout
$ > "$STDOUTFILE" # cleans the file content

$ HTTPCODE=$(curl --max-time 5 --silent --write-out '%{response_code}' --output "$STDOUTFILE" "$WEBPAGE")
$ CONTENT=$(<"$STDOUTFILE") # if there are no errors, this is the HTML code of the web page

$ echo "HTTP CODE: $HTTPCODE"
$ echo "CONTENT LENGTH: ${#CONTENT} chars" # HTML length

If curl fails to download the given page, the HTTP Code is 000.

Also, trying to replace www.google.com with www.baeldung.com, we get the HTTP Code 403 due to Cloudflare’s security protection. In such situations, the site administrator can put the IP from which the script executes in Cloudflare’s allowlist. A similar solution applies to other types of protections.
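
A fixed file name such as .tempCurlStdOut in the current directory can be overwritten if two checks run at the same time. As a hedged sketch, the same curl request can use a unique temporary file created by mktemp. The function name fetch_page is our own, and it sets the same $HTTPCODE and $CONTENT variables used above:

```shell
# Hypothetical helper: downloads the page at $1 and sets HTTPCODE and CONTENT.
fetch_page() {
    body_file=$(mktemp) || return 1   # unique temp file for the response body
    HTTPCODE=$(curl --max-time 5 --silent --write-out '%{response_code}' \
        --output "$body_file" "$1")
    CONTENT=$(cat "$body_file")       # empty if nothing was downloaded
    rm -f "$body_file"                # remove the temp file
}

# Possible usage:
#   fetch_page "https://www.google.com/"
#   echo "HTTP CODE: $HTTPCODE"
#   echo "CONTENT LENGTH: ${#CONTENT} chars"
```

Here, cat replaces the $(<file) construct only to keep the sketch usable in shells other than Bash.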

5.1. Expected HTTP Code

Typically we expect the monitored servers to return the HTTP Code 200 if all is well or other codes if there are problems:

$ if test $HTTPCODE -eq 200; then
    echo "HTTP STATUS CODE $HTTPCODE -> OK" # stdout
else
    >&2 echo "HTTP STATUS CODE $HTTPCODE -> Has something gone wrong?" #stderr
fi

This applies correctly when the site is unreachable because curl returns 000 as the HTTP Code. In addition, CMSs such as Drupal return 503 when under maintenance. So, in most cases, a simple check like this is enough.
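
Earlier, we treated any 2xx or 3xx status as healthy, whereas the check above accepts only 200. Since curl doesn’t follow redirects unless we pass --location, a redirecting page reports its 3xx code directly. A looser variant, sketched here under the assumption that redirects are acceptable for our site, could use a case pattern. The function name is_http_ok is hypothetical:

```shell
# Hypothetical helper: succeeds for any 2xx or 3xx status code.
is_http_ok() {
    case "$1" in
        2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
        *) return 1 ;;   # includes 000, which curl reports when the request fails
    esac
}

# Possible usage:
#   is_http_ok "$HTTPCODE" || >&2 echo "HTTP STATUS CODE $HTTPCODE -> Has something gone wrong?"
```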

5.2. Expected Website Content

Some websites expose special-purpose health check URLs, usually known only to administrators and invisible to search engines, but publicly accessible with any browser. Opening these URLs causes the server to perform periodic tasks such as checking for updates, indexing content, or sending email notifications to users. Sometimes there’s no output, while in other cases there’s a log that our script can read.

For instance, JomSocial, a social networking module for Joomla, has a URL that outputs XML code:

[...]
<message>Could not convert video</message>
<message>No temporary videos to delete.</message>
<message>No files to transfer.</message>
<message>No Videos to transfer.</message>
[...]

In this case, the presence of the text “Could not convert” indicates an error that our script could report.

So, let’s add $NOTWANTEDCONTENT to indicate a text string that shouldn’t be inside $CONTENT. We perform a case-insensitive check using the -i flag of grep:

$ NOTWANTEDCONTENT="Could not convert" # leave the string empty to disable this check
$ if ! test -z "$NOTWANTEDCONTENT"; then # check if the string is not empty
    if echo "$CONTENT" | grep -iq "$NOTWANTEDCONTENT"; then # case insensitive check
        >&2 echo "Not wanted content '$NOTWANTEDCONTENT'" # stderr
    fi
fi

In other situations, these hidden pages perform an internal check to let the administrator know if everything’s okay, outputting a simple plain text like this:

I'm working! :-)

So, let’s add $REQUIREDCONTENT to indicate a required content whose absence is an error condition:

$ REQUIREDCONTENT="I'm working! :-)" # leave the string empty to disable this check
$ if ! test -z "$REQUIREDCONTENT"; then # check if the string is not empty
    if ! echo "$CONTENT" | grep -iq "$REQUIREDCONTENT"; then # case insensitive check
        >&2 echo "Required content '$REQUIREDCONTENT' is absent" # stderr
    fi
fi

These types of checks are open to unlimited customizations and use cases. For example, we could put information into HTML comments intended to be intercepted by our script.
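
By default, grep interprets the search string as a regular expression, so characters such as . or * have a special meaning. As a minimal sketch, assuming we always want a literal match, both checks can share one helper that adds the -F flag. The function name contains_text is our own:

```shell
# Hypothetical helper: succeeds if text $1 contains the literal string $2,
# ignoring case. -F disables regular-expression interpretation, and -e
# protects search strings that start with a dash.
contains_text() {
    printf '%s\n' "$1" | grep -iqF -e "$2"
}

# Possible usage:
#   contains_text "$CONTENT" "$NOTWANTEDCONTENT" && >&2 echo "Not wanted content"
#   contains_text "$CONTENT" "$REQUIREDCONTENT" || >&2 echo "Required content is absent"
```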

6. Checking SSL Expiration

An expired SSL certificate is a relatively frequent cause of non-availability of a website. This is especially true for those sites using short-expiration SSL certificates, such as those from Let’s Encrypt, which are very popular.

We can’t get the SSL certificate expiration date through the curl command above, as we need the --verbose and --head options, which are incompatible with the ones we used previously to get the HTTP Code and page content. So, let’s execute a new request:

$ curl --verbose --head "$WEBPAGE"
[...]
* Connected to www.google.com (142.250.184.100) port 443 (#0)
[...]
* Server certificate:
*  subject: CN=www.google.com
*  start date: Aug 22 08:25:28 2022 GMT
*  expire date: Nov 14 08:25:27 2022 GMT
*  subjectAltName: host "www.google.com" matched cert's "www.google.com"
*  issuer: C=US; O=Google Trust Services LLC; CN=GTS CA 1C3
*  SSL certificate verify ok.
[...]

To make piping possible, we need the --stderr - option, which redirects everything curl would write to stderr to stdout instead. Moreover, let’s not forget a timeout with --max-time. Afterward, we can filter the result with grep and cut to get the certificate expiration date, expressed in GMT. Finally, date can convert it into a UNIX timestamp, which is helpful for comparison. Let’s pipe all of these into a one-line solution:

$ EXPIREDATE=$(curl --max-time 5 --verbose --head --stderr - "$WEBPAGE" | grep "expire date" | cut -d":" -f 2- | date -f - "+%s")

Out of curiosity, we can see the value of $EXPIREDATE and convert it to a human-readable date:

$ echo $EXPIREDATE
1668414327
$ date -d @$EXPIREDATE
Mon Nov 14 09:25:27 CET 2022
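
If curl times out or the output has no “expire date” line, the pipeline produces an empty value. As a hedged sketch, the extraction step can be isolated in a function that reports this case through its exit status. The name expiry_to_epoch is hypothetical, and the code assumes GNU date, whose -d option parses a date string:

```shell
# Hypothetical helper: reads curl's verbose output on stdin and prints the
# certificate expiration as a UNIX timestamp. Assumes GNU date (-d option).
expiry_to_epoch() {
    expire_line=$(sed -n 's/^\*[[:space:]]*expire date:[[:space:]]*//p' | head -n 1)
    if [ -z "$expire_line" ]; then
        return 1    # no "expire date" line found, e.g. curl timed out
    fi
    date -d "$expire_line" "+%s"
}

# Possible usage:
#   EXPIREDATE=$(curl --max-time 5 --verbose --head --stderr - "$WEBPAGE" | expiry_to_epoch) \
#       || >&2 echo "Could not read the certificate expiration date"
```

The sed expression removes the “*  expire date:” prefix and prints only the matching line, which replaces the grep and cut steps.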

Let’s have our script alert us a week in advance of the expiration date:

$ DAYS=$(( ($EXPIREDATE - $(date "+%s")) / (60*60*24) )) # days remaining to expiration
$ if test $DAYS -gt 7; then
    echo "No need to renew the SSL certificate. It will expire in $DAYS days." # stdout
else
    if test $DAYS -gt 0; then
        >&2 echo "The SSL certificate should be renewed as soon as possible ($DAYS remaining days)." # stderr
    else
        >&2 echo "The SSL certificate IS ALREADY EXPIRED!" # stderr
    fi
fi
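
Because shell arithmetic uses integer division, a certificate with less than one day left yields 0 days, which the check above reports as already expired. A possible refinement, sketched here with a hypothetical function name cert_status, compares seconds instead of whole days:

```shell
# Hypothetical helper: prints "expired", "renew" or "ok".
# $1 = expiration timestamp, $2 = current timestamp,
# $3 = warning window in days (optional, defaults to 7)
cert_status() {
    seconds_left=$(( $1 - $2 ))
    warn_seconds=$(( ${3:-7} * 24 * 60 * 60 ))
    if [ "$seconds_left" -le 0 ]; then
        echo "expired"
    elif [ "$seconds_left" -le "$warn_seconds" ]; then
        echo "renew"
    else
        echo "ok"
    fi
}

# Possible usage:
#   cert_status "$EXPIREDATE" "$(date "+%s")"
```

Passing the current time as a parameter keeps the function free of side effects, so it can be tried with fixed timestamps.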

Now, we have all the code pieces to create a complete script.

7. Putting It All Together

To make our script as adaptable as possible, let’s make it callable in this way:

$ ./websiteChecker URL [OPTION...]

Let’s look at the short and long options:

  • -n=str or --notWantedContent=str → indicates a text string str that shouldn’t be inside the webpage content
  • -r=str or --requiredContent=str → indicates a required text string str inside the webpage content
  • -e=address or --email=address → sends an email in case of error to the given address
  • -s or --silent → drops any output, useful when used with -e=address or --email=address

The only mandatory parameter is the URL, which can refer to a page served over HTTP or HTTPS. Of course, the script checks the SSL certificate only in the latter case. We can specify options in any order. To avoid parsing errors, we must enclose str strings in double quotes if they contain spaces.
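
The option=value syntax can be split with the ${var#*=} parameter expansion, which removes everything up to and including the first equals sign. The following is only an illustrative sketch of that idea. The function parse_arg and the labels it prints are our own assumptions:

```shell
# Hypothetical sketch: classifies one command-line argument and prints
# "label<TAB>value". The value is everything after the first "=".
parse_arg() {
    case "$1" in
        http://*|https://*)         printf 'url\t%s\n' "$1" ;;
        -n=*|--notWantedContent=*)  printf 'notwanted\t%s\n' "${1#*=}" ;;
        -r=*|--requiredContent=*)   printf 'required\t%s\n' "${1#*=}" ;;
        -e=*|--email=*)             printf 'email\t%s\n' "${1#*=}" ;;
        -s|--silent)                printf 'silent\ttrue\n' ;;
        *)                          return 1 ;;   # unknown option
    esac
}

# Possible usage:
#   parse_arg '-n=Could not convert'
#   parse_arg 'https://www.example.com/?page=1'
```

The URL is passed through unchanged, so a query string containing an equals sign stays intact.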

7.1. Final Script

Let’s save the following source code in a file named websiteChecker. Some major additions to what we’ve already seen are the case statement to parse arguments and the use of functions:

#!/bin/bash

trap "exit 1" TERM
export TOP_PID=$$
STDOUTFILE=".tempCurlStdOut" # temp file to store stdout
> $STDOUTFILE # cleans the file content

# Argument parsing follows our specification
for i in "$@"; do
  case $i in
    http*)
      WEBPAGE="$i" # keep the whole URL, including any query string
      ;;
    -n=*|--notWantedContent=*)
      NOTWANTEDCONTENT="${i#*=}" # text after the first "="
      ;;
    -r=*|--requiredContent=*)
      REQUIREDCONTENT="${i#*=}"
      ;;
    -e=*|--email=*)
      EMAIL="${i#*=}"
      ;;
    -s|--silent)
      SILENT=true
      ;;
    *)
      >&2 echo "Unknown option: $i" # stderr
      exit 1
      ;;
  esac
done

if test -z "$WEBPAGE"; then
    >&2 echo "Missing required URL" # stderr
    exit 1;
fi

function stdOutput { 
    if ! test "$SILENT" = true; then
        echo "$1"
    fi
}

function stdError { 
    if ! test "$SILENT" = true; then
        >&2 echo "$1" # stderr
    fi
    if ! test -z "$EMAIL"; then
        echo -e "Subject: $WEBPAGE is not working\n\nThe error is: $1" | msmtp "$EMAIL"
    fi
    kill -s TERM $TOP_PID # abort the script execution
}

if ping -q -w 1 -c 1 8.8.8.8 > /dev/null 2>&1; then
    stdOutput "Internet connectivity OK"
    HTTPCODE=$(curl --max-time 5 --silent --write-out '%{response_code}' --output "$STDOUTFILE" "$WEBPAGE")
    CONTENT=$(<"$STDOUTFILE") # if there are no errors, this is the HTML code of the web page
    if test $HTTPCODE -eq 200; then
        stdOutput "HTTP STATUS CODE $HTTPCODE -> OK"
    else
        stdError "HTTP STATUS CODE $HTTPCODE -> Has something gone wrong?"
    fi
    if ! test -z "$NOTWANTEDCONTENT"; then
        if echo "$CONTENT" | grep -iq "$NOTWANTEDCONTENT"; then # case insensitive check
            stdError "Not wanted content '$NOTWANTEDCONTENT'"
        fi
    fi
    if ! test -z "$REQUIREDCONTENT"; then
        if ! echo "$CONTENT" | grep -iq "$REQUIREDCONTENT"; then # case insensitive check
            stdError "Required content '$REQUIREDCONTENT' is absent"
        fi
    fi
    if echo "$WEBPAGE" | grep -iq "^https://"; then # case-insensitive check of the URL scheme
        EXPIREDATE=$(curl --max-time 5 --verbose --head --stderr - "$WEBPAGE" | grep "expire date" | cut -d":" -f 2- | date -f - "+%s")
        DAYS=$(( ($EXPIREDATE - $(date "+%s")) / (60*60*24) )) # days remaining to expiration
        if test $DAYS -gt 7; then
            stdOutput "No need to renew the SSL certificate. It will expire in $DAYS days."
        else
            if test $DAYS -gt 0; then
                stdError "The SSL certificate should be renewed as soon as possible ($DAYS remaining days)."
            else
                stdError "The SSL certificate IS ALREADY EXPIRED!"
            fi
        fi
    fi
else
    >&2 echo "Internet connectivity not available" #stderr
    exit 1
fi

The script requires that msmtp has already been configured.

7.2. Examples and Recommendations

Let’s add the execution permissions:

$ chmod +x ./websiteChecker

Then, let’s try three examples:

$ ./websiteChecker http://www.google.com
Internet connectivity OK
HTTP STATUS CODE 200 -> OK

$ ./websiteChecker https://www.google.com
Internet connectivity OK
HTTP STATUS CODE 200 -> OK
No need to renew the SSL certificate. It will expire in 65 days.

$ ./websiteChecker -e=[email protected] -n="Giulio Ripa" -r="Francesco Galgani" https://www.informatica-libera.net
Internet connectivity OK
HTTP STATUS CODE 200 -> OK
Not wanted content 'Giulio Ripa'

In the latter case, we received an email notification.

For continuous execution at regular intervals, we can use cron or watch. In such cases, it’s better to prevent websiteChecker from running as root. It’s easy because each user has their own crontab.
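
As a hedged illustration of scheduling with cron, the sketch below only prints a crontab line and doesn’t modify anything. The function name cron_entry, the path, and the address are placeholders of our own:

```shell
# Hypothetical helper: prints a crontab line that runs the checker silently
# every $1 minutes. $2 = path to websiteChecker, $3 = URL, $4 = email address
cron_entry() {
    printf '*/%s * * * * "%s" "%s" -s -e="%s"\n' "$1" "$2" "$3" "$4"
}

# Possible usage (path and address are placeholders):
#   cron_entry 5 "$HOME/websiteChecker" "https://www.example.com/" "admin@example.com"
# To install the line, we could append it to the current user's crontab:
#   ( crontab -l 2>/dev/null; cron_entry 5 "$HOME/websiteChecker" "https://www.example.com/" "admin@example.com" ) | crontab -
# Note: a literal % in the URL must be written as \% inside a crontab.
# For an interactive session, watch re-runs a command every N seconds:
#   watch -n 300 ./websiteChecker "https://www.example.com/"
```

Combining -s with -e keeps cron’s own output empty, so we’re notified only through the email sent by the script.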

Integration with logrotate might also be helpful to avoid accumulating too many logs.

7.3. Further Development

We could further develop the script in two directions.

The first is adding more reporting methods than email, such as SMS, socials, ad-hoc apps, and others. This is usually possible by making a REST call to third-party services via curl.
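
As a rough sketch of such a REST call, assuming a third-party service that accepts a JSON object with a text field at a webhook URL, we could build the payload and post it with curl. Both function names, the text field, and the URL are assumptions for illustration, as each real service defines its own format:

```shell
# Hypothetical helper: wraps a single-line message in a JSON object,
# escaping backslashes and double quotes.
build_alert_json() {
    escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
    printf '{"text":"%s"}\n' "$escaped"
}

# Hypothetical sender: $1 = webhook URL (placeholder), $2 = message.
# --data @- makes curl read the request body from stdin.
send_webhook_alert() {
    build_alert_json "$2" | curl --max-time 5 --silent --show-error --fail \
        --header "Content-Type: application/json" --data @- "$1"
}

# Possible usage:
#   send_webhook_alert "https://hooks.example.com/placeholder" "$WEBPAGE is not working"
```

The escaping covers only backslashes and double quotes, so the sketch assumes single-line messages without control characters.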

The second is monitoring additional possible causes of failure, such as domain non-renewal, blacklisting by authorities like Google, or blocking by the deceptive content and dangerous software protection feature of the browser. REST APIs of third-party services can detect these issues.

8. Conclusion
