GNU Wget is a standard tool for downloading data from a web server. However, we can also use it to detect downtime of a site by running it periodically in a cron job. In that use case, we only care whether the request succeeds, not about the downloaded data, so we want to discard the data and wget’s messages altogether.
In this tutorial, we’ll learn how to direct the output of the wget command to /dev/null in cron.
2. Understanding the Scenario
In this section, we’ll simulate the scenario that makes it necessary for us to direct the output of wget to /dev/null.
2.1. Repeated Execution of wget Command
Let’s start by repeatedly executing the wget command five times:
$ for i in $(seq 5); do wget https://www.google.com; done
--2023-04-09 12:07:14--  https://www.google.com/
Resolving www.google.com (www.google.com)... 126.96.36.199, 2404:6800:4002:81c::2004
Connecting to www.google.com (www.google.com)|188.8.131.52|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'index.html.6'
index.html.6            [ <=> ]  16.14K  --.-KB/s    in 0.004s
2023-04-09 12:07:15 (4.07 MB/s) - 'index.html.6' saved
# output trimmed to a single execution
Each run saves a new copy of the page, and because index.html already exists, wget appends a numeric suffix instead of overwriting it:
$ ls -1 index.html*
index.html
index.html.1
index.html.2
index.html.3
index.html.4
Next, let’s see if we can redirect the output to /dev/null:
$ wget https://www.google.com &>/dev/null # no output on stdout
We made progress, as nothing is printed to the terminal. However, let’s also check whether any files were generated:
$ ls -1 index.html*
index.html
Unfortunately, the redirection didn’t prevent the creation of an output document. By default, wget saves the downloaded document to a file and writes only its diagnostic messages to stderr, so redirecting stdout and stderr silences the log but doesn’t stop the download from being written to disk.
2.2. Running wget Using cron
First, let’s set up a cron job using the -e option of the crontab command:
$ crontab -e
This opens an editor for us where we can add the schedule for our cron job.
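Editing with crontab -e is interactive. For scripted setups, crontab also accepts a complete table on standard input via crontab -. The sketch below uses a hypothetical helper, crontab_with_job, which prints the current entries (minus any identical copy of the new one) followed by the new entry:

```shell
#!/bin/sh
# Hypothetical helper: print the current crontab plus one extra entry.
# Piping its output into "crontab -" installs the result without an editor.
crontab_with_job() {
  job=$1
  # "crontab -l" fails when no crontab exists yet; hide that message.
  # grep -v -F -x drops any line identical to the new job, so reruns don't duplicate it.
  crontab -l 2>/dev/null | grep -v -F -x -- "$job"
  printf '%s\n' "$job"
}

# Usage (replaces the installed crontab with the printed table):
# crontab_with_job '* * * * * wget https://www.google.com' | crontab -
```

Piping the output into crontab - replaces the whole table, so if crontab -l fails for a reason other than a missing crontab, existing entries would be lost. It’s worth reviewing the printed table before installing it.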
Now, let’s verify the schedule of our cron job by using the -l option of the crontab command:
$ crontab -l | grep -v -E '^#'
* * * * * wget https://www.google.com
The output confirms that our cron job will run every minute. Moreover, we must note that we used the -v option of the grep command to exclude all the comments starting with the # character.
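The '^#' pattern only hides comments that start in the first column. A slightly more forgiving filter, sketched below with a hypothetical active_jobs function and made-up sample input, also skips indented comments and blank lines:

```shell
#!/bin/sh
# Print only the active entries of a crontab read from stdin:
# skip lines that are blank or whose first non-blank character is '#'.
active_jobs() {
  grep -v -E '^[[:space:]]*(#|$)'
}

# Hypothetical sample input standing in for "crontab -l" output.
sample='# m h dom mon dow command
   # an indented comment

* * * * * wget https://www.google.com'

printf '%s\n' "$sample" | active_jobs
```

On a real system, we’d pipe crontab -l into active_jobs instead of the sample text.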
Next, let’s see the impact of the cron job after an hour by inspecting the number of files generated on our filesystem:
# ls index* | wc -l
60
As expected, the cron job created a new file every minute. At that rate, it leaves 1,440 files per day, so a wget-based cron job can clutter the filesystem quickly in the long run.
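To keep an eye on this clutter, and to clean it up afterward, we can use find instead of parsing ls output. A minimal sketch with two hypothetical helpers, assuming all the copies land in a single directory:

```shell
#!/bin/sh
# Hypothetical helper: count the index.html, index.html.1, ... copies that
# repeated wget runs leave behind in a directory (default: current directory).
# Note: -maxdepth is supported by GNU and BSD find but isn't part of POSIX.
count_index_files() {
  find "${1:-.}" -maxdepth 1 -type f -name 'index.html*' | wc -l | tr -d ' '
}

# Hypothetical cleanup helper: delete those copies once they're no longer needed.
remove_index_files() {
  find "${1:-.}" -maxdepth 1 -type f -name 'index.html*' -exec rm -f {} +
}
```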
Finally, let’s also see the impact of the cron job on incoming mail, because cron emails a job’s owner whenever the job produces output:
$ mail
"/var/mail/root": 60 messages 60 new
# output trimmed
Such emails are unnecessary noise for the user. So, let’s solve this issue by directing the output to /dev/null.
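Since cron mails anything a job writes to stdout or stderr, we can check a crontab command ahead of time. The hypothetical would_mail helper below runs a command string with /bin/sh, the shell cron uses unless the crontab sets SHELL, and reports whether it produces any output. It doesn’t reproduce cron’s minimal environment, so it’s a quick sanity check rather than a faithful simulation:

```shell
#!/bin/sh
# Hypothetical helper: would this crontab command produce mail?
# Runs the command string with /bin/sh, as cron does by default, and
# captures stdout and stderr together, since cron mails both streams.
would_mail() {
  output=$(/bin/sh -c "$1" 2>&1)
  if [ -n "$output" ]; then
    echo yes
  else
    echo no
  fi
}
```

For example, would_mail 'wget https://www.google.com' prints yes, because wget writes its progress log to stderr (and, as a side effect, saves index.html in the current directory).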
3. Using --output-document and --output-file Options
We can use the --output-document and --output-file options of the wget command to choose where wget writes the downloaded document and its diagnostic log, respectively.
Let’s see this in action in a standalone run of the wget command by using /dev/null as the target for all redirections:
$ wget --output-document /dev/null --output-file /dev/null https://www.google.com
$ echo $?
0
As expected, we don’t see any output, and the exit status of 0 confirms that the command succeeded.
Now, let’s also verify the presence of the output document:
$ [ -f index.html ] ; echo $?
1
Great! We’ve got it right this time, as the filesystem has no additional output document from the wget command.
Next, let’s apply this to our cron job:
$ crontab -l | grep -v -E '^#'
* * * * * wget --output-document /dev/null --output-file /dev/null https://www.google.com
Finally, let’s wait for a minute to ensure that the cron job is not producing the output document anymore:
$ sleep 60; [ -f index.html ] ; echo $?
1
As expected, the exit status is non-zero, indicating that the file named index.html is absent.
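The -O and -o flags are the short forms of --output-document and --output-file. As a minimal sketch, a hypothetical fetch_silently wrapper can discard both while still returning wget’s exit status, so a calling script can react to failures:

```shell
#!/bin/sh
# Hypothetical wrapper: request a URL with wget, discarding both the
# downloaded document (-O, i.e. --output-document) and the log
# (-o, i.e. --output-file). wget's exit status is returned unchanged.
fetch_silently() {
  wget -O /dev/null -o /dev/null "$1"
}

# Usage in a script called from cron:
# if ! fetch_silently https://www.google.com; then echo "site check failed"; fi
```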
4. Using --spider and --quiet Options
Interestingly, wget supports the --spider option to mimic the behavior of a web spider. In spider mode, wget only checks that the URL exists without downloading the page.
Let’s analyze the behavior of wget with the --spider option:
$ wget --spider https://www.google.com
Spider mode enabled. Check if remote file exists.
--2023-04-09 14:11:21--  https://www.google.com/
Resolving www.google.com (www.google.com)... 184.108.40.206, 2404:6800:4002:817::2004
Connecting to www.google.com (www.google.com)|220.127.116.11|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Remote file exists and could contain further links,
but recursion is disabled -- not retrieving.
Although there is some noise in the output, we can verify that wget didn’t download the file:
$ [ -f index.html ] ; echo $?
1
Next, let’s go ahead and use the --quiet option to suppress the diagnostic logs:
$ wget --quiet --spider https://www.google.com
$ echo $?
0
Great! We can use the --quiet and --spider options together to silence the output of the wget command in a cron job.
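With --quiet, the exit status is the only signal left. The wget manual documents a fixed set of exit codes, so a small lookup, sketched here as a hypothetical wget_status_text function, makes a failed check easier to read:

```shell
#!/bin/sh
# Hypothetical helper: map a wget exit status to the meaning documented
# in the "Exit Status" section of the GNU Wget manual.
wget_status_text() {
  case $1 in
    0) echo "no problems occurred" ;;
    1) echo "generic error" ;;
    2) echo "parse error" ;;
    3) echo "file I/O error" ;;
    4) echo "network failure" ;;
    5) echo "SSL verification failure" ;;
    6) echo "username/password authentication failure" ;;
    7) echo "protocol error" ;;
    8) echo "server issued an error response" ;;
    *) echo "unknown status $1" ;;
  esac
}

# Usage: wget --quiet --spider https://www.google.com; wget_status_text $?
```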
Finally, let’s revise the command in our cron job and verify the outcome:
$ crontab -e # revise the wget command
crontab: installing new crontab
$ crontab -l | grep -v -E '^#'
* * * * * wget --quiet --spider https://www.google.com
$ sleep 60; [ -f index.html ] ; echo $?
1
As expected, there is no output document after waiting for a minute, so this approach works too. Unlike the previous approaches, wget doesn’t retrieve the page at all here; it only checks that it exists.
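Since the original goal was to detect downtime, we can go one step further and print a message only when the check fails, so cron sends mail only then. A sketch, assuming a hypothetical check_site function kept in a script that cron calls; --tries=1 and --timeout=10 are standard wget options, with values chosen here so that a failing check doesn’t stall the job:

```shell
#!/bin/sh
# Hypothetical cron helper: probe a URL without downloading it and stay
# silent on success, so cron only sends mail when the check fails.
check_site() {
  url=$1
  # A single attempt with a 10-second timeout keeps a failing check short.
  wget --quiet --spider --tries=1 --timeout=10 "$url"
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "$(date '+%Y-%m-%d %H:%M:%S') $url is unreachable (wget exit status $status)"
  fi
  return "$status"
}

# Usage inside the script: check_site https://www.google.com
```

Keeping this in a script file also sidesteps a crontab quirk: cron turns an unescaped % in a command line into a newline, which would break the date format if it were written directly in the crontab.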
5. Using Output Redirections
In this section, we’ll explore different scenarios involving output redirection while using the wget command.
5.1. Using &>/dev/null
We can set the --output-document option to - to send the output document to stdout. Additionally, we can use the &> redirection to send both stdout and stderr to /dev/null:
$ wget --output-document - https://www.google.com &>/dev/null
$ echo $?
0
$ [ -f index.html ] ; echo $?
1
Great! It looks like this output redirection strategy suppresses the output when using wget outside a cron job.
Now, let’s go ahead and verify if this works within a cron schedule:
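$ crontab -l | grep -v -E '^#'
* * * * * wget --output-document - https://www.google.com &>/dev/null
$ sleep 60; [ -f index.html ] ; echo $?
1
The verification passed, so we have another strategy to solve our use case. However, this check only looks for index.html. cron runs commands with /bin/sh by default, and &> is a bash extension: where /bin/sh is dash (as on Debian and Ubuntu), &>/dev/null is parsed as & followed by >/dev/null, so wget runs in the background and its output still reaches cron. The portable form is >/dev/null 2>&1; alternatively, we can set SHELL=/bin/bash at the top of the crontab.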
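To see exactly what each redirection discards, here’s a sketch with a hypothetical noisy function that, like wget --output-document -, writes the document to stdout and the log to stderr. It also shows the portable spelling of &>/dev/null and why the order of redirections matters:

```shell
#!/bin/sh
# Hypothetical stand-in for wget --output-document -:
# the document goes to stdout, the progress log goes to stderr.
noisy() {
  echo "document"
  echo "log" >&2
}

# Portable equivalent of bash's "&>/dev/null": silences both streams.
both_silenced=$( { noisy >/dev/null 2>&1; } 2>&1 )

# Order matters: stderr is pointed at the current stdout (the capture) first,
# so only the document is discarded and the log still comes through.
wrong_order=$( { noisy 2>&1 >/dev/null; } 2>&1 )

# Discarding stdout alone leaves stderr untouched.
stdout_only=$( { noisy 1>/dev/null; } 2>&1 )

printf 'both: [%s]  wrong order: [%s]  stdout only: [%s]\n' \
  "$both_silenced" "$wrong_order" "$stdout_only"
```

Running it prints both: []  wrong order: [log]  stdout only: [log]. The last case is why 1>/dev/null on its own is only enough when wget’s log is also sent to stdout.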
5.2. Using 1>/dev/null
Alternatively, we can set both --output-file and --output-document to - so that wget writes its log and the document to stdout. Then, the 1>/dev/null redirection discards stdout, and with it all of wget’s output:
$ wget --output-document - --output-file - https://www.google.com 1>/dev/null
# same verification as earlier
Moving on, let’s revise the crontab entry:
$ crontab -e # revise wget command
$ crontab -l | grep -v -E '^#'
* * * * * wget --output-file - --output-document - https://www.google.com 1>/dev/null
Finally, let’s verify its functionality when running through the cron job:
$ sleep 60; [ -f index.html ] ; echo $?
1
Perfect! Our approach worked as expected.
6. Conclusion
In this article, we learned why it’s worth silencing the output of the wget command in cron. Furthermore, we explored different options of wget, such as --output-document, --output-file, --spider, and --quiet, along with output redirection techniques to direct the output to /dev/null.