
Last updated: November 12, 2024
The wget command-line tool enables us to download files from the internet. By default, wget downloads a file even if it already exists locally. However, to save time, bandwidth, and storage, we can avoid downloading the same file repeatedly. Fortunately, wget supports checking for a file locally before downloading it again.
In this tutorial, we’ll explore how to ensure that wget only downloads a file that doesn’t already exist locally.
The wget command follows a simple syntax:
$ wget [URL]
For instance, let’s use it to download an image:
$ wget https://source1.com/images/pic.png
...
Length: 224566 (219K) [image/png]
Saving to: ‘pic.png’
pic.png 100%[=================================================================>] 219.30K 556KB/s in 0.4s
2024-11-03 02:43:32 (556 KB/s) - ‘pic.png’ saved [224566/224566]
This command downloads pic.png into the current directory. If pic.png already exists, wget detects the existing file and creates a new copy with a numbered suffix, for instance, pic.png.1, pic.png.2, and so on. This can be inefficient, since downloading the same file repeatedly wastes bandwidth and storage space.
Fortunately, wget provides flags that change this default behavior. Let’s look at the ones that make wget skip the download when the file already exists.
The -nc or --no-clobber flag tells wget to skip the download when a file with the same name already exists locally:
$ wget -nc https://source1.com/images/pic.png
File ‘pic.png’ already there; not retrieving.
Above, -nc stops wget from downloading pic.png if a file with the same name already exists in the current directory. However, it only checks the file’s name, not whether its content has changed. Thus, wget -nc won’t download pic.png even if there’s a newer version on the server, so it isn't suitable when we always need the latest version of the file.
This makes -nc useful if we never want to overwrite the file.
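To apply -nc to several files at once, a minimal sketch might wrap wget in a small function. The function name fetch_missing and the list file urls.txt are hypothetical and only serve as an illustration; -i is wget's option for reading URLs from a file:

```bash
#!/bin/bash
# Hypothetical helper: fetch every URL listed in a text file, skipping
# any whose target filename already exists in the current directory.
fetch_missing() {
    local list_file="$1"
    # -nc: don't retrieve files that already exist locally
    # -i:  read the URLs to download from the given file, one per line
    wget -nc -i "$list_file"
}
```

Calling fetch_missing urls.txt would then retrieve only the entries whose filenames aren't present yet.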
The -N or --timestamping flag helps us download the latest version of the file. Specifically, with -N, wget downloads the file if it doesn’t exist in the current directory or if the server version is newer than the local version:
$ wget -N https://source1.com/images/pic.png
If the file exists locally, -N instructs wget to compare the timestamp for the file on the server against the timestamp for the file in the current working directory:
$ wget -N https://source1.com/images/pic.png
...
File ‘pic.png’ not modified on server. Omitting download.
Above, wget skips downloading pic.png because the server version is not newer.
Therefore, we can use the -N flag to work with files that are updated over time.
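Since -nc and -N serve different goals, one possible way to make the choice explicit in a script is a small wrapper that picks the flag from a mode argument. This is only a sketch; the function name fetch and the mode names keep and update are assumptions made for illustration:

```bash
#!/bin/bash
# Hypothetical wrapper that selects the wget flag based on a mode:
#   keep   -> never re-download an existing file (-nc)
#   update -> re-download only if the server copy is newer (-N)
fetch() {
    local mode="$1" url="$2"
    case "$mode" in
        keep)   wget -nc "$url" ;;
        update) wget -N "$url" ;;
        *)
            echo "usage: fetch keep|update URL" >&2
            return 2
            ;;
    esac
}
```

For instance, fetch update https://source1.com/images/pic.png would behave like the wget -N command above.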
Besides flags, we can use conditional logic in shell scripts to check if the file exists before using wget. This gives the administrator more control over whether the file should be downloaded.
To demonstrate, let’s create the Bash script download_file_if_missing.sh with the following content:
#!/bin/bash
FILE="pic.png"
URL="https://source1.com/images/$FILE"
if [ -f "$FILE" ]; then
    echo "$FILE is already present. Let's skip the download."
else
    wget "$URL"
fi
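This Bash script checks for the file locally before downloading it. If pic.png exists in the current directory, the script prints a message and skips the download. Otherwise, it runs wget to fetch the file.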
Let’s run the Bash script:
$ bash download_file_if_missing.sh
pic.png is already present. Let's skip the download.
In this example, pic.png already exists locally.
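The same check can be generalized so that the filename isn't hard-coded. The following is a minimal sketch, assuming the URL ends in the filename (no query string); the function name download_if_missing is hypothetical:

```bash
#!/bin/bash
# Hypothetical helper: download a URL only if a file with the same
# name isn't already present in the current directory.
download_if_missing() {
    local url="$1"
    # Strip everything up to the last slash to get the filename
    local file="${url##*/}"
    if [ -f "$file" ]; then
        echo "$file is already present. Let's skip the download."
    else
        wget "$url"
    fi
}
```

With this in place, download_if_missing https://source1.com/images/pic.png would perform the same check as the script above for any URL we pass in.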
There are situations in which we may download files with the same name from different URLs.
First, we can use the -O flag to rename the files:
$ wget -nc -O source2_pic.png https://source2.com/images/pic.png
Here, we use the -O flag to define a unique filename depending on the URL. In this case, we download the file pic.png from a different URL and save it as source2_pic.png. Additionally, we can use the -P flag to organize files into per-source directories:
$ wget -nc -P source2 https://source2.com/images/pic.png
Above, we download pic.png from a different URL and store it in the directory source2. If the source2 directory doesn’t exist, the command creates it automatically. Notably, -P works with both -nc and -N. The -O flag works with -nc, but timestamping has no effect when combined with -O, so we should pair -N with -P rather than -O.
Next, let’s handle files with identical names from different sources when working with conditional logic in scripts. To achieve this, let’s modify the Bash script download_file_if_missing.sh:
#!/bin/bash
# Specify filename and new URL
FILE="pic.png"
URL="https://source2.com/images/$FILE"
# Capture domain name from new URL
DOMAIN=$(echo "$URL" | awk -F'[/:]' '{print $4}')
# Create a unique file name (e.g., source2.com_pic.png) based on the domain name
NEW_FILE_NAME="${DOMAIN}_${FILE}"
# Check whether $NEW_FILE_NAME already exists in the current directory
if [ -f "$NEW_FILE_NAME" ]; then
    echo "$NEW_FILE_NAME is already present. Let's skip the download."
else
    # Download the file and save it as $NEW_FILE_NAME
    wget -O "$NEW_FILE_NAME" "$URL"
fi
Here, we change wget "$URL" to wget -O "$NEW_FILE_NAME" "$URL". Now, the script downloads pic.png but saves it under a name that includes the source domain as a unique identifier, for example, source2.com_pic.png.
This modification helps us tell apart the identically named pic.png files from the two different URLs.
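As an alternative to awk, the domain can also be extracted with Bash parameter expansion alone, which makes it easy to process several URLs in one loop. The sketch below is only an illustration; the function names unique_name and download_all_if_missing are hypothetical. Unlike the awk version, it keeps a port number (if any) as part of the host:

```bash
#!/bin/bash
# Hypothetical helper: build "<host>_<filename>" from a URL.
unique_name() {
    local url="$1"
    local rest="${url#*://}"   # drop the scheme, e.g. "https://"
    local host="${rest%%/*}"   # keep everything before the first slash
    local file="${url##*/}"    # keep everything after the last slash
    echo "${host}_${file}"
}

# Hypothetical helper: download each URL under its unique name,
# skipping those that already exist in the current directory.
download_all_if_missing() {
    local url name
    for url in "$@"; do
        name=$(unique_name "$url")
        if [ -f "$name" ]; then
            echo "$name is already present. Let's skip the download."
        else
            wget -O "$name" "$url"
        fi
    done
}
```

For example, passing both https://source1.com/images/pic.png and https://source2.com/images/pic.png to download_all_if_missing would produce source1.com_pic.png and source2.com_pic.png.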
In this article, we explored various ways we can work with wget to avoid redundant downloads.
First, we explored the wget flags -nc, which skips files that already exist locally, and -N, which compares timestamps and downloads only newer versions. Next, we used conditional logic in shell scripts to check if a file exists before using wget. We then handled files with identical names from different sources.
We can now download files more efficiently by saving bandwidth and keeping our local storage more organized.