Skip Downloading a File if the File Already Exists Using wget

1. Overview

The wget command-line tool enables us to download files from the internet. By default, wget downloads a file even if it already exists locally. However, to save time, bandwidth, and storage, we can avoid downloading the same file repeatedly. Fortunately, wget supports checking for a file locally before downloading it again.

In this tutorial, we’ll explore how to make wget download a file only if it doesn’t already exist locally.

2. Skip Downloading a File if It Already Exists

The wget command follows a simple syntax:

$ wget [URL]

For instance, let’s use it to download an image:

$ wget https://source1.com/images/pic.png
...
Length: 224566 (219K) [image/png]
Saving to: ‘pic.png’

pic.png 100%[=================================================================>] 219.30K   556KB/s    in 0.4s    

2024-11-03 02:43:32 (556 KB/s) - ‘pic.png’ saved [224566/224566]

This command downloads pic.png into the current directory. If pic.png already exists, wget detects the existing file and creates a new copy with a numbered suffix, for instance, pic.png.1, pic.png.2, and so on. This can be inefficient, since downloading the same file repeatedly wastes bandwidth and storage space.
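To make the numbering rule concrete, here’s a minimal sketch of a hypothetical helper, next_wget_name. It only mimics how wget picks the name for a repeated download: the original name if it’s free, otherwise the first unused .1, .2, … suffix. It doesn’t call wget and ignores options such as -O or -P.

```shell
# Hypothetical helper: print the name wget would use by default when saving
# NAME in the current directory. It only mimics the naming rule; wget is not called.
next_wget_name() {
    local name="$1" n=1

    # No existing file: wget keeps the original name
    if [ ! -e "$name" ]; then
        printf '%s\n' "$name"
        return 0
    fi

    # Otherwise, find the first unused numbered suffix (.1, .2, ...)
    while [ -e "$name.$n" ]; do
        n=$((n + 1))
    done
    printf '%s\n' "$name.$n"
}
```

For example, running next_wget_name pic.png in a directory that already contains pic.png and pic.png.1 prints pic.png.2.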

wget provides flags that modify this default behavior. So, let’s discuss the flags that make wget skip the download if the file already exists.

2.1. Using the -nc Flag

The -nc or --no-clobber flag makes wget skip the download instead of saving a new copy when a file with the same name already exists:

$ wget -nc https://source1.com/images/pic.png
File ‘pic.png’ already there; not retrieving.

Above, -nc stops wget from downloading pic.png because a file with the same name already exists in the current directory. However, -nc only checks the file’s name, not whether its content has changed. Thus, wget -nc won’t download pic.png even if there’s a newer version on the server, so it can’t keep our local copy up to date.

This makes -nc useful if we never want to overwrite the file.
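Since -nc decides by name alone, we can reproduce its check with a plain existence test. The sketch below uses a hypothetical nc_would_skip function and assumes the URL ends in a plain filename (no trailing slash or query string), so the local name is simply the last path component.

```shell
# Hypothetical helper: report whether "wget -nc URL" would skip the download.
# Assumes the URL ends in a plain filename, so the local name is its last component.
nc_would_skip() {
    local url="$1" dir="${2:-.}"
    local name
    name=$(basename "$url")

    # Like -nc, look only at the name; the file's content is never compared
    if [ -e "$dir/$name" ]; then
        echo "skip: $dir/$name exists (content is not compared)"
        return 0
    fi
    echo "download: $dir/$name not found"
    return 1
}
```

The optional second argument names the directory to check, which defaults to the current one.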

2.2. Using the -N Flag

The -N or --timestamping flag helps us keep the latest version of a file. With -N, wget downloads the file if it doesn’t exist in the current directory or if the server’s version is newer than the local one:

$ wget -N https://source1.com/images/pic.png

If the file exists locally, -N instructs wget to compare the timestamp of the file on the server with the modification time of the local copy. wget also downloads the file if the remote and local sizes differ, regardless of the timestamps:

$ wget -N https://source1.com/images/pic.png
...
File ‘pic.png’ not modified on server. Omitting download.

Above, wget skips downloading pic.png because the server version is not newer.

Therefore, we can use the -N flag to work with files that are updated over time.
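To see the values that -N compares over HTTP, we can ask the server for its headers without downloading the file. This is a minimal sketch: show_timestamps is a hypothetical helper name, and it assumes GNU date (for -r) and a server that sends a Last-Modified header.

```shell
# Hypothetical helper: show the timestamps that wget -N compares for HTTP,
# without downloading the file. Assumes GNU date and a Last-Modified header.
show_timestamps() {
    local url="$1"
    local file="${2:-$(basename "$url")}"
    local remote

    # --spider checks the URL without saving it; -S prints the server's
    # response headers, which wget writes to stderr (hence 2>&1)
    remote=$(wget --spider -S "$url" 2>&1 | grep -i 'Last-Modified:' | head -n 1 | sed 's/^ *//')
    echo "Remote: ${remote:-no Last-Modified header}"

    if [ -f "$file" ]; then
        # GNU date: -r prints the file's last modification time
        echo "Local:  $(date -u -r "$file")"
    else
        echo "Local:  $file does not exist"
    fi
}
```

Running show_timestamps https://source1.com/images/pic.png prints the server’s Last-Modified value next to the local file’s time, which makes it easier to understand why wget -N did or didn’t download a file.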

2.3. Using Conditional Logic in Scripts

Besides flags, we can use conditional logic in shell scripts to check if the file exists before using wget. This gives the administrator more control over whether the file should be downloaded.

To demonstrate, let’s create the Bash script download_file_if_missing.sh with the following content:

#!/bin/bash

FILE="pic.png"
URL="https://source1.com/images/$FILE"

if [ -f "$FILE" ]; then
    echo "$FILE is already present. Let's skip the download."
else
    wget "$URL"
fi

This Bash script checks for the file locally before downloading it:

if [ -f "$FILE" ]; then – starts the conditional; the -f test is true if the path exists and is a regular file, so the condition holds when $FILE, in this case pic.png, is already in the current directory
echo "$FILE is already present. Let's skip the download." – prints the message when $FILE is already there
else – introduces the branch that runs when pic.png doesn’t exist in the current directory
wget "$URL" – downloads pic.png from the specified URL, in this case, https://source1.com/images/pic.png
fi – closes the if statement

Let’s run the Bash script:

$ bash download_file_if_missing.sh
pic.png is already present. Let's skip the download.

In this example, pic.png already exists locally.
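One refinement worth considering: the script ignores wget’s exit status. If we pass an explicit filename with -O, a failed download can leave an empty or partial file behind, and the -f test would then skip it on every later run. Here’s a minimal sketch of a hypothetical download_if_missing function that takes the URL (and optionally the filename) as arguments and removes the output file if wget fails:

```shell
# Hypothetical function: download URL to FILE only if FILE is missing.
# FILE defaults to the last path component of the URL (assumes no query string).
download_if_missing() {
    local url="$1"
    local file="${2:-$(basename "$url")}"

    if [ -f "$file" ]; then
        echo "$file is already present. Skipping the download."
        return 0
    fi

    # With -O, a failed download may leave an empty or partial file;
    # remove it so that the next run retries instead of skipping
    if ! wget -O "$file" "$url"; then
        rm -f "$file"
        echo "Download of $url failed." >&2
        return 1
    fi
}
```

For instance, download_if_missing https://source1.com/images/pic.png behaves like the script above, but returns a non-zero status if the download fails.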

2.4. Handling Files With Identical Names From Different Sources

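There are situations in which we download files with the same name from different URLs. Since -nc and the script above only look at the local filename, the second pic.png would be mistaken for the first one and skipped.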

First, we can use the -O flag to save the downloaded file under a different name:

$ wget -nc -O source2_pic.png https://source2.com/images/pic.png

Here, the -O flag sets an output filename that reflects the source. In this case, we download pic.png from source2.com and save it as source2_pic.png. Additionally, we can use the -P flag to organize files into per-source directories:

$ wget -nc -P source2 https://source2.com/images/pic.png

Above, we download pic.png from source2.com and store it in the source2 directory. If source2 doesn’t exist, wget creates it automatically. Notably, -P works with both -nc and -N. However, -O is best paired with -nc: when -O is set, wget disables timestamping and prints a warning, so -N has no effect.
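When many URLs are involved, we can derive the -P directory from each URL instead of typing it. Here’s a minimal sketch with a hypothetical download_by_source function that uses the host name (the third /-separated field of the URL) as the directory name:

```shell
# Hypothetical function: download each URL into a directory named after its host,
# skipping files that already exist there (-nc checks the path under -P).
download_by_source() {
    local url domain
    for url in "$@"; do
        # For https://source1.com/images/pic.png, field 3 is "source1.com"
        domain=$(printf '%s\n' "$url" | cut -d/ -f3)
        wget -nc -P "$domain" "$url"
    done
}
```

Calling download_by_source https://source1.com/images/pic.png https://source2.com/images/pic.png stores the files as source1.com/pic.png and source2.com/pic.png, and -nc skips each one that’s already there. If a URL contains a port, it stays part of the directory name.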

Next, let’s handle files with identical names from different sources when working with conditional logic in scripts. To achieve this, let’s modify the Bash script download_file_if_missing.sh:

#!/bin/bash

# Specify filename and new URL
FILE="pic.png"
URL="https://source2.com/images/$FILE"

# Capture domain name from new URL
DOMAIN=$(echo "$URL" | awk -F[/:] '{print $4}')

# Create a unique file name (e.g., source2.com_pic.png) based on the domain name
NEW_FILE_NAME="${DOMAIN}_${FILE}"

# Check whether $NEW_FILE_NAME already exists in the current directory
if [ -f "$NEW_FILE_NAME" ]; then
    echo "$NEW_FILE_NAME is already present. Let's skip the download."
else
    # Download the file and save it as $NEW_FILE_NAME
    wget -O "$NEW_FILE_NAME" "$URL"
fi

Here, we change wget "$URL" to wget -O "$NEW_FILE_NAME" "$URL". Now, the script downloads pic.png but saves it under a name that includes the source domain:

DOMAIN=$(echo "$URL" | awk -F[/:] '{print $4}') – splits the URL on / and : characters; the fourth field is the domain name (source2.com), which awk prints and we store in the variable DOMAIN
NEW_FILE_NAME="${DOMAIN}_${FILE}" – prefixes the filename with the domain, so pic.png becomes source2.com_pic.png, and stores the result in the variable NEW_FILE_NAME

This modification helps us tell apart the two pic.png files downloaded from different URLs.
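As an alternative to awk, we can also extract the domain with POSIX parameter expansion, which avoids counting fields. This is a sketch with hypothetical helper names; unlike the awk version, it keeps a port number (for example, example.com:8080) as part of the domain.

```shell
# Hypothetical helper: extract the host from a URL using parameter expansion
domain_of() {
    local rest="${1#*://}"      # drop the scheme, e.g. "https://"
    printf '%s\n' "${rest%%/*}" # keep everything before the first "/"
}

# Hypothetical helper: build a name such as source2.com_pic.png from a URL
unique_name() {
    printf '%s_%s\n' "$(domain_of "$1")" "$(basename "$1")"
}
```

In the script, NEW_FILE_NAME=$(unique_name "$URL") would then replace the DOMAIN and NEW_FILE_NAME lines.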

3. Conclusion

In this article, we explored various ways we can work with wget to avoid redundant downloads.

First, we explored the wget flags -nc, which skips files that already exist locally, and -N, which compares timestamps. Next, we used conditional logic in a shell script to check whether a file exists before calling wget. Finally, we handled files with identical names from different sources.

With these techniques, we can avoid redundant downloads, save bandwidth, and keep our local storage organized.
