1. Overview

The wget command-line tool enables us to download files from the internet. By default, wget downloads a file even if it already exists locally. However, to save time, bandwidth, and storage, we can avoid downloading the same file repeatedly. Fortunately, wget supports checking for a file locally before downloading it again.

In this tutorial, we’ll explore how to ensure that wget only downloads a file that doesn’t already exist locally.

2. Skip Downloading a File if It Already Exists

The wget command follows a simple syntax:

$ wget [URL]

For instance, let’s use it to download an image:

$ wget https://source1.com/images/pic.png
...
Length: 224566 (219K) [image/png]
Saving to: ‘pic.png’

pic.png 100%[=================================================================>] 219.30K   556KB/s    in 0.4s    

2024-11-03 02:43:32 (556 KB/s) - ‘pic.png’ saved [224566/224566]

This command downloads pic.png into the current directory. If pic.png already exists, wget detects the existing file and creates a new copy with a numbered suffix, for instance, pic.png.1, pic.png.2, and so on. This can be inefficient, since downloading the same file repeatedly wastes bandwidth and storage space.

Fortunately, wget provides flags that change this default behavior. Let’s look at the ones that make wget skip a download when the file already exists.

2.1. Using the -nc Flag

The -nc or --no-clobber flag in wget prevents overwriting an existing file with the same name:

$ wget -nc https://source1.com/images/pic.png
File ‘pic.png’ already there; not retrieving.

Above, -nc stops wget from downloading pic.png because a file with the same name already exists in the current directory. However, it only checks the file’s name, not whether its content has changed. Thus, wget -nc won’t download pic.png even if there’s a newer version on the server, so we can’t rely on it to keep a file up to date.

This makes -nc useful if we never want to overwrite the file.

2.2. Using the -N Flag

The -N or --timestamping flag helps us download the latest version of the file. With -N, wget downloads the file if it doesn’t exist in the current directory or if the server version is newer than the local version:

$ wget -N https://source1.com/images/pic.png

If the file exists locally, -N instructs wget to compare the timestamp for the file on the server against the timestamp for the file in the current working directory:

$ wget -N https://source1.com/images/pic.png
...
File ‘pic.png’ not modified on server. Omitting download.

Above, wget skips downloading pic.png because the server version is not newer.

Therefore, we can use the -N flag to work with files that are updated over time.
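
To make the choice between the two flags explicit, here is a minimal sketch of a wrapper function. The name fetch and its keep/update modes are hypothetical and only serve as an illustration; the flags themselves are the ones discussed above:

```shell
#!/bin/bash

# Hypothetical helper: pick the wget flag that matches our intent.
# Usage: fetch keep|update URL
fetch() {
    local mode="$1"
    local url="$2"

    case "$mode" in
        keep)
            # Never touch a local file that already exists
            wget -nc "$url"
            ;;
        update)
            # Download if the file is missing or the server copy is newer
            wget -N "$url"
            ;;
        *)
            echo "usage: fetch keep|update URL" >&2
            return 2
            ;;
    esac
}
```

For example, fetch keep https://source1.com/images/pic.png behaves like wget -nc, while fetch update with the same URL behaves like wget -N.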

2.3. Using Conditional Logic in Scripts

Besides flags, we can use conditional logic in shell scripts to check if the file exists before using wget. This gives us more control over whether the file should be downloaded.

To demonstrate, let’s create the Bash script download_file_if_missing.sh with the following content:

#!/bin/bash

FILE="pic.png"
URL="https://source1.com/images/$FILE"

if [ -f "$FILE" ]; then
    echo "$FILE is already present. Let's skip the download."
else
    wget "$URL"
fi

This Bash script checks for the file locally before downloading it:

  • if [ -f "$FILE" ]; then – starts the if statement; the -f test is true when $FILE (here, pic.png) exists in the current directory and is a regular file
  • echo "$FILE is already present. Let's skip the download." – prints the message if $FILE is already there
  • else – starts the branch that runs when pic.png doesn’t exist in the current directory
  • wget "$URL" – downloads pic.png from the specified URL, in this case, https://source1.com/images/pic.png
  • fi – closes the if statement

Let’s run the Bash script:

$ bash download_file_if_missing.sh
pic.png is already present. Let's skip the download.

In this example, pic.png already exists locally.
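
The script above hardcodes the filename and URL. As a minimal sketch, we could wrap the same check in a function that takes the URL as an argument. The function name download_if_missing and the optional directory argument are hypothetical, and the sketch assumes that the URL ends with the filename we expect wget to save:

```shell
#!/bin/bash

# Hypothetical helper: download a URL unless the file is already present.
# Usage: download_if_missing URL [DIRECTORY]
download_if_missing() {
    local url="$1"
    local dir="${2:-.}"        # default to the current directory
    local file="${url##*/}"    # everything after the last slash

    if [ -f "$dir/$file" ]; then
        echo "$file is already present. Let's skip the download."
    else
        # -P tells wget which directory to save the file in
        wget -P "$dir" "$url"
    fi
}
```

After sourcing the file, download_if_missing https://source1.com/images/pic.png checks the current directory, while passing a second argument such as images checks and downloads into that directory instead.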

2.4. Handling Files With Identical Names From Different Sources

There are situations in which we may download files with the same name from different URLs.

First, we can use the -O flag to rename the files:

$ wget -nc -O source2_pic.png https://source2.com/images/pic.png

Here, we use the -O flag to define a unique filename for each source. In this case, we download the file pic.png from a different URL and save it as source2_pic.png. Additionally, we can use the -P flag to organize files by source directories:

$ wget -nc -P source2 https://source2.com/images/pic.png

Above, we download pic.png from a different URL and store it in the directory source2. If the source2 directory doesn’t exist, wget creates it automatically. Notably, -P works with both -nc and -N. In contrast, -O works with -nc but not with -N: since -O always writes a fresh file, wget warns that timestamping has no effect in that combination.

Next, let’s handle files with identical names from different sources when working with conditional logic in scripts. To achieve this, let’s modify the Bash script download_file_if_missing.sh:

#!/bin/bash

# Specify filename and new URL
FILE="pic.png"
URL="https://source2.com/images/$FILE"

# Capture domain name from new URL
DOMAIN=$(echo "$URL" | awk -F[/:] '{print $4}')

# Create a unique file name (e.g., source2.com_pic.png) based on the domain name
NEW_FILE_NAME="${DOMAIN}_${FILE}"

# Check whether $NEW_FILE_NAME already exists in the current directory
if [ -f "$NEW_FILE_NAME" ]; then
    echo "$NEW_FILE_NAME is already present. Let's skip the download."
else
    # Download the file and save it as $NEW_FILE_NAME
    wget -O "$NEW_FILE_NAME" "$URL"
fi

Here, we change wget "$URL" to wget -O "$NEW_FILE_NAME" "$URL". Now, the script downloads pic.png and saves it under a name that includes a unique identifier:

  • DOMAIN=$(echo "$URL" | awk -F[/:] '{print $4}') – the awk command splits the URL on slashes and colons, extracts the domain (the fourth field), and stores it in the variable DOMAIN
  • NEW_FILE_NAME="${DOMAIN}_${FILE}" – prefixes the filename with the domain, so pic.png becomes source2.com_pic.png, and stores the result in the variable NEW_FILE_NAME

This modification helps us tell apart the identically named pic.png files from the two different URLs.
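
As an alternative to awk, the same naming scheme can be sketched with shell parameter expansion alone. The function names url_host, unique_name, and download_unique are hypothetical, as is the .part suffix for the temporary file. The sketch also downloads to a temporary name first, because wget -O creates the output file before the transfer finishes, so a failed download could otherwise leave behind a file that the -f check later mistakes for a complete one:

```shell
#!/bin/bash

# Print the host part of a URL using only parameter expansion.
# Assumes a plain hostname (no IPv6 literal in brackets).
url_host() {
    local rest="${1#*://}"         # drop the scheme, e.g. "https://"
    rest="${rest%%/*}"             # drop the path
    rest="${rest##*@}"             # drop any user:password@ prefix
    printf '%s\n' "${rest%%:*}"    # drop any :port suffix
}

# Print a source-specific local name such as source2.com_pic.png.
unique_name() {
    local url="$1"
    local path="${url%%[?#]*}"     # ignore query string and fragment
    printf '%s_%s\n' "$(url_host "$url")" "${path##*/}"
}

# Download only if the source-specific name is missing.
download_unique() {
    local url="$1"
    local target tmp
    target="$(unique_name "$url")"

    if [ -f "$target" ]; then
        echo "$target is already present. Let's skip the download."
        return 0
    fi

    tmp="$target.part"
    if wget -O "$tmp" "$url"; then
        mv -- "$tmp" "$target"     # keep the file only after success
    else
        rm -f -- "$tmp"            # discard the partial or empty file
        return 1
    fi
}
```

For the URL used in this section, unique_name prints source2.com_pic.png, matching the name the awk-based script produces.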

3. Conclusion

In this article, we explored various ways we can work with wget to avoid redundant downloads.

First, we explored the wget flags -nc to skip files that already exist locally and -N to download a file only when the server version is newer. Next, we used conditional logic in shell scripts to check if a file exists before using wget. We then handled files with identical names from different sources.

We can now download files more efficiently by saving bandwidth and keeping our local storage more organized.