1. Introduction

Downloading files from the web is a common task, and the wget command is one of the most powerful command-line tools for the job.

Whether we want to download a single file, a whole website, or several URLs in one command, wget offers flexible options to automate the job. Simple yet effective, wget is widely used in scripting, automation, and system administration.

In this tutorial, we’ll explore different techniques for downloading URLs using wget. We’ll look at basic usages, efficient handling of downloads, and various options that optimize the process for different needs.

2. Basic Usage of wget for Downloading URLs

The wget command is a simple yet powerful tool used to download files from the internet. It’s highly versatile, supporting multiple protocols like HTTP, HTTPS, and FTP. Whether fetching one file or automating large downloads, wget gets the job done efficiently. To begin, let’s install wget on our Linux system.

2.1. Installing wget on Linux

Most Linux distributions come with wget preinstalled. However, if it’s missing, we can install it on Debian-based systems such as Ubuntu using:

$ sudo apt update && sudo apt install wget -y
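
On RPM-based distributions such as Fedora or RHEL, a rough equivalent (assuming dnf is the package manager) is:

$ sudo dnf install wget -y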

2.2. Downloading a File with wget

Afterwards, we can start downloading a file using wget:

$ wget https://file-examples.com/wp-content/storage/2017/10/file-example_PDF_1MB.pdf
--2025-01-28 22:20:35--  https://file-examples.com/wp-content/storage/2017/10/file-example_PDF_1MB.pdf
Resolving file-examples.com (file-examples.com)... 185.135.88.81
Connecting to file-examples.com (file-examples.com)|185.135.88.81|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘file-example_PDF_1MB.pdf’

file-example_PDF_1MB.pdf                                       [ <=>                                                                                                                                    ]   1.19K  --.-KB/s    in 0s      

2025-01-28 22:20:36 (41.1 MB/s) - ‘file-example_PDF_1MB.pdf’ saved [1223]

Here, wget downloads file-example_PDF_1MB.pdf from https://file-examples.com/wp-content/storage/2017/10/ and saves it in our current directory. By default, the file retains its original name. If the download is interrupted, we can resume it later using the -c option, which makes wget a reliable tool for handling large files and unstable connections.

3. Downloading Multiple URLs With wget

For a large number of files, running wget separately for each URL would be inefficient. Thankfully, wget lets us download multiple files with one command, saving time and repetitive typing. We can specify multiple URLs directly, store the links in a file, or even pass them inline.

3.1. Downloading Multiple URLs in a Single Command

The easiest way to fetch multiple files is by passing all URLs together:

$ wget https://file-examples.com/wp-content/storage/2017/10/file-example_PDF_1MB.pdf https://file-examples.com/wp-content/storage/2017/10/file-sample_150kB.pdf
--2025-01-28 22:22:44--  https://file-examples.com/wp-content/storage/2017/10/file-example_PDF_1MB.pdf
Resolving file-examples.com (file-examples.com)... 185.135.88.81
Connecting to file-examples.com (file-examples.com)|185.135.88.81|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘file-example_PDF_1MB.pdf’

file-example_PDF_1MB.pdf                                       [ <=>                                                                                                                                    ]   1.19K  --.-KB/s    in 0s      

2025-01-28 22:22:45 (27.6 MB/s) - ‘file-example_PDF_1MB.pdf’ saved [1223]

--2025-01-28 22:22:45--  https://file-examples.com/wp-content/storage/2017/10/file-sample_150kB.pdf
Reusing existing connection to file-examples.com:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘file-sample_150kB.pdf’

file-sample_150kB.pdf                                          [ <=>                                                                                                                                    ]   1.19K  --.-KB/s    in 0s      

2025-01-28 22:22:45 (16.1 MB/s) - ‘file-sample_150kB.pdf’ saved [1223]

FINISHED --2025-01-28 22:22:45--
Total wall clock time: 0.6s
Downloaded: 2 files, 2.4K in 0s (20.3 MB/s)

Here, wget downloads both file-example_PDF_1MB.pdf and file-sample_150kB.pdf to our current directory. Although this method is quick for a few files, it can become cumbersome if we have a long list of URLs. Therefore, let’s explore other ways of downloading multiple URLs together.

3.2. Downloading URLs From a File

A more efficient way to handle multiple URLs is by storing them in a text file and passing it to wget:

$ echo -e "https://file-examples.com/wp-content/storage/2017/10/file-example_PDF_1MB.pdf\nhttps://file-examples.com/wp-content/storage/2017/10/file-sample_150kB.pdf" > download.txt
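
We can confirm the file’s contents before handing it to wget:

$ cat download.txt
https://file-examples.com/wp-content/storage/2017/10/file-example_PDF_1MB.pdf
https://file-examples.com/wp-content/storage/2017/10/file-sample_150kB.pdf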

Here, we first create a new file, download.txt, containing all the URLs that we want to download. Now, instead of typing each URL manually, we can instruct wget to read from this file and download all the listed files:

$ wget -i download.txt

This approach is ideal for handling large numbers of files, as it minimizes manual input while keeping the downloads well-structured and easy to manage.

3.3. Downloading Inline URLs Without a File

Alternatively, if we don’t want to create a separate text file to store the URLs, we can directly pass multiple URLs via an inline approach:

$ wget -i - <<< "https://file-examples.com/wp-content/storage/2017/10/file-example_PDF_1MB.pdf
https://file-examples.com/wp-content/storage/2017/10/file-sample_150kB.pdf"

Here, the -i - option tells wget to read the URL list from standard input, and we use a here-string (<<<) to provide the URLs directly. This method is useful for quick downloads without cluttering our system with extra files.
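
Equivalently, we can pipe the URLs into wget’s standard input, for example with printf:

$ printf '%s\n' \
  "https://file-examples.com/wp-content/storage/2017/10/file-example_PDF_1MB.pdf" \
  "https://file-examples.com/wp-content/storage/2017/10/file-sample_150kB.pdf" | wget -i -

Here, printf writes each URL on its own line, and wget -i - reads them from the pipe, just as it did from the here-string.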

4. Advanced wget Techniques for Handling Multiple Downloads

When downloading multiple files, we need greater control to achieve better speed, avoid duplicates, and maintain directory structures. To that end, wget offers several advanced options to help us manage bulk downloads efficiently. Let’s explore these approaches one by one.

4.1. Using a Loop to Download Multiple URLs

Instead of specifying multiple URLs manually, we can use a loop to process them dynamically. This is particularly useful when URLs follow a predictable pattern or are generated programmatically.

For example, suppose we need to download 10 sequentially numbered images from a website:

$ for i in {1..10}; do wget https://example.com/image$i.jpg; done

This loop iterates from 1 to 10, replacing $i with each number in the sequence. As a result, wget downloads image1.jpg, image2.jpg, and so on, up to image10.jpg.
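
Since the shell expands brace sequences before running a command, we can often skip the explicit loop entirely (a sketch assuming the same hypothetical URLs):

$ wget https://example.com/image{1..10}.jpg

Bash expands image{1..10}.jpg into ten separate arguments, so wget receives all the URLs in a single invocation, much like in Section 3.1.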

4.2. Avoiding Duplicate Downloads

Sometimes, downloading files with the same name can create conflicts. To avoid this, we can use the -N option to prevent re-downloading unchanged files:

$ wget -N -i download.txt

The -N flag ensures that wget only downloads a file if it’s newer than the existing one. This is helpful for updating files without unnecessary re-downloads.

4.3. Preserving Directory Structure

By default, wget saves all files in the current directory. If we want to keep the original directory structure, we can use:

$ wget -x -nH -i url-list.txt
......
Downloaded: 2 files, 2.4K in 0s (27.8 MB/s)
$ ls
download.txt wp-content

The -x option enables directory recreation, while -nH removes the hostname from the path. This keeps files organized and prevents name conflicts. We can see that the new files are downloaded to the wp-content directory.
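
For comparison, if we drop -nH, wget keeps the hostname as the top-level directory, so the same files would end up under file-examples.com/wp-content/ instead:

$ wget -x -i url-list.txt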

5. Additional wget Options for Customizing Downloads

wget offers several options to tune its behavior to a particular use case. For example, we can limit download speeds, retry failed downloads, or change the user-agent.

5.1. Limiting Download Speed

To prevent wget from consuming too much bandwidth, we can set a download speed limit:

$ wget --limit-rate=500k http://example.com/largefile.zip

The --limit-rate option restricts the download speed to 500 KB/s, ensuring that other network activities aren’t disrupted.
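
The same limit applies when reading URLs from a list, so we can reuse the download.txt file from earlier to throttle an entire batch:

$ wget --limit-rate=500k -i download.txt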

5.2. Retrying Failed Downloads

If a download fails due to network issues, we can enable automatic retries:

$ wget --tries=5 http://example.com/file.zip

The --tries=5 option makes wget retry up to five times before giving up. This is useful for unstable connections.
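
If we also want wget to pause between attempts instead of retrying immediately, we can combine --tries with --waitretry, which caps the wait (in seconds) between retries:

$ wget --tries=5 --waitretry=10 http://example.com/file.zip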

5.3. Resuming Large Downloads

If we’re downloading large files and the connection gets interrupted, it’s irritating to start from scratch. Fortunately, we can tell wget to resume incomplete downloads with the -c option:

$ wget -c https://example.com/largefile.zip

This is particularly useful for slow connections or unreliable networks, ensuring that we don’t waste time re-downloading already completed portions of a file.
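
For long transfers over flaky links, we can also combine resuming with the retry options from the previous subsections (same hypothetical URL):

$ wget -c --tries=10 https://example.com/largefile.zip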

5.4. Changing the User-Agent

Some servers block automated downloads based on the user-agent. We can bypass this by setting a custom user-agent:

$ wget --user-agent="Mozilla/5.0" http://example.com/protectedfile.zip

Here, wget mimics a web browser by using a Mozilla user-agent, allowing access to servers that restrict automated tools.

6. Conclusion

In this article, we explored wget, a powerful and flexible utility for downloading files from the web. Whether we need to fetch a single file or several, it offers a simple yet effective solution.

We can pass multiple URLs directly, store the links in a file, or provide them inline. We can also fine-tune our downloads for various use cases by adjusting options like -N to avoid duplicate downloads and -x -nH to preserve directory structures.

By mastering wget, we can automate repetitive download tasks and fetch large amounts of data with ease. Its versatility makes it a go-to tool for system administrators, developers, and anyone who works with online resources.

Exploring additional options and integrating wget into scripts can further enhance our workflow, making downloads more efficient and manageable.