1. Overview
wget is one of the most common tools for downloading content from the Internet. With wget, we can mirror websites and download files and media non-interactively, which makes it well suited to automation. It can follow hyperlinks when downloading recursively, and it follows HTTP redirects automatically.
However, there might be situations where we don’t want wget to follow redirects. In this tutorial, we’ll look at how wget handles a server’s redirect response by default and how to change this behavior.
2. Using wget To Download Content
wget is quite simple to use. Let’s say we want to look at the license of the Linux kernel. We can fetch the GPL-2.0 license text from the kernel’s GitHub repository:
$ wget https://github.com/torvalds/linux/raw/master/LICENSES/preferred/GPL-2.0
--2022-02-03 20:39:21-- https://github.com/torvalds/linux/raw/master/LICENSES/preferred/GPL-2.0
SSL_INIT
Loaded CA certificate '/etc/ssl/certs/ca-certificates.crt'
Resolving github.com (github.com)... 220.127.116.11
Connecting to github.com (github.com)|18.104.22.168|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/torvalds/linux/master/LICENSES/preferred/GPL-2.0 [following]
--2022-02-03 20:39:22-- https://raw.githubusercontent.com/torvalds/linux/master/LICENSES/preferred/GPL-2.0
SSL_INIT
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 22.214.171.124, 126.96.36.199, 188.8.131.52, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|184.108.40.206|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 18729 (18K) [text/plain]
Saving to: ‘GPL-2.0’
GPL-2.0 100%[==============================================================>] 18.29K --.-KB/s in 0.003s
2022-02-03 20:39:23 (5.63 MB/s) - ‘GPL-2.0’ saved [18729/18729]
This downloads the content into a file named GPL-2.0 in our current directory. Of course, we can change the name of the downloaded file using the -O flag, if we want to.
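As a sketch, the same download saved under a name of our choosing could be wrapped like this. The helper name fetch_license and the default name gpl-2.0.txt are our own illustrative choices, not part of the original example; only the -O flag and the URL come from above:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the article): fetch the kernel's GPL-2.0 text
# and save it under a chosen name via -O (--output-document).
fetch_license() {
  local out="${1:-gpl-2.0.txt}"  # assumed default output name
  wget -O "$out" \
    https://github.com/torvalds/linux/raw/master/LICENSES/preferred/GPL-2.0
}

# Usage:
#   fetch_license             # saves to gpl-2.0.txt
#   fetch_license my-gpl.txt  # saves to my-gpl.txt
```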
If we look closely at the output above, we can see a line that says “HTTP request sent, awaiting response... 302 Found”. This means that the URL we used results in an HTTP redirect to another URL.
The server provides the redirect target in the Location HTTP response header. wget prints its value on the next line of the output, followed by “[following]” to indicate that it is now requesting that URL.
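To pull the redirect target out of wget’s log in a script, a small filter over the “Location:” line is enough. This is a sketch that assumes the untranslated log format shown in the output above; redirect_target is a hypothetical helper name. Since wget writes its log to stderr, the usage note merges stderr into the pipe:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a wget log on stdin and print each redirect
# target, i.e. the URL after "Location: " on a redirect line.
# Assumes the untranslated log format, e.g.
#   Location: https://example.org/new [following]
redirect_target() {
  sed -n 's/^Location: \([^ ]*\).*$/\1/p'
}

# Usage (this still downloads the file itself):
#   LC_ALL=C wget https://github.com/torvalds/linux/raw/master/LICENSES/preferred/GPL-2.0 2>&1 | redirect_target
```

Forcing LC_ALL=C keeps the messages in English, because wget translates its log output according to the locale.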
3. Preventing Redirects When Using wget
By default, wget follows redirects from a given URL. However, there might be situations where we want to limit the number of redirects or prevent them completely. For example, when mirroring a website, we might not want links on the site to redirect wget to external websites.
We can do this with wget’s --max-redirect flag. Its default value is 20, so wget follows up to 20 redirections for a single URL. If we set it to 0, wget doesn’t follow any redirects.
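For instance, to tolerate a single hop (such as the github.com to raw.githubusercontent.com redirect in the first example) but nothing beyond it, we could pass --max-redirect=1. The wrapper below is a hypothetical sketch; its name and the input check are our own additions:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: download URL ($2) while allowing at most
# $1 redirections (0 disables following redirects entirely).
fetch_with_redirect_limit() {
  local limit="$1" url="$2"
  # --max-redirect takes a non-negative number; reject anything else early.
  if ! [[ "$limit" =~ ^[0-9]+$ ]]; then
    echo "invalid redirect limit: $limit" >&2
    return 2
  fi
  wget --max-redirect="$limit" "$url"
}

# Usage: follow at most one redirect
#   fetch_with_redirect_limit 1 https://github.com/torvalds/linux/raw/master/LICENSES/preferred/GPL-2.0
```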
Let’s try the earlier GPL-2.0 download again with --max-redirect set to 0:
$ wget --max-redirect=0 https://github.com/torvalds/linux/raw/master/LICENSES/preferred/GPL-2.0
--2022-02-03 20:49:28-- https://github.com/torvalds/linux/raw/master/LICENSES/preferred/GPL-2.0
SSL_INIT
Loaded CA certificate '/etc/ssl/certs/ca-certificates.crt'
Resolving github.com (github.com)... 220.127.116.11
Connecting to github.com (github.com)|18.104.22.168|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/torvalds/linux/master/LICENSES/preferred/GPL-2.0 [following]
0 redirections exceeded.
Here, wget reports “0 redirections exceeded.” and stops without requesting the URL from the Location header, even though the preceding line still says “[following]”. Similarly, to allow a specific number of redirections, we can set --max-redirect to that number.
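In a script, we might want to know whether a download stopped because of this limit. One hedged approach is to save wget’s log with -o (--output-file) and search it for the message shown above. The helper name redirect_blocked is hypothetical, and relying on the English message text is an assumption, which is why the usage note forces LC_ALL=C:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (status 0) if the wget log on stdin shows that
# a redirect was refused because the --max-redirect limit was reached.
# Assumes untranslated messages, e.g. "0 redirections exceeded."
redirect_blocked() {
  grep -Eq '^[0-9]+ redirections exceeded\.'
}

# Usage sketch:
#   LC_ALL=C wget --max-redirect=0 -o wget.log "$url"
#   if redirect_blocked < wget.log; then
#     echo "redirect not followed; see the Location line in wget.log"
#   fi
```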
4. Conclusion
In this short article, we first saw that wget follows HTTP redirects by default, and then we used the --max-redirect flag to limit or prevent them.