1. Overview

We may wish to send HTTP requests without using a web browser or other interactive app. For this, Linux provides us with two commands: curl and wget.

Both commands are quite helpful as they provide a mechanism for non-interactive download and upload of data. We can use them for web crawling, automating scripts, testing of APIs, etc.

In this tutorial, we will be looking at the differences between these two utilities.

2. Protocols

2.1. Using the HTTP Protocol

Both curl and wget support the HTTP, HTTPS, and FTP protocols. So if we want to get a page from a website, say baeldung.com, we can run either command with the web address as the parameter. Here is wget:

wget https://www.baeldung.com/

--2019-10-02 22:00:34--  https://www.baeldung.com/
Resolving www.baeldung.com (www.baeldung.com)... 2606:4700:30::6812:3e4e, 2606:4700:30::6812:3f4e, 104.18.63.78, ...
Connecting to www.baeldung.com (www.baeldung.com)|2606:4700:30::6812:3e4e|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html’

index.html                    [ <=> ] 122.29K  --.-KB/s    in 0.08s   

2019-10-02 22:00:35 (1.47 MB/s) - ‘index.html’ saved [125223]

The main difference between them is that curl writes the response body to the console by default, whereas wget saves it to a file (index.html in the run above).

We can save the data in a file with curl by using the -o parameter:

curl https://www.baeldung.com/ -o baeldung.txt

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  122k    0  122k    0     0    99k      0 --:--:--  0:00:01 --:--:--   99k
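To see the two output modes side by side without touching the network, here is a minimal sketch that points curl at a local file through a file:// URL. The temporary directory, the file names page.txt and saved.txt, and the file content are assumptions made up for this illustration.

```shell
# Create a small local file that stands in for a remote page (hypothetical content).
workdir=$(mktemp -d)
printf 'hello from curl\n' > "$workdir/page.txt"

# Without -o, curl writes the body to standard output; -s hides the progress meter.
to_stdout=$(curl -s "file://$workdir/page.txt")

# With -o, curl writes the body to the named file instead.
curl -s "file://$workdir/page.txt" -o "$workdir/saved.txt"
to_file=$(cat "$workdir/saved.txt")

echo "stdout: $to_stdout"
echo "file:   $to_file"
```

Both variables should end up holding the same text, which shows that -o only changes where the body goes, not what is transferred.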

2.2. Download and Upload using FTP

We can also use curl and wget to download files using the FTP protocol:

wget --user=abhi --password='myPassword' ftp://abc.com/hello.pdf
curl -u abhi:myPassword 'ftp://abc.com/hello.pdf' -o hello.pdf

We can also upload files to an FTP server with curl. For this, we can use the -T parameter:

curl -T "img.png" ftp://ftp.example.com/upload/

We should note that when uploading to a directory, we must provide the trailing /. Otherwise, curl will think that the path represents a file.
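The trailing-slash rule can be sketched as a small shell function. This is only a model of the naming rule, not curl's own code, and upload_target is a hypothetical helper name introduced for this example.

```shell
# Hypothetical helper: print the remote URL that an upload with -T would target.
upload_target() {  # $1 = local file, $2 = destination URL
  case "$2" in
    */) echo "$2$(basename "$1")" ;;  # directory URL: the local file name is appended
    *)  echo "$2" ;;                  # otherwise the URL itself names the remote file
  esac
}

dir_target=$(upload_target img.png ftp://ftp.example.com/upload/)
file_target=$(upload_target img.png ftp://ftp.example.com/upload)

echo "with trailing slash:    $dir_target"
echo "without trailing slash: $file_target"
```

With the slash, the file lands inside upload/ under its own name. Without it, the upload is written to a remote file called upload.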

2.3. Differences

The difference between the two is that curl supports a plethora of other protocols. These include DICT, FILE, FTPS, GOPHER, IMAP, IMAPS, LDAP, LDAPS, POP3, POP3S, RTMP, RTSP, SCP, SFTP, SMB, SMBS, SMTP, SMTPS, TELNET, and TFTP.

We can treat curl as a general-purpose tool for transferring data to or from a server.

On the other hand, wget is basically a network downloader.
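The exact set of protocols depends on how a particular curl binary was built, so it can be worth checking the local installation. The sketch below reads the Protocols line that curl --version prints. The supports function and the three protocols probed in the loop are assumptions chosen for illustration.

```shell
# Read the "Protocols:" line that curl --version prints for this build.
protocols=$(curl --version | sed -n 's/^Protocols: *//p')
echo "This curl supports: $protocols"

# Hypothetical helper: succeed if the given protocol appears in that list.
supports() {
  case " $protocols " in
    *" $1 "*) return 0 ;;
    *)        return 1 ;;
  esac
}

for proto in https sftp gopher; do
  if supports "$proto"; then
    echo "$proto: available"
  else
    echo "$proto: not in this build"
  fi
done
```

A script that relies on a less common protocol such as SFTP could run a check like this first and fail early with a clear message.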

3. Recursive Download

When we wish to make a local copy of a website, wget is the tool to use. curl does not provide recursive download, as it cannot be provided for all its supported protocols.

We can download a website with wget in a single command:

wget --recursive https://www.baeldung.com

This will download the homepage and any resources linked from it. As we can see, www.baeldung.com links to various other resources like:

  • Start here
  • REST with Spring course
  • Learn Spring Security course
  • Learn Spring course

wget will follow each of these resources and download them individually:

--2019-10-02 22:09:17--  https://www.baeldung.com/start-here
...
Saving to: ‘www.baeldung.com/start-here’

www.baeldung.com/start-here               [  <=> ] 134.85K   321KB/s    in 0.4s    

2019-10-02 22:09:18 (321 KB/s) - ‘www.baeldung.com/start-here’ saved [138087]

--2019-10-02 22:09:18--  https://www.baeldung.com/rest-with-spring-course
...
Saving to: ‘www.baeldung.com/rest-with-spring-course’

www.baeldung.com/rest-with-spring-cou     [ <=> ] 244.77K   395KB/s    in 0.6s    

2019-10-02 22:09:19 (395 KB/s) - ‘www.baeldung.com/rest-with-spring-course’ saved [250646]
... more output omitted

3.1. Recursive Download with HTTP

Recursive download is one of the most powerful features of wget. It can follow links in HTML, XHTML, and CSS pages to create local versions of remote websites, fully recreating the directory structure of the original site.

Recursive downloading in wget is breadth-first. In other words, it first downloads the requested document, then the documents linked from that document, then the documents linked by those documents, and so on. The default maximum depth is set to five, but it can be overridden using the -l parameter:

wget -l 1 --recursive --no-parent http://example.com
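To make the breadth-first order and the depth limit concrete, here is a toy model in plain shell. It is a sketch of the traversal idea only, not wget's implementation, and the link graph in links_of (pages /, /a, /b, and so on) is invented for the example.

```shell
# Toy link graph (hypothetical pages): which pages each page links to.
links_of() {
  case "$1" in
    /)   echo "/a /b" ;;
    /a)  echo "/a1 /b" ;;
    /b)  echo "/b1" ;;
    /a1) echo "/a2" ;;
    *)   echo "" ;;
  esac
}

max_depth=2   # plays the role of the -l value
queue="/"     # pages to fetch at the current depth
seen=" / "    # pages already queued, so nothing is fetched twice
order=""      # visit order, recorded for inspection
depth=0

while [ -n "$queue" ] && [ "$depth" -le "$max_depth" ]; do
  next=""
  for page in $queue; do
    order="$order $page"
    echo "depth $depth: $page"
    for link in $(links_of "$page"); do
      case "$seen" in
        *" $link "*) ;;                               # already queued
        *) seen="$seen$link "; next="$next $link" ;;  # queue for the next level
      esac
    done
  done
  queue=$next
  depth=$((depth + 1))
done
```

Every page at one depth is visited before any page at the next depth, and /a2 is never reached because it sits three links away from the start page.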

In the case of HTTP or HTTPS URLs, wget scans and parses the HTML or CSS. Then, it retrieves the files the document refers to through attributes such as href or src.

By default, wget honors robots.txt (the Robot Exclusion Standard) during a recursive download and skips the paths it disallows. To switch this off, we can pass robots=off with the -e parameter:

wget -e robots=off http://example.com

3.2. Recursive Download with FTP

Unlike HTTP recursion, FTP recursion is performed depth-first. This means that wget will retrieve data of the first directory up to the specified depth level, and then move to the next directory in the directory tree.
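For contrast with the breadth-first behavior of HTTP recursion, the depth-first order can be modeled with a small recursive shell function. Again, this is only an illustration of the traversal order, not wget's code, and the directory tree in children_of is made up.

```shell
# Toy directory tree (hypothetical FTP layout): subdirectories of each directory.
children_of() {
  case "$1" in
    /pub)      echo "/pub/docs /pub/img" ;;
    /pub/docs) echo "/pub/docs/v1" ;;
    *)         echo "" ;;
  esac
}

max_depth=2   # plays the role of the depth limit
order=""      # visit order, recorded for inspection

walk() {  # $1 = directory, $2 = its depth
  order="$order $1"
  echo "depth $2: $1"
  # Stop descending once the depth limit is reached.
  [ "$2" -ge "$max_depth" ] && return 0
  for child in $(children_of "$1"); do
    walk "$child" $(( $2 + 1 ))
  done
}

walk /pub 0
```

Here /pub/docs is explored all the way down to /pub/docs/v1 before its sibling /pub/img is visited.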

4. Conclusion

In this article, we saw how both curl and wget can download files from internet servers.

wget is a simpler solution and only supports a small number of protocols. It is very good for downloading files and can download directory structures recursively.

We also saw how curl supports a much larger range of protocols, making it a more general-purpose tool.
