1. Overview

GNU Wget is a de facto standard program for downloading data from web servers. In this tutorial, we’ll take a hands-on approach to a few ways of sending the document and the response headers to stdout with the wget command.

2. Default Output Behavior

To understand the default output behavior of the wget command, let’s use it to download data from google.com:

$ wget http://www.google.com
--2022-04-02 19:27:07--  http://www.google.com/
Resolving www.google.com (www.google.com)... 172.217.174.228, 2404:6800:4009:81d::2004
Connecting to www.google.com (www.google.com)|172.217.174.228|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'index.html'

index.html                                                            [ <=>                                                                                                                                                                ]  15.93K  --.-KB/s    in 0.03s

2022-04-02 19:27:08 (597 KB/s) - 'index.html' saved [16316]

The output contains a lot of diagnostic information, but the actual document content isn’t shown on stdout. Instead, wget saved the document in a file named index.html. Further, it may seem at first that the diagnostic information is sent to stdout. In reality, however, wget sends it to the stderr stream.

We can verify that wget sends the diagnostic information to stderr by redirecting the stderr to a different file and verifying its content:

$ wget http://www.google.com 2>stderr.dump
$ cat stderr.dump
--2022-04-02 19:33:57--  http://www.google.com/
Resolving www.google.com (www.google.com)... 142.250.67.196, 2404:6800:4009:81f::2004
Connecting to www.google.com (www.google.com)|142.250.67.196|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'index.html'

     0K .......... .....                                        533K=0.03s

2022-04-02 19:33:57 (533 KB/s) - 'index.html' saved [16289]

Let’s keep this behavior in mind because wget treats the response headers as part of the diagnostic information, so it sends them to stderr as well. Additionally, we’re not interested in the default diagnostic output of wget, so we’ll use the --quiet (-q) option to suppress this noise.
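To make the stream separation concrete without touching the network, here’s a minimal sketch. The fake_wget function is a hypothetical stand-in that only mimics wget’s default behavior (document to a file, diagnostics to stderr), and the file names are assumptions for illustration:

```shell
# fake_wget is a hypothetical stand-in for wget's default behavior,
# so the sketch runs without network access.
fake_wget() {
  echo '<html>document</html>' > "$1"   # the document goes to a file
  echo "Saving to: '$1'" >&2            # the diagnostics go to stderr
}

workdir=$(mktemp -d)
fake_wget "$workdir/index.html" 1>"$workdir/stdout.dump" 2>"$workdir/stderr.dump"

# test -s succeeds only if the file exists and is non-empty.
if test -s "$workdir/stdout.dump"; then stdout_state=non-empty; else stdout_state=empty; fi
if test -s "$workdir/stderr.dump"; then stderr_state=non-empty; else stderr_state=empty; fi

echo "stdout: $stdout_state, stderr: $stderr_state"
# prints: stdout: empty, stderr: non-empty
```

Nothing reaches stdout in this sketch, which mirrors what we observed with the real command: only the stderr dump has content.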

3. wget with --output-document

The wget command saves the document content to a file by default. However, we can use the --output-document (-O) option to write the content to a file of our choice. As a special case, if we use - as the file name, wget writes the content to stdout.

Let’s see this in action by first redirecting the output to the content_from_google file:

$ wget -q --output-document content_from_google www.google.com
$ ls -l content_from_google
-rw-r--r--  1 tavasthi  192360288  16323 Apr  3 01:16 content_from_google

Next, let’s send the output to stdout:

$ wget -q --output-document - www.google.com
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en-IN"><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title>

Great! We’ve learned how to output the document to stdout.
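Once the document is on stdout, we can pipe it into other commands. As a rough sketch, the hypothetical extract_title helper below pulls the page title out of an HTML stream with sed. It assumes the whole title element sits on a single line, and it’s fed a stand-in document here so that it runs offline:

```shell
# extract_title is a hypothetical helper: it reads HTML on stdin and prints
# the text of the first <title> element found on each line that has one.
# Assumption: the opening and closing title tags are on the same line.
extract_title() {
  sed -n 's/.*<title>\([^<]*\)<\/title>.*/\1/p'
}

# Stand-in document instead of a live download.
title=$(printf '<!doctype html><html><head><title>Google</title></head></html>\n' | extract_title)
echo "$title"
# prints: Google
```

With a live connection, the same helper could sit at the end of the pipeline: wget -q --output-document - http://www.google.com | extract_title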

In the following two sections, we’ll focus on how to output the headers to stdout.

4. wget with --save-headers

By using the --save-headers option, we can ask wget to add headers before the actual document content while separating the two by inserting an empty line after the headers. In this scenario, wget redirects headers and document content to the same target file.

Let’s see this in action:

$ wget -q --save-headers --output-document - www.google.com
HTTP/1.1 200 OK
Date: Sat, 02 Apr 2022 20:01:40 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
Server: gws
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN
Set-Cookie: 1P_JAR=2022-04-02-20; expires=Mon, 02-May-2022 20:01:40 GMT; path=/; domain=.google.com; Secure
Set-Cookie: AEC=AVQQ_LAeuGIhWDkqKiZiuP8N3P1Jz1x5Jkzoi0ckbpZotvhLRMeBQbD0F0I; expires=Thu, 29-Sep-2022 20:01:40 GMT; path=/; domain=.google.com; Secure; HttpOnly; SameSite=lax
Set-Cookie: NID=511=fC52DE0Nqpm0zfbhAiW4qm6kdo7gy3dibVDFc6jos0QM32GcCFox_3VNLcgvSCaAeGHMp4LkqqvNda_nzO36w-NsjI4_ArdvfUnGuKIY6pgsTFPjIIb4L80X0m9ZU1a-zhSmObqwbEytIHaxMaP61L0qhJVRCgNpkCkfBubsEjQ; expires=Sun, 02-Oct-2022 20:01:40 GMT; path=/; domain=.google.com; HttpOnly
Accept-Ranges: none
Vary: Accept-Encoding
Transfer-Encoding: chunked

<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en-IN"><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title>

Additionally, we can verify that wget didn’t send header information to stderr, by simply redirecting both streams to separate files and verifying their content:

$ wget -q --save-headers --output-document header_with_content www.google.com 2>stderr.out 1>stdout.out
$ test -s stderr.out
$ echo $?
1

The -s option of the test command succeeds only if the file exists and has a size greater than zero. Since the exit status is 1, we can confidently say that the stderr.out file is empty.
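Because an empty line separates the headers from the content, we can split the combined stream back into its two parts. The sketch below uses a hypothetical split_response helper built on awk, with made-up file names. It assumes the header lines may end in a carriage return, since HTTP uses CRLF line endings, and it reads a stand-in response rather than a live one:

```shell
# split_response is a hypothetical helper: it reads "headers, empty line,
# body" on stdin and writes the two parts to the files given as $1 and $2.
split_response() {
  awk -v headers="$1" -v body="$2" '
    BEGIN    { in_body = 0; printf "" > headers; printf "" > body }
    in_body  { print > body; next }     # after the separator: body, untouched
             { sub(/\r$/, "") }         # drop a trailing carriage return, if any
    $0 == "" { in_body = 1; next }      # the first empty line ends the headers
             { print > headers }
  '
}

workdir=$(mktemp -d)

# Stand-in for the output of: wget -q --save-headers --output-document - URL
printf 'HTTP/1.1 200 OK\r\nServer: gws\r\n\r\n<!doctype html>\n<p>hi</p>\n' |
  split_response "$workdir/headers.txt" "$workdir/body.html"

cat "$workdir/headers.txt"
cat "$workdir/body.html"
```

Only the first empty line is treated as the separator, so empty lines inside the document itself stay in the body file.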

5. wget with --server-response

wget provides the --server-response option, which we can use to get the response headers. However, unlike the --save-headers option, the --server-response option treats the response headers as diagnostic information and sends them to the stderr stream.

Let’s use the --server-response option and redirect the contents of stderr and stdout streams to stderr.out and stdout.out files, respectively:

$ wget -q --server-response --output-document header_with_content www.google.com 2>stderr.out 1>stdout.out
$ cat stderr.out
  HTTP/1.1 200 OK
  Date: Sat, 02 Apr 2022 20:10:04 GMT
  Expires: -1
  Cache-Control: private, max-age=0
  Content-Type: text/html; charset=ISO-8859-1
  P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
  Server: gws
  X-XSS-Protection: 0
  X-Frame-Options: SAMEORIGIN
  Set-Cookie: 1P_JAR=2022-04-02-20; expires=Mon, 02-May-2022 20:10:04 GMT; path=/; domain=.google.com; Secure
  Set-Cookie: AEC=AVQQ_LCUA9Yq67FEAgNtMJs9LdKfaRbLx_iMk99w5qmdIaHRhFPYkxzPCw; expires=Thu, 29-Sep-2022 20:10:04 GMT; path=/; domain=.google.com; Secure; HttpOnly; SameSite=lax
  Set-Cookie: NID=511=CeFWgDo2-TR1PqQSpvyZkMCfZdtLSEIELe9T6KhKT2LaMR_QD8gNU2IkhQppOPHPPccQK8emgfYCyBUZZHfZpKbNPqJ8NgCCizFAI-oOSuh5B3ISULBVxUuaIjL5MZ6wp0EGKc-qv_hVvmgmhlRe7rjRdjgwXs3Svp2ubTnWNFg; expires=Sun, 02-Oct-2022 20:10:04 GMT; path=/; domain=.google.com; HttpOnly
  Accept-Ranges: none
  Vary: Accept-Encoding
  Transfer-Encoding: chunked

Since the headers are being sent to the stderr stream, we’d need an additional redirection from stderr to stdout if we want to output the headers on stdout. So, let’s use the 2>&1 redirection to redirect stderr to stdout:

$ wget -q --server-response --output-document - www.google.com 2>&1
  HTTP/1.1 200 OK
  Date: Sat, 02 Apr 2022 20:23:01 GMT
  Expires: -1
  Cache-Control: private, max-age=0
  Content-Type: text/html; charset=ISO-8859-1
  P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
  Server: gws
  X-XSS-Protection: 0
  X-Frame-Options: SAMEORIGIN
  Set-Cookie: 1P_JAR=2022-04-02-20; expires=Mon, 02-May-2022 20:23:01 GMT; path=/; domain=.google.com; Secure
  Set-Cookie: AEC=AVQQ_LDoJE9yujyrwmXjPVoxYtDVSpHrcPvcTsAjCEgSRD_1iU0PUCHTW4E; expires=Thu, 29-Sep-2022 20:23:01 GMT; path=/; domain=.google.com; Secure; HttpOnly; SameSite=lax
  Set-Cookie: NID=511=Vj-mKsn9Lba3zf7DaDQ4mqhY4jJkHtStKG07jt98OQzod_sexTrqs6A7i_H7L6VJjJ3Ev_5JWkpFMvIoUiHdNtu9rE18C5vxEypdxp6mYwiMkOqI4Z2m_28RbFYgzhpNn4OmXh44xom-TxKKMkszAUMtP5FaI637gJ7XrHvhx_s; expires=Sun, 02-Oct-2022 20:23:01 GMT; path=/; domain=.google.com; HttpOnly
  Accept-Ranges: none
  Vary: Accept-Encoding
  Transfer-Encoding: chunked
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en-IN"><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title>

Notice that, unlike with the --save-headers option, there’s no empty line separating the headers from the actual content. Instead, each header line is indented by two spaces.
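Since the headers arrive as indented lines rather than as a block closed by an empty line, one way to work with them is to filter for a specific header by name. The header_value helper below is a hypothetical sketch, not part of wget. It matches the header name case-insensitively and is fed stand-in header lines so that it runs offline:

```shell
# header_value is a hypothetical helper: it reads header lines on stdin
# (as printed by --server-response) and prints the value of the header
# named in $1, matching the name case-insensitively.
header_value() {
  awk -v name="$1" '
    BEGIN { want = tolower(name) ":" }
    {
      line = $0
      sub(/^[ \t]+/, "", line)          # strip the leading indentation
      sub(/\r$/, "", line)              # strip a trailing carriage return, if any
      if (index(tolower(line), want) == 1) {
        value = substr(line, length(want) + 1)
        sub(/^[ \t]+/, "", value)
        print value
      }
    }
  '
}

# Stand-in for the stderr output of wget --server-response.
content_type=$(printf '  HTTP/1.1 200 OK\n  Content-Type: text/html; charset=ISO-8859-1\n  Server: gws\n' |
  header_value content-type)
echo "$content_type"
# prints: text/html; charset=ISO-8859-1
```

With a live connection, we could discard the document and keep only the headers in the pipeline: wget -q --server-response --output-document /dev/null http://www.google.com 2>&1 | header_value Content-Type. If the request is redirected, wget reports several responses, so the helper may print more than one value.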

Let’s verify that the above command indeed sends both the output and headers to stdout by redirecting stderr and stdout streams to separate files and checking their sizes:

$ (wget -q --server-response --output-document - www.google.com 2>&1) 1>stdout.out 2>stderr.out
$ test -s stderr.out
$ echo $?
1

We can see that the size of the stderr.out file is zero. So, we can confidently say that our approach works as expected.

6. Conclusion

In this tutorial, we developed an understanding of the default output behavior of the wget command. Additionally, we explored a few options available with the wget command to output the document and headers to stdout.
