1. Overview

GitHub allows us to fetch a repository in two ways:

  • Cloning it with git clone
  • Downloading it as a zip or tar archive

Although git clone is the most used method, it requires a Git installation on the machine. If Git isn’t available, we can download the repository in tar format and unpack the contents on the file system.

In this tutorial, we’ll look at some Linux commands to download a GitHub repository tarball and unpack it on the file system.

2. GitHub APIs for Tarball Downloading

GitHub offers REST APIs for managing repositories. Among other things, these APIs support downloading a repository as a tarball.

To get a repository archive as tar, we can send an HTTP GET request to https://api.github.com/repos/OWNER/REPO/tarball/REF.

In this tutorial, we take the Baeldung Kotlin tutorial repo as an example: https://github.com/Baeldung/kotlin-tutorials. Next, let’s understand the path parameters:

  • OWNER – The repository owner (case-insensitive); it’s “baeldung” in our example.
  • REPO – The repository name (case-insensitive); in this case, it’s “kotlin-tutorials”.
  • REF – A Git ref, such as a tag or a branch name. If it’s not specified, the repository’s default branch is used.
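
As a minimal sketch of how these path parameters combine, the helper below assembles the endpoint URL. The function name tarball_url is hypothetical and exists only for illustration; it leaves REF out when none is given, so that the default branch is used:

```shell
#!/bin/sh
# Hypothetical helper: build the tarball endpoint from OWNER, REPO and an optional REF.
tarball_url() {
    owner=$1
    repo=$2
    ref=${3:-}   # optional; empty means "use the default branch"
    url="https://api.github.com/repos/${owner}/${repo}/tarball"
    if [ -n "$ref" ]; then
        url="${url}/${ref}"
    fi
    printf '%s\n' "$url"
}

# Print the endpoint for the example repository and the master branch.
tarball_url Baeldung kotlin-tutorials master
```

The printed URL can then be passed to any HTTP client.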

If the GET request is successfully processed, the API returns a 302 HTTP status code with redirect headers, including the tarball location. We can download the tarball from the redirected location.

At the time of writing, the API’s latest version is 2022-11-28. It’s worth noting that GitHub supported tarball downloading via https://github.com/OWNER/REPO/tarball/REF before the API was introduced, and it still supports this legacy approach.

Next, we’ll use common Linux commands to download the tarball of the Baeldung Kotlin tutorial repo, and we’ll discuss both the new and the legacy approaches.

3. Using the curl Command

We can access any HTTP URL by using the curl command. Let’s use curl with the -I option to check the response header details.

First, let’s look at the new API:

$ curl -I https://api.github.com/repos/Baeldung/kotlin-tutorials/tarball/master
HTTP/2 302
server: GitHub.com
date: Tue, 08 Aug 2023 11:09:07 GMT
content-type: text/html;charset=utf-8
content-length: 0
cache-control: public, must-revalidate, max-age=0
expires: Tue, 08 Aug 2023 11:09:07 GMT
location: https://codeload.github.com/Baeldung/kotlin-tutorials/legacy.tar.gz/refs/heads/master
x-github-api-version-selected: 2022-11-28
...

Next, let’s look at the legacy approach:

$ curl -I https://github.com/Baeldung/kotlin-tutorials/tarball/master
HTTP/2 302
server: GitHub.com
date: Tue, 08 Aug 2023 11:41:17 GMT
content-type: text/html; charset=utf-8
vary: X-PJAX, X-PJAX-Container, Turbo-Visit, Turbo-Frame, Accept-Encoding, Accept, X-Requested-With
location: https://codeload.github.com/Baeldung/kotlin-tutorials/legacy.tar.gz/refs/heads/master
cache-control: max-age=0, private
strict-transport-security: max-age=31536000; includeSubdomains; preload
...

As we can see, whether we use the new API or the legacy URL, we receive a response with the 302 status. Further, as the two outputs above show, both responses point to the same redirect location.
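
To read the redirect target in a script, we can filter the location header out of the header output. The following is only a sketch: the function name redirect_location is hypothetical, and the sample headers are copied from the response shown above rather than fetched live:

```shell
#!/bin/sh
# Hypothetical helper: print the value of the Location header read from stdin.
# Header names are case-insensitive and HTTP lines end with CR LF, so we
# strip carriage returns and compare the lowercased header name.
redirect_location() {
    tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
}

# Sample headers taken from the response shown above.
printf 'HTTP/2 302\r\nserver: GitHub.com\r\nlocation: https://codeload.github.com/Baeldung/kotlin-tutorials/legacy.tar.gz/refs/heads/master\r\n' | redirect_location
```

Against the live endpoint, the same filter could be fed by curl -sI followed by the URL.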

To follow the redirects, we need to use curl’s -L option:

$ curl -L https://api.github.com/repos/Baeldung/kotlin-tutorials/tarball/master -o new-api.tgz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 26.4M    0 26.4M    0     0  9105k      0 --:--:--  0:00:02 --:--:-- 11.6M
 
$ ls -l new-api.tgz
-rw-r--r--  1 kent  wheel    26M Aug  8 13:49 new-api.tgz

Similarly, we can download the tarball via the legacy URL:

$ curl -L https://github.com/Baeldung/kotlin-tutorials/tarball/master -o legacy-url.tgz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 26.4M    0 26.4M    0     0  8860k      0 --:--:--  0:00:03 --:--:-- 10.0M

$ ls -l legacy-url.tgz
-rw-r--r--  1 kent  wheel    26M Aug  8 13:47 legacy-url.tgz

As we can see, the above commands download the .tgz file to the same location where the curl command was executed. Later, we can unpack this file by using the tar command.
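
As a sketch of that later unpacking step, the helper below extracts the archive into a directory of our choice. GitHub tarballs wrap the repository content in a single top-level directory, so the --strip-components=1 option drops it. The function name unpack_tarball and the target directory name are our own assumptions:

```shell
#!/bin/sh
# Hypothetical helper: extract a gzip-compressed tarball into a target directory,
# dropping the archive's single top-level directory.
unpack_tarball() {
    archive=$1
    dest=$2
    mkdir -p "$dest" || return 1
    tar -xzf "$archive" -C "$dest" --strip-components=1
}

# Example call, assuming new-api.tgz was downloaded as shown above:
# unpack_tarball new-api.tgz kotlin-tutorials
```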

We can also unpack inline:

curl -L https://... | tar -xz

In most cases, curl can complete the TLS handshake with GitHub. However, if certificate verification fails, we can use curl’s -k (--insecure) option, which skips that verification:

curl -L -k https://... | tar -xz

4. Using the wget Command

Apart from the curl command, which is a general-purpose command to execute HTTP requests, Linux also provides a wget command which is a dedicated non-interactive network downloader.

It supports the HTTP, HTTPS, and FTP protocols. Further, wget follows up to 20 redirects by default, so we don’t need extra options to download the tarball from GitHub:

$ wget https://api.github.com/repos/Baeldung/kotlin-tutorials/tarball/master -O wget-new-api.tgz
--2023-08-08 14:00:11--  https://api.github.com/repos/Baeldung/kotlin-tutorials/tarball/master
Resolving api.github.com (api.github.com)... 140.82.121.6
Connecting to api.github.com (api.github.com)|140.82.121.6|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/Baeldung/kotlin-tutorials/legacy.tar.gz/refs/heads/master [following]
--2023-08-08 14:00:11--  https://codeload.github.com/Baeldung/kotlin-tutorials/legacy.tar.gz/refs/heads/master
Resolving codeload.github.com (codeload.github.com)... 140.82.121.9
Connecting to codeload.github.com (codeload.github.com)|140.82.121.9|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: ‘wget-new-api.tgz’

wget-new-api.tgz                               [             <=>                  ]  26,46M  10,1MB/s    in 2,6s

2023-08-08 14:00:14 (10,1 MB/s) - ‘wget-new-api.tgz’ saved [27741338]

$ ls -l wget-new-api.tgz
-rw-r--r--  1 kent  wheel    26M Aug  8 14:00 wget-new-api.tgz

As the output above shows, wget follows the redirect and successfully downloads the tar file.

Next, let’s give the legacy URL a try:

$ wget https://github.com/Baeldung/kotlin-tutorials/tarball/master -O wget-legacy-url.tgz
--2023-08-08 13:54:39--  https://github.com/Baeldung/kotlin-tutorials/tarball/master
Resolving github.com (github.com)... 140.82.121.3
Connecting to github.com (github.com)|140.82.121.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/Baeldung/kotlin-tutorials/legacy.tar.gz/refs/heads/master [following]
--2023-08-08 13:54:40--  https://codeload.github.com/Baeldung/kotlin-tutorials/legacy.tar.gz/refs/heads/master
Resolving codeload.github.com (codeload.github.com)... 140.82.121.9
Connecting to codeload.github.com (codeload.github.com)|140.82.121.9|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: ‘wget-legacy-url.tgz’

wget-legacy-url.tgz                            [           <=>                     ]  26,46M  12,3MB/s    in 2,1s

2023-08-08 13:54:42 (12,3 MB/s) - ‘wget-legacy-url.tgz’ saved [27741338]

$ ls -l wget-legacy-url.tgz
-rw-r--r--  1 kent  wheel    26M Aug  8 13:54 wget-legacy-url.tgz

Likewise, the above commands download the .tgz file to the same location where we executed the command.
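
Before unpacking, it can be worth checking that the saved file really is a gzip-compressed tarball, since a failed download may leave a non-archive file behind. A minimal sketch, in which the function name is_tarball is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file can be listed
# as a gzip-compressed tar archive.
is_tarball() {
    tar -tzf "$1" >/dev/null 2>&1
}

# Example call, assuming wget-new-api.tgz was downloaded as shown above:
# is_tarball wget-new-api.tgz && echo "archive looks valid"
```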

Similar to the curl command, we can pipe the downloaded tarball to the tar command so that we can unpack the archive file inline:

wget https://... -O - | tar -xz

The -O - option writes the archive content to the standard output, and the pipe passes it to the tar command as input.

Again, similar to the curl command, we can skip the HTTPS certificate verification in wget by using the --no-check-certificate option:

wget --no-check-certificate https://... -O - | tar -xz

5. Downloading From Private Repositories

The commands we have discussed so far are useful for downloading archives from a public repo. However, in the case of a private repository, we need to provide GitHub access tokens.

To use the API to download a private repo as a tar file, we need to send the token in the Authorization header:

curl -L -H "Authorization: Bearer <THE-TOKEN>" \
  https://api.github.com/repos/USER/PRIVATE_REPO/tarball/master -o myRepo.tgz

Here, <THE-TOKEN> is an access token, such as a personal access token, which we first need to generate in our GitHub account settings.
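
To avoid hard-coding the token in the command, one option is to read it from an environment variable. In this sketch, the variable name GITHUB_TOKEN and the function name auth_header are our own assumptions; the header format is the one used above:

```shell
#!/bin/sh
# Hypothetical helper: print the Authorization header built from GITHUB_TOKEN,
# or fail if the variable is unset or empty.
auth_header() {
    if [ -z "${GITHUB_TOKEN:-}" ]; then
        echo "GITHUB_TOKEN is not set" >&2
        return 1
    fi
    printf 'Authorization: Bearer %s\n' "$GITHUB_TOKEN"
}

# Example call (USER and PRIVATE_REPO are placeholders, as above):
# curl -L -H "$(auth_header)" \
#     https://api.github.com/repos/USER/PRIVATE_REPO/tarball/master -o myRepo.tgz
```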

6. Conclusion

In this article, we’ve studied two ways to download repository tarballs from GitHub. We used the curl and wget commands to download the archives from the command line.

Furthermore, we saw command options for skipping certificate verification and for unpacking the archive inline. Finally, we used curl to download a private repository with an access token.
