1. Overview
GitHub allows us to fetch a repository in two ways:
- Using git clone
- Downloading it as a zip or tar archive
Although git clone is the most used method, it requires a Git installation on the machine. If Git isn’t available, we can download the repository in tar format and unpack the contents on the file system.
In this tutorial, we’ll look at some Linux commands to download a GitHub repository tarball and unpack it on the file system.
2. GitHub APIs for Tarball Downloading
To get a repository archive as tar, we can send an HTTP GET request to https://api.github.com/repos/OWNER/REPO/tarball/REF.
In this tutorial, we take the Baeldung Kotlin tutorial repo as an example: https://github.com/Baeldung/kotlin-tutorials. Next, let’s understand the path parameters:
- OWNER – The repository owner (case-insensitive); it’s “baeldung” in our example.
- REPO – The repository name (case-insensitive); in this case, it’s “kotlin-tutorials”.
- REF – A Git ref, such as a branch name or a tag. If it’s omitted, the repository’s default branch is used.
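To make the role of each path parameter concrete, here is a minimal sketch of a shell helper that assembles the endpoint URL. The function name build_tarball_url is hypothetical and only serves this illustration; it isn’t part of any GitHub tooling:

```shell
#!/bin/sh
# Build the tarball endpoint URL from its path parameters.
# build_tarball_url is a hypothetical helper name used for illustration.
build_tarball_url() {
  owner=$1
  repo=$2
  ref=${3:-}   # optional; when empty, the default branch is requested
  url="https://api.github.com/repos/${owner}/${repo}/tarball"
  if [ -n "$ref" ]; then
    url="${url}/${ref}"
  fi
  printf '%s\n' "$url"
}

build_tarball_url Baeldung kotlin-tutorials master
# prints https://api.github.com/repos/Baeldung/kotlin-tutorials/tarball/master
```

When the third argument is left out, the helper simply omits the REF segment, which matches the default-branch behavior described above.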
If the GET request is successfully processed, the API returns a 302 HTTP status code with redirect headers, including the tarball location. We can download the tarball from the redirected location.
Currently, the API’s latest version is 2022-11-28. It’s worth noting that GitHub supported tarball downloading via https://github.com/OWNER/REPO/tarball/REF before the API was introduced. At the time of writing, GitHub still supports this legacy approach.
Next, we’ll use common Linux commands to download the tarball of the Baeldung Kotlin tutorial repo, and we’ll discuss both the new and the legacy approaches.
3. Using the curl Command
We can access any HTTP URL by using the curl command. Let’s use curl with the -I option, which sends a HEAD request and prints only the response headers.
First, let’s look at the new API:
$ curl -I https://api.github.com/repos/Baeldung/kotlin-tutorials/tarball/master
HTTP/2 302
server: GitHub.com
date: Tue, 08 Aug 2023 11:09:07 GMT
content-type: text/html;charset=utf-8
content-length: 0
cache-control: public, must-revalidate, max-age=0
expires: Tue, 08 Aug 2023 11:09:07 GMT
location: https://codeload.github.com/Baeldung/kotlin-tutorials/legacy.tar.gz/refs/heads/master
x-github-api-version-selected: 2022-11-28
...
Next, let’s look at the legacy approach:
$ curl -I https://github.com/Baeldung/kotlin-tutorials/tarball/master
HTTP/2 302
server: GitHub.com
date: Tue, 08 Aug 2023 11:41:17 GMT
content-type: text/html; charset=utf-8
vary: X-PJAX, X-PJAX-Container, Turbo-Visit, Turbo-Frame, Accept-Encoding, Accept, X-Requested-With
location: https://codeload.github.com/Baeldung/kotlin-tutorials/legacy.tar.gz/refs/heads/master
cache-control: max-age=0, private
strict-transport-security: max-age=31536000; includeSubdomains; preload
...
As we can see, whether we use the new API or the legacy URL, we receive a response with the 302 status. Further, the location header in both outputs points to the same redirect target.
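If we want to compare the redirect targets in a script rather than by eye, we can pull out just the location header. The following is a small sketch, not the only way to do it; redirect_location is a hypothetical helper name:

```shell
# Print the value of the Location header from HTTP response headers on stdin.
# Header names are case-insensitive and curl prints lines ending in CRLF,
# so the helper lowercases the name and strips a trailing carriage return.
# redirect_location is a hypothetical helper name used for illustration.
redirect_location() {
  awk 'tolower($1) == "location:" { sub(/\r$/, "", $2); print $2; exit }'
}

# Usage (needs network access):
#   curl -sI https://api.github.com/repos/Baeldung/kotlin-tutorials/tarball/master | redirect_location
#   curl -sI https://github.com/Baeldung/kotlin-tutorials/tarball/master | redirect_location
```

Alternatively, curl can report the same value itself through its write-out variable, for example with -s -o /dev/null -w '%{redirect_url}'.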
To follow the redirects, we need to use curl’s -L option:
$ curl -L https://api.github.com/repos/Baeldung/kotlin-tutorials/tarball/master -o new-api.tgz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 26.4M    0 26.4M    0     0  9105k      0 --:--:--  0:00:02 --:--:-- 11.6M
$ ls -l new-api.tgz
-rw-r--r-- 1 kent wheel 26M Aug 8 13:49 new-api.tgz
Similarly, we can download the tarball via the legacy URL:
$ curl -L https://github.com/Baeldung/kotlin-tutorials/tarball/master -o legacy-url.tgz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 26.4M    0 26.4M    0     0  8860k      0 --:--:--  0:00:03 --:--:-- 10.0M
$ ls -l legacy-url.tgz
-rw-r--r-- 1 kent wheel 26M Aug 8 13:47 legacy-url.tgz
As we can see, the above commands save the .tgz file, under the name given to the -o option, in the directory where we ran curl. Later, we can unpack this file by using the tar command.
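As a sketch of that unpacking step: GitHub archives wrap the repository contents in a single top-level directory, so we may want to drop that wrapper and extract straight into a directory of our choosing. The helper name unpack_tarball and the target directory name below are assumptions made for this example:

```shell
# Unpack a downloaded tarball into a chosen directory.
# --strip-components=1 removes the single top-level wrapper directory
# (the option is available in both GNU tar and bsdtar).
# unpack_tarball is a hypothetical helper name used for illustration.
unpack_tarball() {
  archive=$1
  dest=$2
  mkdir -p "$dest" &&
    tar -xzf "$archive" -C "$dest" --strip-components=1
}

# Usage:
#   tar -tzf new-api.tgz | head -n 5            # peek at the contents first
#   unpack_tarball new-api.tgz kotlin-tutorials
```

Listing the archive with tar -tzf before extracting is a cheap way to confirm what the wrapper directory is called and that the download is a valid gzip-compressed tarball.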
We can also unpack inline:
curl -L https://... | tar -xz
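In most cases, curl completes the TLS handshake with GitHub without any extra options. However, if certificate verification fails, we can skip it with curl’s -k (--insecure) option. Since this turns off verification of the server’s identity, it’s best reserved for networks we trust: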
curl -L -k https://... | tar -xz
4. Using the wget Command
Apart from curl, which is a general-purpose command for making HTTP requests, Linux also provides the wget command, a dedicated non-interactive network downloader.
It supports the HTTP, HTTPS, and FTP protocols. Further, wget follows up to 20 redirects by default, so we don’t need extra options to download the tarball from GitHub:
$ wget https://api.github.com/repos/Baeldung/kotlin-tutorials/tarball/master -O wget-new-api.tgz
--2023-08-08 14:00:11-- https://api.github.com/repos/Baeldung/kotlin-tutorials/tarball/master
Resolving api.github.com (api.github.com)... 126.96.36.199
Connecting to api.github.com (api.github.com)|188.8.131.52|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/Baeldung/kotlin-tutorials/legacy.tar.gz/refs/heads/master [following]
--2023-08-08 14:00:11-- https://codeload.github.com/Baeldung/kotlin-tutorials/legacy.tar.gz/refs/heads/master
Resolving codeload.github.com (codeload.github.com)... 184.108.40.206
Connecting to codeload.github.com (codeload.github.com)|220.127.116.11|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: ‘wget-new-api.tgz’
wget-new-api.tgz [ <=> ] 26,46M 10,1MB/s in 2,6s
2023-08-08 14:00:14 (10,1 MB/s) - ‘wget-new-api.tgz’ saved
$ ls -l wget-new-api.tgz
-rw-r--r-- 1 kent wheel 26M Aug 8 14:00 wget-new-api.tgz
As the output above shows, wget follows the redirect and successfully downloads the tar file.
Next, let’s give the legacy URL a try:
$ wget https://github.com/Baeldung/kotlin-tutorials/tarball/master -O wget-legacy-url.tgz
--2023-08-08 13:54:39-- https://github.com/Baeldung/kotlin-tutorials/tarball/master
Resolving github.com (github.com)... 18.104.22.168
Connecting to github.com (github.com)|22.214.171.124|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/Baeldung/kotlin-tutorials/legacy.tar.gz/refs/heads/master [following]
--2023-08-08 13:54:40-- https://codeload.github.com/Baeldung/kotlin-tutorials/legacy.tar.gz/refs/heads/master
Resolving codeload.github.com (codeload.github.com)... 126.96.36.199
Connecting to codeload.github.com (codeload.github.com)|188.8.131.52|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: ‘wget-legacy-url.tgz’
wget-legacy-url.tgz [ <=> ] 26,46M 12,3MB/s in 2,1s
2023-08-08 13:54:42 (12,3 MB/s) - ‘wget-legacy-url.tgz’ saved
$ ls -l wget-legacy-url.tgz
-rw-r--r-- 1 kent wheel 26M Aug 8 13:54 wget-legacy-url.tgz
Likewise, the above commands save the .tgz file, under the name given to the -O option, in the directory where we ran wget.
Similar to the curl command, we can pipe the downloaded tarball to the tar command so that we can unpack the archive file inline:
wget https://... -O - | tar -xz
Here, the -O - option makes wget write the archive content to standard output, which the pipe then feeds to the tar command as input.
Again, similar to the curl command, we can skip the HTTPS certificate verification in wget using --no-check-certificate:
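Since the curl and wget pipelines differ only in the download command, a script can use whichever tool happens to be installed. The following is a minimal sketch under that assumption; the helper names pick_downloader and fetch_and_unpack are hypothetical:

```shell
# Stream a tarball from a URL straight into tar, using whichever
# downloader is installed. Both helper names are hypothetical.
pick_downloader() {
  if command -v curl >/dev/null 2>&1; then
    echo curl
  elif command -v wget >/dev/null 2>&1; then
    echo wget
  else
    echo "neither curl nor wget found" >&2
    return 1
  fi
}

fetch_and_unpack() {
  url=$1
  dest=${2:-.}                       # extract into the current directory by default
  tool=$(pick_downloader) || return 1
  mkdir -p "$dest" || return 1
  case $tool in
    curl) curl -fsSL "$url" ;;       # -f: fail on HTTP errors, -L: follow redirects
    wget) wget -q -O - "$url" ;;     # -O -: write the download to standard output
  esac | tar -xz -C "$dest"
}

# Usage (needs network access):
#   fetch_and_unpack https://api.github.com/repos/Baeldung/kotlin-tutorials/tarball/master ./src
```

The curl branch adds -f so that an HTTP error response isn’t passed on to tar as if it were archive data.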
wget --no-check-certificate https://... -O - | tar -xz
5. Downloading From Private Repositories
The commands we’ve discussed so far work for downloading archives from a public repo. However, for a private repository, we need to authenticate with a GitHub access token.
To use the API to download a private repo as a tar file, we need to pass the token in the Authorization request header:
curl -L -H "Authorization: Bearer <THE-TOKEN>" https://api.github.com/repos/USER/PRIVATE_REPO/tarball/master -o myRepo.tgz
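Here, <THE-TOKEN> is a placeholder for an access token, such as an OAuth token or a personal access token, which we first need to generate in our GitHub account and which must have read access to the repository.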
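To keep the token out of scripts and shell history, one option is to read it from an environment variable. The sketch below assumes a variable named GITHUB_TOKEN and uses two hypothetical helper names; it also pins the API version mentioned earlier through the X-GitHub-Api-Version header:

```shell
# Download a private repository tarball with a token taken from the environment.
# GITHUB_TOKEN and both helper names are assumptions made for this sketch.
auth_header() {
  if [ -z "${GITHUB_TOKEN:-}" ]; then
    echo "GITHUB_TOKEN is not set" >&2
    return 1
  fi
  printf 'Authorization: Bearer %s\n' "$GITHUB_TOKEN"
}

download_private_tarball() {
  owner=$1
  repo=$2
  ref=$3
  out=$4
  header=$(auth_header) || return 1   # stop early when no token is available
  curl -fsSL \
    -H "$header" \
    -H "X-GitHub-Api-Version: 2022-11-28" \
    "https://api.github.com/repos/${owner}/${repo}/tarball/${ref}" \
    -o "$out"
}

# Usage (needs network access and a valid token):
#   export GITHUB_TOKEN=<THE-TOKEN>
#   download_private_tarball USER PRIVATE_REPO master myRepo.tgz
```

Failing early when the variable is unset avoids sending an unauthenticated request, to which the API would respond with an error instead of the archive.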
6. Conclusion
In this article, we’ve looked at two ways to download repository tarballs from GitHub: the API endpoint and the legacy URL. We used the curl and wget commands to download the archives from the shell.
Furthermore, we saw command options for unpacking the archive inline and for skipping certificate verification. Finally, we used curl with an access token to download a tarball from a private repository.