Authors Top

If you have a few years of experience in the Linux ecosystem, and you’re interested in sharing that experience with the community, have a look at our Contribution Guidelines.

1. Introduction

Archiving and compression are concepts that appear on many levels of the memory hierarchy and areas of the operating system (OS). Yet, despite their widespread use, issues can still occur when dealing with compressed archives.

In this tutorial, we explore simple constraints of the FAT32 filesystem and how they affect our ability to compress and archive data. First, we briefly go over some limitations of FAT32. Next, we generate sample data in the form of a large compressed archive containing a single file over 4GB in size. After that, we test the extraction of that file on a FAT32 system and discuss ways to circumvent issues around the process. Finally, we check how the sample data file extracts on a typical Linux filesystem.

We tested the code in this tutorial on Debian 11 (Bullseye) with GNU Bash 5.1.4. It should work in most POSIX-compliant environments.

2. FAT32 Filesystem Constraints

Despite being an old way to organize storage, FAT32 is still in use, mainly for compatibility reasons. For example, external drives commonly use this filesystem so that many different devices can recognize them.

Yet, the age of FAT32 is evident when it comes to its limits:

  • maximum size for a single file is 4GB minus one byte, i.e., 4294967295 bytes
  • maximum file count on a volume is 268173300

For comparison, even the older ext2 from 1993 can theoretically handle files as large as 16GB (17179869184 bytes) with 1KB blocks, and as many as 10^18 files. Of course, these values derive from the metadata structures, but they still place FAT32 far behind.

Because of this, using an ext* variant as our main Linux filesystem while attaching a FAT32 external drive or partition is common, but the combination can lead to issues.

Importantly, we can use a single command to determine our current directory’s filesystem type:

$ stat --file-system --format=%T .

For FAT32 systems, this command may return msdos or vfat, depending on our setup.
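As a cross-check, findmnt can resolve the same answer from the mount table (using the current directory here is just an example):

```shell
# filesystem type of the current directory, as reported by stat
stat --file-system --format=%T .

# findmnt gives the same information, read from the mount table
command -v findmnt >/dev/null && findmnt --noheadings --output FSTYPE --target .
```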

3. Sample Data

While the two terms are often used interchangeably, compression usually builds on top of archiving by adding algorithms that reduce the size of the archived data.
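We can see the distinction concretely with tar, which bundles files without shrinking them unless we add the -z flag to layer gzip compression on top (all filenames here are arbitrary):

```shell
tmp=$(mktemp -d) && cd "$tmp"
printf 'hello\n' > a
printf 'world\n' > b

# archiving only: files bundled, no size reduction
tar -cf plain.tar a b

# archiving plus compression: gzip applied to the bundle
tar -czf packed.tar.gz a b

stat --format='%n %s bytes' plain.tar packed.tar.gz
```

The compressed bundle ends up much smaller than the plain archive, even for tiny inputs, because tar pads plain archives to its blocking factor.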

Let’s create some sample data. First, we create a large 5GB file via truncate:

$ mkdir sampledata
$ cd sampledata
$ truncate --size=5G sample
$ wc --bytes < sample

After generating this file as sample in the sampledata directory, we use wc to see its total size.
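Notably, truncate produces a sparse file: the 5GB size lives in metadata, while almost no disk blocks are actually allocated. We can verify this by comparing the apparent size with the on-disk allocation:

```shell
# create the sparse file (instant, regardless of size)
truncate --size=5G sample

# apparent size: 5368709120 bytes
stat --format='apparent size: %s bytes' sample

# allocated bytes on disk: typically 0 for a freshly truncated file
du --block-size=1 sample
```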

Next, we can store our large file inside an archive via zip with minimal compression (-1):

$ zip -1 sample.zip sample
  adding: sample (deflated 100%)
$ wc --bytes <sample.zip

Notably, at just 23MB, the resulting archive is much smaller than its contents, despite holding the entire 5GB sample file. Since the sparse file reads as all zeros, deflate compresses it extremely well.

4. Extract File Over 4GB on FAT32

After generating the archive, we can try to decompress it on a FAT32 filesystem.

4.1. unzip

First, let’s use unzip to extract the 5GB file from our archive:

$ stat --file-system --format=%T .
vfat
$ unzip sample.zip
  inflating: sample
sample:  write error (disk full?).  Continue? (y/n/^C)

warning:  sample is probably truncated
$ df --human-readable .
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       66.6G 4.0G 62.6G   6% /mnt/fat32

Due to the restrictions of FAT32, we encounter a false disk-full error despite our partition still having ample free space, as the output of df shows. Of course, any other files would still extract without issues as long as their size and count stay below the limits we discussed.

In this case, we won’t be able to extract the 5GB file in its entirety on our current filesystem. Still, there is a way to get the data.

4.2. List Archive Contents

Let’s use unzip to see what’s inside the archive:

$ unzip -l sample.zip
    Length      Date    Time    Name
----------  ---------- -----   ----
5368709120  2022-10-10 06:56   sample
----------                    -------
5368709120                     1 file

Here, we use -l to list all files in the archive. Further, we can put vi to the same use:

$ vi sample.zip
" zip.vim version v31
" Browsing zipfile /sampledata/sample.zip
" Select a file with cursor and press ENTER

sample
In both cases, we see our single sample file. So, how do we get to its contents?

4.3. Pipe Archive Data

The unzip command offers the -p flag to extract a given file’s contents to a pipe:

$ unzip -p sample.zip | less

Here, we pipe the sample file from the archive through less to see it on the console without flooding stdout.

In addition, we can even use unzip -p with split to output and write the file to the main storage in chunks, effectively circumventing the size limit of FAT32 filesystems:

$ unzip -p sample.zip | split -d --bytes=3G - sample
$ wc --bytes <sample00
3221225472
$ wc --bytes <sample01
2147483648

Now, we have two chunks, each below 4GB in size and thus storable on FAT32.
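To later recover the original file on a filesystem without the 4GB limit, we can concatenate the chunks back together with cat. Here’s a small-scale sketch of that round trip, using a 10MB file and 3MB chunks so the sizes stay manageable (all filenames are arbitrary):

```shell
tmp=$(mktemp -d) && cd "$tmp"

# create a 10MB test file
head -c 10M /dev/urandom > original

# chunk it, as we did with the archive contents
split -d --bytes=3M original chunk

# reassemble and verify byte-for-byte equality
cat chunk0* > restored
cmp --silent original restored && echo 'round trip OK'
```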

5. Extract File Over 4GB on ext*

When extracting on ext2 and similar filesystems, we might still get an error:

$ stat --file-system --format=%T .
ext2
$ unzip sample.zip
error:  Zip file too big (greater than 4294959102 bytes)
warning [sample.zip]:  5368709120 extra bytes at beginning or within zipfile

In fact, this issue happens with some (usually older) versions of unzip and stems from a limitation of the tool rather than the filesystem.
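Modern Info-ZIP builds of unzip advertise Zip64 support (for entries over 4GB) among their compilation options. As a best-effort check, we can grep for the ZIP64_SUPPORT option string, assuming our build prints it the way current Info-ZIP releases do:

```shell
# unzip -v with no archive prints version and compile options
if unzip -v 2>/dev/null | grep -q ZIP64_SUPPORT; then
  echo 'Zip64 supported'
else
  echo 'Zip64 support not detected: consider jar or 7z'
fi
```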

In this case, we can still use jar with its x extract option:

$ apt-get install default-jdk
$ jar -xf sample.zip

Alternatively, we can also employ 7z (7-Zip):

$ apt-get install p7zip-full
$ 7z x sample.zip

At this point, we have worked around both tool and filesystem constraints regarding file size when extracting an archive.

6. Summary

In this article, we discussed the limits of FAT32 regarding the size of files, how they affect archive extraction, and what we can do to avoid problems.

In conclusion, while tools and filesystems can reduce our options regarding file handling, we can leverage methods to work around many issues.
