1. Introduction

Currently, when using the AWS Management Console, we can only download one object at a time from an S3 bucket. With the SDK, CLI, or CloudShell, however, we can download multiple objects at once. So, if we ever need to download a whole S3 bucket, those are our options.

In this tutorial, we’ll discuss how to download an entire S3 bucket in AWS.

2. Using aws s3 sync

With AWS CLI installed on our virtual machine or from CloudShell, we can copy an entire S3 bucket using the aws s3 sync command:

$ aws s3 sync s3://baeldung-copy-entire-s3 .
download: s3://baeldung-copy-entire-s3/file.txt to ./file.txt

In the command above, we copied the bucket named baeldung-copy-entire-s3 to our current working directory. If we want it in a different directory, we’ll replace . with the path of our desired target.

If our default region differs from the bucket’s region, we’ll specify the bucket’s region using --region:

$ aws s3 sync s3://baeldung-copy-entire-s3 . --region=eu-west-1

Then, if we want to exclude an object or a group of objects from the process, we can pass a matching pattern to the --exclude option. The --include option does the opposite, re-including objects that match its pattern.
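For instance, we might skip log files, or download only CSV files, with commands like these (the patterns are illustrative):

```shell
# Download everything except objects ending in .log
$ aws s3 sync s3://baeldung-copy-entire-s3 . --exclude "*.log"

# Download only .csv objects: exclude everything, then re-include .csv
$ aws s3 sync s3://baeldung-copy-entire-s3 . --exclude "*" --include "*.csv"
```

The filters are applied in the order given, with later filters taking precedence, so excluding everything and then re-including a pattern is the usual way to select a subset.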

2.1. Copying Large Buckets

While individual S3 objects have a size limit of 5 TB, the total size of an S3 bucket is unlimited. So, copying a large S3 bucket can be slow. However, we can make it faster by increasing max_concurrent_requests using aws configure:

$ aws configure set default.s3.max_concurrent_requests 500

By default, max_concurrent_requests is 10. Running the command above raises it to 500, allowing up to 500 concurrent download requests.
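Under the hood, aws configure writes this setting to the ~/.aws/config file, so we could also set it there directly:

```ini
[default]
s3 =
    max_concurrent_requests = 500
```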

Of course, besides max_concurrent_requests, factors like proximity to the bucket’s region and network throughput also affect the speed. So, working in a region close to us and opting for resources with higher throughput can make things quicker.

Copying to an EC2 instance in the same region as the bucket, ideally within a Virtual Private Cloud, may also make things faster. S3 Transfer Acceleration is another way to hasten things, but it comes at an extra cost, and it isn’t available for every bucket.
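Assuming Transfer Acceleration is already enabled on the bucket, we can tell the CLI to route S3 requests through the accelerate endpoint via the use_accelerate_endpoint setting:

```shell
# Route subsequent S3 requests through the Transfer Acceleration endpoint
$ aws configure set default.s3.use_accelerate_endpoint true
$ aws s3 sync s3://baeldung-copy-entire-s3 .
```

Note that this only works for buckets that have Transfer Acceleration enabled and whose names are DNS-compliant and contain no dots.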

3. Using the Java SDK

At the moment, the Java SDK is the only one with an established method for downloading all the contents of an S3 bucket:

import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider;
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.transfer.s3.S3TransferManager;
import software.amazon.awssdk.transfer.s3.model.DownloadDirectoryRequest;
import java.nio.file.Paths;

public class Bucket {
    static final S3TransferManager transferManager = createTM();

    // Create Connection
    static S3TransferManager createTM() {
        S3AsyncClient s3AsyncClient = S3AsyncClient.builder()
          .credentialsProvider(DefaultCredentialsProvider.create())
          .build();
        return S3TransferManager.builder()
          .s3Client(s3AsyncClient)
          .build();
    }

    static void downloadBucket(S3TransferManager transferManager, String destinationPath, String bucketName) {
        transferManager.downloadDirectory(DownloadDirectoryRequest.builder()
            .destination(Paths.get(destinationPath))
            .bucket(bucketName)
            .build())
          .completionFuture().join();
    }

    public static void main(String[] args) {
        downloadBucket(transferManager, ".", "baeldung-copy-entire-s3"); 
    } 
}

To copy our S3 bucket using the Java SDK, we needed the AWS SDK for Java 2.x along with its s3-transfer-manager module on the classpath.

In the snippet above, we created two methods in the Bucket class besides the main method: createTM and downloadBucket.

The createTM method creates a client connection to Amazon S3. Then, it builds an S3TransferManager instance around that client and returns it.

In our downloadBucket method, we build a DownloadDirectoryRequest and pass it to the transfer manager’s downloadDirectory method, which downloads all the objects in our bucket to the specified destination. Calling join() on the returned completion future blocks until the download finishes.

When we ran our code, it downloaded all the files in our S3 bucket to the root of our project directory because we specified the destinationPath as “.”. Of course, we can specify a different path by supplying the appropriate string.

In our illustration, we used our DefaultCredentialsProvider when creating an S3 client connection. But we could have also used a custom credential provider.
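Among other sources, DefaultCredentialsProvider picks up credentials from environment variables, so we could supply them before running the program (the values below are placeholders):

```shell
# DefaultCredentialsProvider checks these environment variables
$ export AWS_ACCESS_KEY_ID=AKIA-EXAMPLE-PLACEHOLDER
$ export AWS_SECRET_ACCESS_KEY=EXAMPLE-SECRET-PLACEHOLDER
```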

4. Using the Python SDK

We can use the Python SDK (Boto3) to do something similar to what we did with the Java SDK. Let’s try downloading an entire S3 bucket with it:

import boto3
import os

s3 = boto3.client('s3')
bucket = boto3.resource('s3').Bucket('baeldung-copy-entire-s3')

def get_bucket():
    for obj in bucket.objects.all():
        # Skip "folder" placeholder keys
        if obj.key.endswith('/'):
            continue
        filename = os.path.join(os.getcwd(), obj.key)
        # Create intermediate directories for nested keys
        os.makedirs(os.path.dirname(filename), exist_ok=True)
        s3.download_file(bucket.name, obj.key, filename)

get_bucket()

In the snippet above, we looped through all the objects in our bucket and passed each object’s key to S3’s download_file method. With that, the script downloaded our bucket’s entire content.

We can use the list_objects_v2 method instead of objects.all(), though it returns at most 1,000 objects per call, so we’d have to paginate. In either case, we download the objects one at a time in a loop, which may not be very efficient if the bucket has many objects. In such instances, the Java SDK’s transfer manager might be the better option.

5. Conclusion

In this article, we discussed how to download an S3 bucket using AWS CLI, Java SDK, and Python SDK. We should keep in mind that the Java SDK method might be more efficient when copying large S3 buckets.
