Download a File From an URL in Java

Azure Spring Apps is a fully managed service from Microsoft (built in collaboration with VMware), focused on building and deploying Spring Boot applications on Azure Cloud without worrying about Kubernetes.

And, the Enterprise plan comes with some interesting features, such as commercial Spring runtime support, a 99.95% SLA and some deep discounts (up to 47%) when you are ready for production.

>> Learn more and deploy your first Spring Boot app to Azure.

You can also ask questions and leave feedback on the Azure Spring Apps GitHub page.

Slow MySQL query performance is all too common. Of course it is. A good way to go is, naturally, a dedicated profiler that actually understands the ins and outs of MySQL.

The Jet Profiler was built for MySQL only, so it can do things like real-time query performance, focus on most used tables or most frequent queries, quickly identify performance issues and basically help you optimize your queries.

Critically, it has very minimal impact on your server's performance, with most of the profiling work done separately - so it needs no server changes, agents or separate services.

Basically, you install the desktop application, connect to your MySQL server, hit the record button, and you'll have results within minutes:

>> Try out the Profiler

Accelerate Your Jakarta EE Development with Payara Server!

With best-in-class guides and documentation, Payara essentially simplifies deployment to diverse infrastructures.

Beyond that, it provides intelligent insights and actions to optimize Jakarta EE applications.

The goal is to apply an opinionated approach to get to what's essential for mission-critical applications - really solid scalability, availability, security, and long-term support:

>> Download and Explore the Guide (to learn more)

The AI Assistant to boost Boost your productivity writing unit tests - Machinet AI.

AI is all the rage these days, but for very good reason. The highly practical coding companion, you'll get the power of AI-assisted coding and automated unit test generation.
Machinet's Unit Test AI Agent utilizes your own project context to create meaningful unit tests that intelligently aligns with the behavior of the code.
And, the AI Chat crafts code and fixes errors with ease, like a helpful sidekick.

Simplify Your Coding Journey with Machinet AI:

>> Install Machinet AI in your IntelliJ

Looking for the ideal Linux distro for running modern Spring apps in the cloud?

Meet Alpaquita Linux: lightweight, secure, and powerful enough to handle heavy workloads.

This distro is specifically designed for running Java apps. It builds upon Alpine and features significant enhancements to excel in high-density container environments while meeting enterprise-grade security standards.

Specifically, the container image size is ~30% smaller than standard options, and it consumes up to 30% less RAM:

>> Try Alpaquita Containers now.

DbSchema is a super-flexible database designer, which can take you from designing the DB with your team all the way to safely deploying the schema.

The way it does all of that is by using a design model, a database-independent image of the schema, which can be shared in a team using GIT and compared or deployed on to any database.

And, of course, it can be heavily visual, allowing you to interact with the database using diagrams, visually compose queries, explore the data, generate random data, import data or build HTML5 database reports.

>> Take a look at DBSchema

Slow MySQL query performance is all too common. Of course it is. A good way to go is, naturally, a dedicated profiler that actually understands the ins and outs of MySQL.

Critically, it has very minimal impact on your server's performance, with most of the profiling work done separately - so it needs no server changes, agents or separate services.

Basically, you install the desktop application, connect to your MySQL server, hit the record button, and you'll have results within minutes:

>> Try out the Profiler

1. Overview

In this tutorial, we’ll see several methods that we can use to download a file.

We’ll cover examples ranging from the basic usage of Java IO to the NIO package as well as some common libraries like AsyncHttpClient and Apache Commons IO.

Finally, we’ll talk about how we can resume a download if our connection fails before the whole file is read.

2. Using Java IO

The most basic API we can use to download a file is Java IO. We can use the URL class to open a connection to the file we want to download.

To effectively read the file, we’ll use the openStream() method to obtain an InputStream:

BufferedInputStream in = new BufferedInputStream(new URL(FILE_URL).openStream())

When reading from an InputStream, it’s recommended to wrap it in a BufferedInputStream to increase the performance.

The performance increase comes from buffering. When reading one byte at a time using the read() method, each method call implies a system call to the underlying file system. When the JVM invokes the read() system call, the program execution context switches from user mode to kernel mode and back.

This context switch is expensive from a performance perspective. When we read a large number of bytes, the application performance will be poor, due to a large number of context switches involved.

For writing the bytes read from the URL to our local file, we’ll use the write() method from the FileOutputStream class:

try (BufferedInputStream in = new BufferedInputStream(new URL(FILE_URL).openStream());
  FileOutputStream fileOutputStream = new FileOutputStream(FILE_NAME)) {
    byte dataBuffer[] = new byte[1024];
    int bytesRead;
    while ((bytesRead = in.read(dataBuffer, 0, 1024)) != -1) {
        fileOutputStream.write(dataBuffer, 0, bytesRead);
    }
} catch (IOException e) {
    // handle exception
}

When using a BufferedInputStream, the read() method will read as many bytes as we set for the buffer size. In our example, we’re already doing this by reading blocks of 1024 bytes at a time, so BufferedInputStream isn’t necessary.

The example above is very verbose, but luckily, as of Java 7, we have the Files class that contains helper methods for handling IO operations.

We can use the Files.copy() method to read all the bytes from an InputStream and copy them to a local file:

InputStream in = new URL(FILE_URL).openStream();
Files.copy(in, Paths.get(FILE_NAME), StandardCopyOption.REPLACE_EXISTING);

Our code works well but can be improved. Its main drawback is the fact that the bytes are buffered into memory.

Fortunately, Java offers us the NIO package that has methods to transfer bytes directly between two Channels without buffering.

We’ll go into detail in the next section.

3. Using NIO

The Java NIO package offers the possibility to transfer bytes between two Channels without buffering them into the application memory.

To read the file from our URL, we’ll create a new ReadableByteChannel from the URL stream:

ReadableByteChannel readableByteChannel = Channels.newChannel(url.openStream());

The bytes read from the ReadableByteChannel will be transferred to a FileChannel corresponding to the file that will be downloaded:

FileOutputStream fileOutputStream = new FileOutputStream(FILE_NAME);
FileChannel fileChannel = fileOutputStream.getChannel();

We’ll use the transferFrom() method from the ReadableByteChannel class to download the bytes from the given URL to our FileChannel:

fileChannel.transferFrom(readableByteChannel, 0, Long.MAX_VALUE);

The transferTo() and transferFrom() methods are more efficient than simply reading from a stream using a buffer. Depending on the underlying operating system, the data can be transferred directly from the filesystem cache to our file without copying any bytes into the application memory.

On Linux and UNIX systems, these methods use the zero-copy technique that reduces the number of context switches between the kernel mode and user mode.

4. Using Libraries

We’ve seen in the examples above how to download content from an URL just by using the Java core functionality.

We also can leverage the functionality of existing libraries to ease our work, when performance tweaks aren’t needed.

For example, in a real-world scenario, we’d need our download code to be asynchronous.

We could wrap all the logic into a Callable, or we could use an existing library for this.

4.1. AsyncHttpClient

AsyncHttpClient is a popular library for executing asynchronous HTTP requests using the Netty framework. We can use it to execute a GET request to the file URL and get the file content.

First, we need to create an HTTP client:

AsyncHttpClient client = Dsl.asyncHttpClient();

The downloaded content will be placed into a FileOutputStream:

FileOutputStream stream = new FileOutputStream(FILE_NAME);

Next, we create an HTTP GET request and register an AsyncCompletionHandler handler to process the downloaded content:

client.prepareGet(FILE_URL).execute(new AsyncCompletionHandler<FileOutputStream>() {

    @Override
    public State onBodyPartReceived(HttpResponseBodyPart bodyPart) 
      throws Exception {
        stream.getChannel().write(bodyPart.getBodyByteBuffer());
        return State.CONTINUE;
    }

    @Override
    public FileOutputStream onCompleted(Response response) 
      throws Exception {
        return stream;
    }
})

Notice that we’ve overridden the onBodyPartReceived() method. The default implementation accumulates the HTTP chunks received into an ArrayList. This could lead to high memory consumption, or an OutOfMemory exception when trying to download a large file.

Instead of accumulating each HttpResponseBodyPart into memory, we use a FileChannel to write the bytes to our local file directly. We’ll use the getBodyByteBuffer() method to access the body part content through a ByteBuffer.

ByteBuffers have the advantage that the memory is allocated outside of the JVM heap, so it doesn’t affect our application memory.

4.2. Apache Commons IO

Another highly used library for IO operation is Apache Commons IO. We can see from the Javadoc that there’s a utility class named FileUtils that we use for general file manipulation tasks.

To download a file from an URL, we can use this one-liner:

FileUtils.copyURLToFile(
  new URL(FILE_URL), 
  new File(FILE_NAME), 
  CONNECT_TIMEOUT, 
  READ_TIMEOUT);

From a performance standpoint, this code is the same as the one from Section 2.

The underlying code uses the same concepts of reading in a loop some bytes from an InputStream and writing them to an OutputStream.

One difference is that here the URLConnection class is used to control the connection time-outs so that the download doesn’t block for a large amount of time:

URLConnection connection = source.openConnection();
connection.setConnectTimeout(connectionTimeout);
connection.setReadTimeout(readTimeout);

5. Resumable Download

Considering internet connections fail from time to time, it’s useful to be able to resume a download, instead of downloading the file again from byte zero.

Let’s rewrite the first example from earlier to add this functionality.

The first thing to know is that we can read the size of a file from a given URL without actually downloading it by using the HTTP HEAD method:

URL url = new URL(FILE_URL);
HttpURLConnection httpConnection = (HttpURLConnection) url.openConnection();
httpConnection.setRequestMethod("HEAD");
long removeFileSize = httpConnection.getContentLengthLong();

Now that we have the total content size of the file, we can check whether our file is partially downloaded.

If so, we’ll resume the download from the last byte recorded on disk:

long existingFileSize = outputFile.length();
if (existingFileSize < fileLength) {
    httpFileConnection.setRequestProperty(
      "Range", 
      "bytes=" + existingFileSize + "-" + fileLength
    );
}

Here we’ve configured the URLConnection to request the file bytes in a specific range. The range will start from the last downloaded byte and will end at the byte corresponding to the size of the remote file.

Another common way to use the Range header is for downloading a file in chunks by setting different byte ranges. For example, to download 2 KB file, we can use the range 0 – 1024 and 1024 – 2048.

Another subtle difference from the code in Section 2 is that the FileOutputStream is opened with the append parameter set to true:

OutputStream os = new FileOutputStream(FILE_NAME, true);

After we’ve made this change, the rest of the code is identical to the one from Section 2.

6. Conclusion

We’ve seen in this article several ways to download a file from an URL in Java.

The most common implementation is to buffer the bytes when performing the read/write operations. This implementation is safe to use even for large files because we don’t load the whole file into memory.

We’ve also seen how to implement a zero-copy download using Java NIO Channels. This is useful because it minimized the number of context switches done when reading and writing bytes, and by using direct buffers, the bytes are not loaded into the application memory.

Also, because downloading a file is usually done over HTTP, we’ve shown how to achieve this using the AsyncHttpClient library.

The source code for the article is available over on GitHub.

Download a File From an URL in Java

Get started with Spring and Spring Boot, through the Learn Spring course:

1. Overview

2. Using Java IO

3. Using NIO

4. Using Libraries

4.1. AsyncHttpClient

4.2. Apache Commons IO

5. Resumable Download

6. Conclusion

Get started with Spring and Spring Boot, through the Learn Spring course:

REST with Spring

Learn Spring Security ▼▲

Learn Spring Security Core

Learn Spring Security OAuth

Learn Spring

Learn Spring Data JPA

Persistence

REST

Security

Full Archive

Baeldung Ebooks

About Baeldung

Write for Baeldung

Get started with Spring and Spring Boot, through the Learn Spring course:

1. Overview

2. Using Java IO

3. Using NIO

4. Using Libraries

4.1. AsyncHttpClient

4.2. Apache Commons IO

5. Resumable Download

6. Conclusion

Get started with Spring and Spring Boot, through the Learn Spring course: