
1. Overview

This tutorial will show how to read all the lines from a large file in Java in an efficient manner.

This article is part of the “Java – Back to Basic” tutorial here on Baeldung.

Further reading:

Java - Write an InputStream to a File

How to write an InputStream to a File - using Java, Guava and the Commons IO library.

Java - Convert File to InputStream

How to open an InputStream from a Java File - using plain Java, Guava and the Apache Commons IO library.

2. Reading in Memory

The standard way of reading the lines of the file is in memory – both Guava and Apache Commons IO provide a quick way to do just that:

// Guava
Files.readLines(new File(path), Charsets.UTF_8);

// Apache Commons IO
FileUtils.readLines(new File(path));

The problem with this approach is that all the file lines are kept in memory – which will quickly lead to OutOfMemoryError if the File is large enough.

For example – reading a ~1Gb file:

@Test
public void givenUsingGuava_whenIteratingAFile_thenWorks() throws IOException {
    String path = ...
    Files.readLines(new File(path), Charsets.UTF_8);
}

This starts off with a small amount of memory being consumed: (~0 Mb consumed)

[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 128 Mb
[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 116 Mb

However, after the full file has been processed, we have at the end: (~2 Gb consumed)

[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 2666 Mb
[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 490 Mb

This means that about 2.1 Gb of memory is consumed by the process – the reason is simple: the lines of the file are all being stored in memory now.
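
As a side note, heap figures like the ones logged above can be obtained from the Runtime API – here is a minimal sketch of how such numbers might be captured (the actual logging setup of the test is not shown here):

Runtime runtime = Runtime.getRuntime();
// express the figures in Mb, mirroring the log output above
long totalMb = runtime.totalMemory() / (1024 * 1024);
long freeMb = runtime.freeMemory() / (1024 * 1024);
System.out.println("Total Memory: " + totalMb + " Mb");
System.out.println("Free Memory: " + freeMb + " Mb");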

It should be obvious by this point that keeping in memory the contents of the file will quickly exhaust the available memory – regardless of how much that actually is.

What’s more, we usually don’t need all of the lines in the file in memory at once – instead, we just need to be able to iterate through each one, do some processing and throw it away. So, this is exactly what we’re going to do – iterate through the lines without holding all of them in memory.

3. Streaming Through the File

Now, let’s explore different ways of reading a given file portion by portion.

3.1. Using Scanner

Here, we’re going to use a java.util.Scanner to run through the contents of the file and retrieve lines serially, one by one:

FileInputStream inputStream = null;
Scanner sc = null;
try {
    inputStream = new FileInputStream(path);
    sc = new Scanner(inputStream, "UTF-8");
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // System.out.println(line);
    }
    // note that Scanner suppresses exceptions
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
} finally {
    if (inputStream != null) {
        inputStream.close();
    }
    if (sc != null) {
        sc.close();
    }
}

This solution iterates through all the lines in the file, allowing each line to be processed without keeping references to it – in short, without keeping the lines in memory: (~150 Mb consumed)

[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 763 Mb
[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 605 Mb
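
Incidentally, Scanner is itself Closeable and closing it also closes the underlying stream, so the same logic can be written more compactly with try-with-resources; here is an equivalent sketch of the snippet above:

try (Scanner sc = new Scanner(new FileInputStream(path), "UTF-8")) {
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // process the line
    }
    // Scanner suppresses IOExceptions, so check explicitly
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
}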

3.2. Using BufferedReader

Another solution would be using the BufferedReader class.

This class offers a convenient way to buffer characters, which simplifies the process of reading files.

For that purpose, it provides the readLine() method, which reads the content of a given file line by line.

So, let’s see it in action:

try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
    String line;
    while ((line = br.readLine()) != null) {
        // do something with each line
    }
}

BufferedReader reduces the number of I/O operations by reading the file chunk by chunk and caching the chunks in an internal buffer.

It exhibits better performance compared to Scanner as it focuses only on data retrieval without parsing.
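
The buffer size can also be tuned through the second constructor argument; the default (typically 8192 characters) is usually enough, so the value below is just an example:

int bufferSize = 16 * 1024; // example value – the default is typically 8192 characters
try (BufferedReader br = new BufferedReader(new FileReader(fileName), bufferSize)) {
    String line;
    while ((line = br.readLine()) != null) {
        // do something with each line
    }
}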

3.3. Using Files.newBufferedReader()

Alternatively, we can use the Files.newBufferedReader() method to achieve the same thing:

try (BufferedReader br = java.nio.file.Files.newBufferedReader(Paths.get(fileName))) {
    String line;
    while ((line = br.readLine()) != null) {
        // do something with each line
    }
}

As we can see, this method offers another way to return an instance of BufferedReader.
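
Note that the no-argument variant decodes the file as UTF-8; a charset can also be passed explicitly when the file uses a different encoding (the charset below is just an example):

try (BufferedReader br = java.nio.file.Files.newBufferedReader(Paths.get(fileName), StandardCharsets.ISO_8859_1)) {
    String line;
    while ((line = br.readLine()) != null) {
        // do something with each line
    }
}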

3.4. Using SeekableByteChannel

SeekableByteChannel provides a channel for reading and manipulating a given file. It maintains a current position within the file and reads raw bytes in bulk rather than line by line, which can make it faster than the stream-based classes.

So, let’s see it in practice:

try (SeekableByteChannel ch = java.nio.file.Files.newByteChannel(Paths.get(fileName), StandardOpenOption.READ)) {
    ByteBuffer bf = ByteBuffer.allocate(1000);
    while (ch.read(bf) > 0) {
        bf.flip();
        // System.out.println(new String(bf.array()));
        bf.clear();
    }
}

As shown above, this interface comes with the read() method, which reads a sequence of bytes from the channel into the given ByteBuffer.

The flip() method switches the buffer from filling mode to draining mode, so the bytes just read can be consumed. clear(), on the other hand, resets the buffer’s position and limit so that it’s ready to be filled again – it doesn’t actually erase the data.

One drawback of this approach is that we need to specify the buffer size explicitly using the allocate() method.
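
It’s also worth noting that the buffer is filled with raw bytes, so the chunks don’t line up with line boundaries; here is a small sketch of extracting only the bytes actually read in each pass:

try (SeekableByteChannel ch = java.nio.file.Files.newByteChannel(Paths.get(fileName), StandardOpenOption.READ)) {
    ByteBuffer bf = ByteBuffer.allocate(1000);
    while (ch.read(bf) > 0) {
        bf.flip();                                // switch the buffer from filling to draining
        byte[] chunk = new byte[bf.remaining()];  // only the bytes actually read in this pass
        bf.get(chunk);
        // process the chunk – note it may end in the middle of a line or character
        bf.clear();                               // make the buffer ready to be filled again
    }
}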

3.5. Using Stream API

Similarly, we can use the Stream API to read and process the content of a file.

Here, we’ll be using the Files class which provides the lines() method to return a stream of String elements:

try (Stream<String> lines = java.nio.file.Files.lines(Paths.get(fileName))) {
    lines.forEach(line -> {
        // do something with each line
    });
}

Please note that the file is processed lazily, which means that only a portion of the content is stored in memory at a given time.
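
Since the stream is lazy, it also composes well with further operations without pulling the whole file into memory – for instance, counting the lines that match a condition (the filter below is purely illustrative):

try (Stream<String> lines = java.nio.file.Files.lines(Paths.get(fileName))) {
    long matches = lines
      .filter(line -> line.contains("ERROR")) // illustrative condition
      .count();
    System.out.println(matches + " matching lines");
}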

4. Streaming With Apache Commons IO

The same can also be achieved using the Commons IO library, via the custom LineIterator it provides:

LineIterator it = FileUtils.lineIterator(theFile, "UTF-8");
try {
    while (it.hasNext()) {
        String line = it.nextLine();
        // do something with line
    }
} finally {
    LineIterator.closeQuietly(it);
}
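
In recent Commons IO versions, LineIterator also implements Closeable, so – assuming such a version is on the classpath – the same loop can be written with try-with-resources:

try (LineIterator it = FileUtils.lineIterator(theFile, "UTF-8")) {
    while (it.hasNext()) {
        String line = it.nextLine();
        // do something with line
    }
}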

Since the entire file is not kept in memory, this also results in fairly conservative memory consumption numbers: (~150 Mb consumed)

[main] INFO  o.b.java.CoreJavaIoIntegrationTest - Total Memory: 752 Mb
[main] INFO  o.b.java.CoreJavaIoIntegrationTest - Free Memory: 564 Mb

5. Conclusion

This quick article shows how to process the lines of a large file iteratively, without exhausting the available memory – which proves quite useful when working with such large files.

The implementation of all these examples and code snippets can be found in our GitHub project – this is a Maven-based project, so it should be easy to import and run as it is.
