1. Overview

This tutorial will show how to read all the lines from a large file in Java in an efficient manner.

This article is part of the “Java – Back to Basics” tutorial here on Baeldung.

Further reading:

Java – Write an InputStream to a File
How to write an InputStream to a File - using Java, Guava and the Commons IO library.

Java – Convert File to InputStream
How to open an InputStream from a Java File - using plain Java, Guava and the Apache Commons IO library.

Java – Read from File
How to read contents from a file in Java - using any of these: BufferedReader, Scanner, StreamTokenizer, DataInputStream, SequenceInputStream, FileChannel, etc.

2. Reading In Memory

The standard way of reading the lines of a file is to read them into memory – both Guava and Apache Commons IO provide a quick way to do just that:

Files.readLines(new File(path), Charsets.UTF_8);
FileUtils.readLines(new File(path));
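Note that the Files class here is Guava’s com.google.common.io.Files (not java.nio.file.Files), and FileUtils comes from Commons IO. A minimal, self-contained sketch – assuming both libraries are on the classpath; the sample path is just a placeholder:

import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

import org.apache.commons.io.FileUtils;

import com.google.common.base.Charsets;
import com.google.common.io.Files;

public class ReadLinesInMemory {
    public static void main(String[] args) throws IOException {
        File file = new File("src/test/resources/sample.txt"); // placeholder path

        // Guava – reads every line into a List<String>
        List<String> guavaLines = Files.readLines(file, Charsets.UTF_8);

        // Commons IO – same idea; newer versions prefer an explicit Charset
        List<String> commonsLines = FileUtils.readLines(file, StandardCharsets.UTF_8);

        System.out.println(guavaLines.size() + " lines / " + commonsLines.size() + " lines");
    }
}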

The problem with this approach is that all of the file’s lines are kept in memory – which will quickly lead to an OutOfMemoryError if the file is large enough.

For example – reading a ~1Gb file:

@Test
public void givenUsingGuava_whenIteratingAFile_thenWorks() throws IOException {
    String path = ...
    Files.readLines(new File(path), Charsets.UTF_8);
}

This starts off with a small amount of memory being consumed: (~0 Mb consumed)

[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 128 Mb
[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 116 Mb

However, after the full file has been processed, we have at the end: (~2 Gb consumed)

[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 2666 Mb
[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 490 Mb

This means that about 2.1 Gb of memory is consumed by the process – the reason is simple: the lines of the file are now all stored in memory.

It should be obvious by this point that keeping the contents of the file in memory will quickly exhaust the available memory – regardless of how much that actually is.

What’s more, we usually don’t need all of the lines in the file in memory at once – instead, we just need to be able to iterate through each one, do some processing and throw it away. So, this is exactly what we’re going to do – iterate through the lines without holding them in memory.

3. Streaming Through the File

Let’s now look at a solution – we’re going to use a java.util.Scanner to run through the contents of the file and retrieve lines serially, one by one:

FileInputStream inputStream = null;
Scanner sc = null;
try {
    inputStream = new FileInputStream(path);
    sc = new Scanner(inputStream, "UTF-8");
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // System.out.println(line);
    }
    // note that Scanner suppresses exceptions
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
} finally {
    if (inputStream != null) {
        inputStream.close();
    }
    if (sc != null) {
        sc.close();
    }
}

This solution will iterate through all the lines in the file – allowing for processing of each line – without keeping references to them – and consequently, without keeping them in memory: (~150 Mb consumed)

[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 763 Mb
[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 605 Mb
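As a side note, on Java 7 and later the same loop can be expressed more compactly with try-with-resources, which closes the Scanner (and with it the underlying stream) automatically; a minimal sketch of the same idea:

try (Scanner sc = new Scanner(new FileInputStream(path), "UTF-8")) {
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // process the line
    }
    // Scanner still suppresses IOExceptions, so surface them explicitly
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
}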

4. Streaming with Apache Commons IO

The same can be achieved with the Commons IO library, using the custom LineIterator the library provides:

LineIterator it = FileUtils.lineIterator(theFile, "UTF-8");
try {
    while (it.hasNext()) {
        String line = it.nextLine();
        // do something with line
    }
} finally {
    LineIterator.closeQuietly(it);
}

Since the entire file is not held in memory, this will also result in pretty conservative memory consumption numbers: (~150 Mb consumed)

[main] INFO  o.b.java.CoreJavaIoIntegrationTest - Total Memory: 752 Mb
[main] INFO  o.b.java.CoreJavaIoIntegrationTest - Free Memory: 564 Mb
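As another side note – not part of the measurements above – plain Java offers a similar lazy approach starting with Java 8, via java.nio.file.Files.lines, which streams the lines instead of loading the whole file; a minimal sketch:

// Java 8+; imports: java.nio.file.Files, java.nio.file.Paths,
// java.nio.charset.StandardCharsets, java.util.stream.Stream
// (the enclosing method declares throws IOException)
try (Stream<String> lines = Files.lines(Paths.get(path), StandardCharsets.UTF_8)) {
    lines.forEach(line -> {
        // do something with line
    });
}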

5. Conclusion

This quick article shows how to process the lines of a large file iteratively, without exhausting the available memory – which proves quite useful when working with these large files.

The implementation of all these examples and code snippets can be found in our GitHub project – this is an Eclipse-based project, so it should be easy to import and run as it is.


Comments:
David Liang:

Apache Commons IO’s IOUtils.lineIterator() is a good alternative when reading a large file.

Eugen Paraschiv:

Thanks for the suggestion David, I integrated it into the article. Cheers,
Eugen.

Srikanth Kanakamedala:

Eugen,

I’m following you on Facebook and saving all your links, but I’m not sure how long they will be available – please let me know if all of these are available in one PDF.

Eugen Paraschiv:

Hey Srikanth,
Baeldung is a standard site, so the content that’s here is permanent – don’t worry about it going away 🙂
Cheers
Eugen.

Brooks Hagenow:

This is a bit off topic for the article but I have questions about example 3, “Streaming through a file”. Is there a reason you chose not to use try-with-resources? Also, closing the scanner will automatically close the FileInputStream so why bother with the if statement and explicitly closing the input stream?

Timur:

But what if the 1 GB file has just one line? That case is real, and actually quite common.

Eugen Paraschiv:

For single-line files you’ll have to break out NIO and read the file in chunks. However, this comes with its own set of problems – such as whether the file is binary or whether you know you’re dealing with characters (among other things). A quick side note: if the file is only 1 GB, it may be the case – depending on where this processing occurs – that you can still fit it in memory.
Cheers.
Eugen.
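For illustration, a minimal sketch of that chunked approach – assuming the content is character data in UTF-8; the file name and buffer size are just placeholders:

// imports: java.io.Reader, java.nio.charset.StandardCharsets, java.nio.file.Files, java.nio.file.Paths
// (the enclosing method declares throws IOException)
try (Reader reader = Files.newBufferedReader(Paths.get("huge-one-line.txt"), StandardCharsets.UTF_8)) {
    char[] buffer = new char[8 * 1024];
    int read;
    while ((read = reader.read(buffer)) != -1) {
        // process buffer[0..read) and discard it before reading the next chunk
    }
}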

vamsi:

If there is a need to read two or three 1 GB files that have an RDBMS kind of relationship between them, what approach would you suggest in that case?

Eugen Paraschiv:

So – this method only illustrates the low level detail of streaming through the file; if you need to actually represent relations between the data in the files (and without more detail this is just a guess) – you can persist that data into a container that actually can represent relations (read SQL).
That being said – if you really only have 2 or 3 files – you can still fit that in memory just fine.
Hope this helps. Cheers,
Eugen.

Jason Hollister:

Why not read and process each file the way you would to produce a DB index? Stash useful search keys and the byte location and size of each record in memory. Then use this to lookup from the larger file as needed.

Eugen Paraschiv:

Hey Jason,
Sure, involving persistence is definitely an option in some cases. But keep in mind that comes with its own set of extra complexity and challenges. And, more importantly – it’s not always an option.
This is specifically about doing the reading purely with Java, in memory, with a simple solution. Hope that clears things up. Cheers,
Eugen.

Jason Hollister:

Hi Eugen, Yes, I see that. I certainly don’t suggest making this any more complicated than it has to be, and moving the data into a proper database as you suggest is certainly the preferred way of dealing with relational data. But if that’s not an option (as may be the case for the writer), it seems to me you could borrow a common DB idea, that of an index, to store in memory the data needed from one file to use in processing another. Knowing what data you need from each file, it might even be enough to simply…
Eugen Paraschiv:

That’s an interesting idea. Here are a couple of notes on it. First – the size is indeed critical – and storing partial data is key here. The underlying assumption is that you can’t hold all of the data in memory. And so the only option is to read through the data in chunks/lines, etc – and then keep portions of it in that index. So, if that’s needed – that would be the next step. But, it would still come after reading the data – that’s still the first step. And that still needs to happen in a partial…
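For illustration, a minimal sketch of that kind of in-memory index – keeping only a key and a byte offset per record, and seeking back into the big file when a record is actually needed (the CSV-style key column is a hypothetical):

// imports: java.io.IOException, java.io.RandomAccessFile, java.util.HashMap, java.util.Map

// first pass: remember where each record starts, keyed by some field
static Map<String, Long> buildIndex(String path) throws IOException {
    Map<String, Long> index = new HashMap<>();
    try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
        long offset = file.getFilePointer();
        String line;
        while ((line = file.readLine()) != null) {
            String key = line.split(",")[0]; // hypothetical: the key is the first CSV column
            index.put(key, offset);
            offset = file.getFilePointer();
        }
    }
    return index;
}

// later: jump straight to a single record without re-reading the whole file
static String lookup(String path, Map<String, Long> index, String key) throws IOException {
    Long offset = index.get(key);
    if (offset == null) {
        return null;
    }
    try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
        file.seek(offset);
        return file.readLine();
    }
}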
Mike:

Reading lines one by one is trivial. What if the file has no line endings?

Eugen Paraschiv:

Hey Mike – if the file doesn’t have line endings (binary data) – I would recommend reading chunks of bytes. Cheers,
Eugen.

Mike:

What if it is XML, and you need to parse it? I mean, it is not necessarily a binary file. I am not trying to be a smart ass, seriously. I am interested in a solution for such problems – you have a big (maybe endless) stream of data. How do you read it correctly and efficiently? In Java.

Eugen Paraschiv:

No worries Mike, asking is good. Unfortunately the answer is not straightforward and there’s no one-size-fits-all solution. This article covers one type of file, but there are of course others. For XML you should look at SAX parsers, because they don’t load the data into memory. However, depending on the type of data you’re dealing with, there are many potential answers to this question. Hope that helps. Cheers,
Eugen.

ritwik dey:

For large XMLs, instead of SAX, you can also look into StAX parsers.
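For reference, a minimal StAX sketch along those lines, pulling events one at a time instead of building the whole document in memory (the file path and element handling are placeholders):

// imports: java.io.FileInputStream, java.io.InputStream,
// javax.xml.stream.XMLInputFactory, javax.xml.stream.XMLStreamConstants, javax.xml.stream.XMLStreamReader
static void streamXml(String path) throws Exception {
    try (InputStream in = new FileInputStream(path)) {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    // do something with reader.getLocalName() and its attributes
                }
            }
        } finally {
            reader.close(); // XMLStreamReader is not AutoCloseable, so close it explicitly
        }
    }
}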
