1. Introduction

In this quick tutorial, we’ll see a few alternatives using core Java and external libraries to search for files in a directory (including sub-directories) that match a specific extension. We’ll go from simple arrays and lists to streams and other newer methods.

2. Setting up Our Filter

Since we need to filter files by extension, let’s start with a simple Predicate implementation. We’ll need a little input sanitization to ensure we match most use cases, like accepting extension names beginning with a dot or not:

public class MatchExtensionPredicate implements Predicate<Path> {

    private final String extension;

    public MatchExtensionPredicate(String extension) {
        if (!extension.startsWith(".")) {
            extension = "." + extension;
        }
        this.extension = extension.toLowerCase();
    }

    @Override
    public boolean test(Path path) {
        if (path == null) {
            return false;
        }
        return path.getFileName()
          .toString()
          .toLowerCase()
          .endsWith(extension);
    }
}

We start by writing our constructor, which prepends a dot to the extension name if it doesn't already start with one. Then, we convert it to lowercase so that the comparison with file names is case-insensitive. Finally, we implement test() by taking the Path's file name, converting it to lowercase as well, and checking whether it ends with the extension we're looking for.
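To make the matching rules concrete, here's a minimal sketch that applies the same normalization to a few sample paths. The class ExtensionMatchDemo and the helper byExtension() are hypothetical names used only for illustration; the helper mirrors our predicate as a lambda, with an extra null check on getFileName() that the original class doesn't have:

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Predicate;

class ExtensionMatchDemo {

    // Hypothetical helper: same normalization as MatchExtensionPredicate, written as a lambda
    static Predicate<Path> byExtension(String extension) {
        String normalized = (extension.startsWith(".") ? extension : "." + extension).toLowerCase();
        return path -> path != null
          && path.getFileName() != null // assumption: treat a root path (no file name) as a non-match
          && path.getFileName().toString().toLowerCase().endsWith(normalized);
    }

    public static void main(String[] args) {
        Predicate<Path> isJson = byExtension("json");
        System.out.println(isJson.test(Paths.get("data", "Report.JSON"))); // true: case is ignored
        System.out.println(isJson.test(Paths.get("notes.txt")));            // false: different extension
        System.out.println(isJson.test(Paths.get("archive.json.bak")));     // false: only the end of the name counts
        System.out.println(byExtension(".json").test(Paths.get("a.json"))); // true: leading dot is accepted
    }
}
```

Both "json" and ".json" produce the same filter, which is the behavior the constructor's sanitization is meant to guarantee.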

3. Traversing Directories With File.listFiles()

Our first example will use a method that's been around since the early days of Java: File.listFiles(). Let's start by instantiating a List to store our results and listing all files in the directory:

List<File> find(File startPath, String extension) {
    List<File> matches = new ArrayList<>();

    File[] files = startPath.listFiles();
    if (files == null) {
       return matches;
    }

    // ...
}

By itself, listFiles() doesn’t operate recursively, so for every item, if we identify it’s a directory, we start recursing:

MatchExtensionPredicate filter = new MatchExtensionPredicate(extension);
for (File file : files) {
    if (file.isDirectory()) {
        matches.addAll(find(file, extension));
    } else if (filter.test(file.toPath())) {
        matches.add(file);
    }
}

return matches;

We also instantiate our filter and only add the current file to our list if it passes our test() implementation. In the end, the list holds every file in the tree that matches our filter. Note that the recursion can cause a StackOverflowError in directory trees that are too deep, and, because listFiles() loads every entry of a directory into an array at once, an OutOfMemoryError in directories that contain too many files. We'll see options that perform better later.
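If deep trees are a concern, one way to sidestep the recursion is to keep the directories still to be visited in an explicit stack. The following is only a sketch under that assumption, not part of the original example: the class name IterativeFileSearch is hypothetical, and the extension check is inlined so the snippet compiles on its own:

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class IterativeFileSearch {

    static List<File> find(File startPath, String extension) {
        // Same normalization as our predicate: leading dot, lowercase
        String suffix = (extension.startsWith(".") ? extension : "." + extension).toLowerCase();
        List<File> matches = new ArrayList<>();

        // Directories waiting to be listed; replaces the call stack of the recursive version
        Deque<File> pending = new ArrayDeque<>();
        pending.push(startPath);

        while (!pending.isEmpty()) {
            File[] files = pending.pop().listFiles();
            if (files == null) {
                continue; // not a directory, or it could not be read
            }
            for (File file : files) {
                if (file.isDirectory()) {
                    pending.push(file);
                } else if (file.getName().toLowerCase().endsWith(suffix)) {
                    matches.add(file);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        File start = new File(args.length > 0 ? args[0] : ".");
        find(start, "java").forEach(System.out::println);
    }
}
```

This removes the risk of a StackOverflowError, but each call to listFiles() still loads a whole directory into memory, so very large directories remain a problem.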

4. Traversing Directories With Files.walkFileTree() From Java 7 Onwards

Starting with Java 7, we have the NIO2 APIs. They include many utilities like the Files class and a new way to handle files with the Path class. Using walkFileTree() allows us to traverse a directory recursively without writing the recursion ourselves. This method only requires a starting Path and a FileVisitor implementation:

List<Path> find(Path startPath, String extension) throws IOException {
    List<Path> matches = new ArrayList<>();
    // Build the filter once instead of once per visited file
    MatchExtensionPredicate filter = new MatchExtensionPredicate(extension);

    Files.walkFileTree(startPath, new SimpleFileVisitor<Path>() {

        @Override
        public FileVisitResult visitFile(Path file, BasicFileAttributes attributes) {
            if (filter.test(file)) {
                matches.add(file);
            }
            return FileVisitResult.CONTINUE;
        }

        @Override
        public FileVisitResult visitFileFailed(Path file, IOException exc) {
            // Skip entries we can't read and keep walking
            return FileVisitResult.CONTINUE;
        }
    });
    return matches;
}

FileVisitor contains callbacks for a few events: before entering a directory, after leaving a directory, when visiting a file, and when this visit fails. But, with SimpleFileVisitor, we only need to implement the callbacks we’re interested in. In this case, it’s visiting a file with visitFile(). So, for every file visited, we test it against our Predicate and add it to a list of matching files.

Also, we’re implementing visitFileFailed() to always return FileVisitResult.CONTINUE. With this, we can continue searching for files even if an exception – like access denied – occurs.
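Silently skipping failures can hide real problems, so it may be useful to remember which paths couldn't be visited. Here's a minimal sketch of that idea; the class RecordingFileSearch and its matches and failures fields are hypothetical names, and the extension check is inlined to keep the snippet self-contained:

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

class RecordingFileSearch {

    final List<Path> matches = new ArrayList<>();
    final List<Path> failures = new ArrayList<>();

    static RecordingFileSearch run(Path startPath, String extension) throws IOException {
        String suffix = (extension.startsWith(".") ? extension : "." + extension).toLowerCase();
        RecordingFileSearch result = new RecordingFileSearch();

        Files.walkFileTree(startPath, new SimpleFileVisitor<Path>() {

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attributes) {
                if (file.getFileName().toString().toLowerCase().endsWith(suffix)) {
                    result.matches.add(file);
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // Also invoked for directories that could not be opened
                result.failures.add(file);
                return FileVisitResult.CONTINUE;
            }
        });
        return result;
    }

    public static void main(String[] args) throws IOException {
        RecordingFileSearch result = run(Paths.get(args.length > 0 ? args[0] : "."), "java");
        System.out.println(result.matches.size() + " matches, " + result.failures.size() + " unreadable paths");
    }
}
```

The traversal still completes when a path is unreadable, but the caller can now inspect failures and decide whether the result is trustworthy.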

5. Streaming With Files.walk() From Java 8 Onwards

Java 8 included a simpler way to traverse directories that integrates with the Stream API. Here's how our method looks with Files.walk():

Stream<Path> find(Path startPath, String extension) throws IOException {
    return Files.walk(startPath)
      .filter(new MatchExtensionPredicate(extension));
}
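The Stream returned by Files.walk() holds open directory handles, so callers should close it, typically with try-with-resources. As a minimal sketch, assuming we want the results collected into a List, a caller could look like the following. The class name WalkStreamDemo and the method findAll() are hypothetical, the extension check is inlined, and the Files::isRegularFile filter is an addition of ours to leave out directories whose names happen to end with the extension:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WalkStreamDemo {

    static List<Path> findAll(Path startPath, String extension) throws IOException {
        String suffix = (extension.startsWith(".") ? extension : "." + extension).toLowerCase();

        // try-with-resources closes the stream and the directories it has open
        try (Stream<Path> paths = Files.walk(startPath)) {
            return paths
              .filter(Files::isRegularFile)
              .filter(path -> path.getFileName().toString().toLowerCase().endsWith(suffix))
              .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        findAll(Paths.get(args.length > 0 ? args[0] : "."), "java").forEach(System.out::println);
    }
}
```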

Unfortunately, the Files.walk() approach fails at the first exception thrown during traversal (for example, when a directory can't be read), and the API offers no hook to skip the failing entry and continue. So, let's try a different approach. We'll start by implementing a FileVisitor that contains a Consumer<Path>. This time, we'll use this Consumer to do whatever we want with our file matches instead of accumulating them in a List:

public class SimpleFileConsumerVisitor extends SimpleFileVisitor<Path> {

    private final Predicate<Path> filter;
    private final Consumer<Path> consumer;

    public SimpleFileConsumerVisitor(Predicate<Path> filter, Consumer<Path> consumer) {
        this.filter = filter;
        this.consumer = consumer;
    }

    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attributes) {
        if (filter.test(file)) {
            consumer.accept(file);
        }
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFileFailed(Path file, IOException exc) throws IOException {
        return FileVisitResult.CONTINUE;
    }
}

Finally, let’s modify our find() method to use it:

void find(Path startPath, String extension, Consumer<Path> consumer) throws IOException {
    MatchExtensionPredicate filter = new MatchExtensionPredicate(extension);
    Files.walkFileTree(startPath, new SimpleFileConsumerVisitor(filter, consumer));
}

Note that we had to go back to Files.walkFileTree() to use our FileVisitor implementation.
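To show how a caller might plug in different consumers, here's a minimal, self-contained sketch. The class name ConsumerSearchDemo is hypothetical, and the visitor is written as an anonymous class with the extension check inlined, rather than reusing SimpleFileConsumerVisitor, so that the snippet compiles on its own:

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ConsumerSearchDemo {

    static void find(Path startPath, String extension, Consumer<Path> consumer) throws IOException {
        String suffix = (extension.startsWith(".") ? extension : "." + extension).toLowerCase();

        Files.walkFileTree(startPath, new SimpleFileVisitor<Path>() {

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attributes) {
                if (file.getFileName().toString().toLowerCase().endsWith(suffix)) {
                    consumer.accept(file); // hand each match to the caller as soon as it is found
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                return FileVisitResult.CONTINUE; // skip unreadable entries
            }
        });
    }

    public static void main(String[] args) throws IOException {
        Path start = Paths.get(args.length > 0 ? args[0] : ".");

        // Print matches as they are found, without keeping them in memory
        find(start, "java", System.out::println);

        // Or collect them, if a list is what we need after all
        List<Path> collected = new ArrayList<>();
        find(start, "java", collected::add);
        System.out.println(collected.size() + " matches");
    }
}
```

Because each match is passed to the Consumer immediately, the caller decides whether results are stored, printed, or processed on the fly.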

6. Using Apache Commons IO’s FileUtils.iterateFiles()

Another helpful option is FileUtils.iterateFiles() from Apache Commons IO, which returns an Iterator. Let’s include its dependency:

<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.15.1</version>
</dependency>

With this dependency in place, we can also use the WildcardFileFilter instead of our MatchExtensionPredicate:

Iterator<File> find(Path startPath, String extension) {
    if (!extension.startsWith(".")) {
        extension = "." + extension;
    }
    return FileUtils.iterateFiles(
      startPath.toFile(), 
      WildcardFileFilter.builder().setWildcards("*" + extension).get(), 
      TrueFileFilter.INSTANCE);
}

We start our method by ensuring the extension name is in the expected format. Prepending a dot when necessary allows our method to work whether we pass ".extension" or just "extension". Note that, unlike our MatchExtensionPredicate, WildcardFileFilter matches case-sensitively by default.

As with other methods, it just needs a starting directory. But, since this is an older API, it requires a File instead of a Path. The last argument is an optional directory filter: if we pass null, subdirectories are ignored. So, we pass TrueFileFilter.INSTANCE to make sure the whole directory tree is visited.
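Since the result is a plain Iterator, callers either loop over it or adapt it to the Stream API. The sketch below shows one way to do the latter using only the JDK. The class IteratorAdapterDemo and the helper toStream() are hypothetical names, and a hand-built list stands in for the Iterator that iterateFiles() would return, so the snippet runs without Commons IO on the classpath:

```java
import java.io.File;
import java.util.Arrays;
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class IteratorAdapterDemo {

    // Wraps any Iterator in a sequential Stream that pulls elements lazily
    static <T> Stream<T> toStream(Iterator<T> iterator) {
        return StreamSupport.stream(
          Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED), false);
    }

    public static void main(String[] args) {
        // Stand-in for the Iterator<File> returned by our find() method
        Iterator<File> files = Arrays.asList(new File("a.txt"), new File("b.txt")).iterator();

        toStream(files)
          .map(File::getName)
          .forEach(System.out::println);
    }
}
```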

7. Conclusion

In this article, we explored various approaches to searching for files in a directory and its subdirectories based on a specified extension. We started by setting up a flexible extension-matching Predicate. Then, we covered different techniques, ranging from the traditional File.listFiles() and the Java 7 Files.walkFileTree() method to the Stream-based Files.walk() introduced in Java 8. Finally, we explored Apache Commons IO's FileUtils.iterateFiles() as a library-based alternative.

As always, the source code is available over on GitHub.
