Parallel Collection Processing with Parallel Collectors and Virtual Threads

Last updated: April 14, 2024

Written by: Grzegorz Piwowarek

Reviewed by: Andrea Cerasoni

1. Introduction

In the previous article, we covered parallel-collectors, a small zero-dependency library that enables parallel processing for Stream API on custom thread pools.

Project Loom is the codename for the organized effort to introduce lightweight Virtual Threads (previously known as Fibers) to the JVM. Virtual Threads were finalized in JDK 21.

Let’s see how to leverage this in Parallel Collectors.

2. Maven Dependencies

If we want to start using the library, we need to add a single entry in Maven’s pom.xml file:

<dependency>
    <groupId>com.pivovarit</groupId>
    <artifactId>parallel-collectors</artifactId>
    <version>3.0.0</version>
</dependency>

Or a single line in Gradle’s build file:

implementation 'com.pivovarit:parallel-collectors:3.0.0'

The newest version can be found on Maven Central.

3. Parallel Processing with OS Threads vs Virtual Threads

3.1. OS Thread Parallelism

Let’s see why parallel processing with Virtual Threads is a big deal.

We’ll start by creating a simple example. We’ll need an operation to parallelize, which is going to be an artificially delayed String concatenation:

private static String fetchById(int id) {
    try {
        Thread.sleep(1000);
    } catch (InterruptedException e) {
        // restore the interrupt flag instead of swallowing the interruption
        Thread.currentThread().interrupt();
    }
    return "user-" + id;
}

We’ll also use custom code for measuring the execution time:

private static <T> T timed(Supplier<T> supplier) {
    var before = Instant.now();
    T result = supplier.get();
    var after = Instant.now();
    log.info("Execution time: {} ms", Duration.between(before, after).toMillis());
    return result;
}
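Instant.now() reads the wall clock, which can be adjusted while a measurement is running. As a hedged alternative, the sketch below measures elapsed time with System.nanoTime(), which the JDK documents as suitable for elapsed-time measurement. MonotonicTimer and the Timed record are hypothetical names introduced here for illustration, and the result is returned rather than logged so that the snippet has no logging dependency:

```java
import java.time.Duration;
import java.util.function.Supplier;

final class MonotonicTimer {

    // Hypothetical holder pairing a computed value with the time it took to compute.
    record Timed<T>(T result, Duration elapsed) {}

    // Runs the supplier once and measures it with the monotonic nanoTime source.
    static <T> Timed<T> timed(Supplier<T> supplier) {
        long start = System.nanoTime();
        T result = supplier.get();
        return new Timed<>(result, Duration.ofNanos(System.nanoTime() - start));
    }

    public static void main(String[] args) {
        Timed<String> timed = timed(() -> "user-" + 1);
        System.out.println(timed.result() + " took " + timed.elapsed().toMillis() + " ms");
    }
}
```

For second-scale measurements like the ones in this article, either approach gives comparable numbers; the difference matters mostly for short or long-running measurements where clock adjustments could distort the result.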

Now, let’s create a simple parallel Stream processing example in which we’re creating n elements and then processing them on n threads with parallelism of n:

@Test
public void processInParallelOnOSThreads() {
    int parallelProcesses = 5_000;
    var e = Executors.newFixedThreadPool(parallelProcesses);

    var result = timed(() -> Stream.iterate(0, i -> i + 1).limit(parallelProcesses)
      .collect(ParallelCollectors.parallel(i -> fetchById(i), toList(), e, parallelProcesses))
      .join());

    log.info("{}", result);
}

When we run it, we can observe that it clearly does the job because we don’t need to wait 5000 seconds for results:

Execution time: 1321 ms
[user-0, user-1, user-2, ...]

But let’s see what happens if we try to increase the number of elements processed in parallel to 20_000:

[2.795s][warning][os,thread] Failed to start thread "Unknown thread" - pthread_create failed (...)
[2.795s][warning][os,thread] Failed to start the native thread for java.lang.Thread "pool-1-thread-16111"

The OS-thread-based approach doesn’t scale: each thread is expensive to create, and we quickly hit resource limits.
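To see where that cost comes from, here is a minimal plain-JDK sketch of the same idea: one task per element, submitted to a fixed pool sized to the number of elements. It is only an approximation for illustration and not the library's implementation. FixedPoolSketch, mapInParallel, and fetchAll are hypothetical names, and the delay is shortened to 100 ms:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class FixedPoolSketch {

    // Stand-in for the article's fetchById, with a shorter artificial delay.
    static String fetchById(int id) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "user-" + id;
    }

    // Submits one task per element first, then waits for the results in encounter order.
    static <T, R> List<R> mapInParallel(List<T> input, Function<T, R> mapper, ExecutorService executor) {
        List<CompletableFuture<R>> futures = input.stream()
          .map(item -> CompletableFuture.supplyAsync(() -> mapper.apply(item), executor))
          .collect(Collectors.toList());
        return futures.stream()
          .map(CompletableFuture::join)
          .collect(Collectors.toList());
    }

    // Uses one platform thread per element, mirroring the test above.
    static List<String> fetchAll(int count) {
        ExecutorService executor = Executors.newFixedThreadPool(count);
        try {
            List<Integer> ids = IntStream.range(0, count).boxed().collect(Collectors.toList());
            return mapInParallel(ids, FixedPoolSketch::fetchById, executor);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        List<String> users = fetchAll(50);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(users.size() + " results in " + elapsedMs + " ms");
    }
}
```

Every element occupies its own platform thread for the whole duration of the sleep, so the pool has to grow with the input. That growth is what eventually runs into the thread-creation failures shown above.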

Let’s see what happens if we switch to Virtual Threads.

3.2. Virtual Thread Parallelism

Before Java 21, it wasn’t easy to come up with reasonable defaults for thread pool configuration. Luckily, Virtual Threads don’t require any: we can create as many of them as we need, and the JVM schedules them internally on a shared ForkJoinPool instance, which makes them a good fit for running blocking operations.
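Before turning to the library, a minimal plain-JDK sketch can illustrate the thread-per-task model. It assumes JDK 21 or newer and uses only standard-library calls. VirtualThreadSketch, describeThread, and runAll are hypothetical names, and the delay is shortened to 100 ms:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

final class VirtualThreadSketch {

    // Blocks briefly, then reports which kind of thread ran the task.
    static String describeThread(int id) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "task-" + id + (Thread.currentThread().isVirtual() ? ":virtual" : ":platform");
    }

    // Starts one virtual thread per task; no pool size or parallelism level is configured.
    static List<String> runAll(int count) {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<CompletableFuture<String>> futures = IntStream.range(0, count)
              .mapToObj(i -> CompletableFuture.supplyAsync(() -> describeThread(i), executor))
              .toList();
            return futures.stream()
              .map(CompletableFuture::join)
              .toList();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        List<String> results = runAll(10_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(results.size() + " tasks in " + elapsedMs + " ms, first: " + results.get(0));
    }
}
```

While a virtual thread sleeps, it does not hold on to an OS thread, so the number of blocked tasks is no longer bounded by the number of platform threads the machine can create.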

If we’re running Parallel Collectors 3.x, we can effortlessly leverage Virtual Threads:

@Test
public void processInParallelOnVirtualThreads() {
    int parallelProcesses = 5_000;

    var result = timed(() -> Stream.iterate(0, i -> i + 1).limit(parallelProcesses)
      .collect(ParallelCollectors.parallel(i -> fetchById(i), toList()))
      .join());

    log.info("{}", result);
}

As we can see, this is as easy as omitting the executor and parallelism parameters, since Virtual Threads are the default execution mechanism.

If we try to run it, we can see that it actually completes faster than the original example:

Execution time: 1101 ms
[user-0, user-1, user-2, ...]

This is because we created 5000 Virtual Threads, which were scheduled on a small set of OS threads instead of 5000 separate ones.

Let’s try to increase the parallelism to 20_000, which wasn’t possible with a classic Executor:

Execution time: 1219 ms
[user-0, user-1, user-2, ...]

Not only did this execute successfully, but it also completed faster than a job four times smaller did on OS threads!

Let’s increase the parallelism to 100_000 and see what happens:

Execution time: 1587 ms
[user-0, user-1, user-2, ...]

This works just fine, although the overhead becomes noticeable: almost 600 ms on top of the 1000 ms sleep.

What if we increase the parallelism level to 1_000_000?

Execution time: 6416 ms
[user-0, user-1, user-2, ...]

2_000_000?

Execution time: 12906 ms
[user-0, user-1, user-2, ...]

5_000_000?

Execution time: 25952 ms
[user-0, user-1, user-2, ...]

As we can see, we can easily scale to high levels of parallelism that weren’t achievable with OS threads. This, alongside performance improvements on smaller parallel workloads, is the main benefit of leveraging Virtual Threads for parallel processing of blocking operations.

3.3. Virtual Threads and Older Versions of Parallel Collectors

The easiest way to leverage Virtual Threads is to upgrade to the newest possible version of the library, but if this isn’t possible, we can also achieve this with a 2.x.y version while running on JDK21.

The trick is to manually provide Executors.newVirtualThreadPerTaskExecutor() as executor and Integer.MAX_VALUE as max parallelism level:

@Test
public void processInParallelOnVirtualThreadsParallelCollectors2() {
    int parallelProcesses = 100_000;

    var result = timed(() -> Stream.iterate(0, i -> i + 1).limit(parallelProcesses)
      .collect(ParallelCollectors.parallel(
        i -> fetchById(i), toList(), 
        Executors.newVirtualThreadPerTaskExecutor(), Integer.MAX_VALUE))
      .join());

    log.info("{}", result);
}

4. Conclusion

In this article, we saw how to leverage Virtual Threads with the Parallel Collectors library, which turned out to scale much better than the classic OS-thread-based solution. Our test machine hit resource limits at around 16,000 OS threads, while it scaled easily to millions of Virtual Threads.

The code backing this article is available on GitHub.