Guide to Work Stealing in Java

Last updated: June 11, 2024

Written by: baeldung

Reviewed by: Greg Martin

Java Concurrency

Definition

Azure Container Apps is a fully managed serverless container service that enables you to build and deploy modern, cloud-native Java applications and microservices at scale. It offers a simplified developer experience while providing the flexibility and portability of containers.

Of course, Azure Container Apps has really solid support for our ecosystem, from a number of build options, managed Java components, native metrics, dynamic logger, and quite a bit more.

To learn more about Java features on Azure Container Apps, visit the documentation page.

You can also ask questions and leave feedback on the Azure Container Apps GitHub page.

Of course, Azure Container Apps has really solid support for our ecosystem, from a number of build options, managed Java components, native metrics, dynamic logger, and quite a bit more.

To learn more about Java features on Azure Container Apps, you can get started over on the documentation page.

And, you can also ask questions and leave feedback on the Azure Container Apps GitHub page.

Modern software architecture is often broken. Slow delivery leads to missed opportunities, innovation is stalled due to architectural complexities, and engineering resources are exceedingly expensive.

Orkes is the leading workflow orchestration platform built to enable teams to transform the way they develop, connect, and deploy applications, microservices, AI agents, and more.

With Orkes Conductor managed through Orkes Cloud, developers can focus on building mission critical applications without worrying about infrastructure maintenance to meet goals and, simply put, taking new products live faster and reducing total cost of ownership.

Try a 14-Day Free Trial of Orkes Conductor today.

Orkes is the leading workflow orchestration platform built to enable teams to transform the way they develop, connect, and deploy applications, microservices, AI agents, and more.

Try a 14-Day Free Trial of Orkes Conductor today.

Traditional keyword-based search methods rely on exact word matches, often leading to irrelevant results depending on the user's phrasing.

By comparison, using a vector store allows us to represent the data as vector embeddings, based on meaningful relationships. We can then compare the meaning of the user’s query to the stored content, and retrieve more relevant, context-aware results.

Explore how to build an intelligent chatbot using MongoDB Atlas, Langchain4j and Spring Boot:

>> Building an AI Chatbot in Java With Langchain4j and MongoDB Atlas

Accessibility testing is a crucial aspect to ensure that your application is usable for everyone and meets accessibility standards that are required in many countries.

By automating these tests, teams can quickly detect issues related to screen reader compatibility, keyboard navigation, color contrast, and other aspects that could pose a barrier to using the software effectively for people with disabilities.

Learn how to automate accessibility testing with Selenium and the LambdaTest cloud-based testing platform that lets developers and testers perform accessibility automation on over 3000+ real environments:

Automated Accessibility Testing With Selenium

1. Overview

In this tutorial, we’ll look at the concept of work stealing in Java.

2. What Is Work Stealing?

Work stealing was introduced in Java with the aim of reducing contention in multi-threaded applications. This is done using the fork/join framework.

2.1. Divide and Conquer Approach

In the fork/join framework, problems or tasks are recursively broken down into sub-tasks. The sub-tasks are then solved individually, with the sub-results combined to form the result:

Result solve(Problem problem) {
    if (problem is small)
        directly solve problem
    else {
        split problem into independent parts
        fork new subtasks to solve each part
        join all subtasks
        compose result from subresults
    }
}

2.2. Worker Threads

The broken-down task is solved with the help of worker threads provided by a thread pool. Each worker thread will have sub-tasks it’s responsible for. These are stored in double-ended queues (deques).

Each worker thread gets sub-tasks from its deque by continuously popping a sub-task off the top of the deque. When a worker thread’s deque is empty, it means that all the sub-tasks have been popped off and completed.

At this point, the worker thread randomly selects a peer thread-pool thread it can “steal” work from. It then uses the first-in, first-out approach (FIFO) to take sub-tasks from the tail end of the victim’s deque.

3. Fork/Join Framework Implementation

We can create a work-stealing thread pool using either the ForkJoinPool class or the Executors class:

ForkJoinPool commonPool = ForkJoinPool.commonPool();
ExecutorService workStealingPool = Executors.newWorkStealingPool();

The Executors class has an overloaded newWorkStealingPool method, which takes an integer argument representing the level of parallelism.

Executors.newWorkStealingPool is an abstraction of ForkJoinPool.commonPool. The only difference is that Executors.newWorkStealingPool creates a pool in asynchronous mode and ForkJoinPool.commonPool doesn’t.

4. Synchronous vs Asynchronous Thread Pools

ForkJoinPool.commonPool uses a last-in, first-out (LIFO) queue configuration, whereas Executors.newWorkStealingPool uses first-in, first-out approach (FIFO) one.

According to Doug Lea, the FIFO approach has these advantages over LIFO:

It reduces contention by having stealers operate on the opposite side of the deque as owners
It exploits the property of recursive divide−and−conquer algorithms of generating “large” tasks early

The second point above means that it is possible to further break down an older stolen task by a thread that stole it.

As per the Java documentation, setting asyncMode to true may be suitable for use with event-style tasks that are never joined.

5. Working Example – Finding Prime Numbers

We’ll use the example of finding prime numbers from a collection of numbers to show the computation time benefits of the work-stealing framework. We’ll also show the differences between using synchronous and asynchronous thread pools.

5.1. The Prime Numbers Problem

Finding prime numbers from a collection of numbers can be a computationally expensive process. This is mainly due to the size of the collection of numbers.

The PrimeNumbers class helps us find prime numbers:

public class PrimeNumbers extends RecursiveAction {

    private int lowerBound;
    private int upperBound;
    private int granularity;
    static final List<Integer> GRANULARITIES
      = Arrays.asList(1, 10, 100, 1000, 10000);
    private AtomicInteger noOfPrimeNumbers;

    PrimeNumbers(int lowerBound, int upperBound, int granularity, AtomicInteger noOfPrimeNumbers) {
        this.lowerBound = lowerBound;
        this.upperBound = upperBound;
        this.granularity = granularity;
        this.noOfPrimeNumbers = noOfPrimeNumbers;
    }

    // other constructors and methods

    private List<PrimeNumbers> subTasks() {
        List<PrimeNumbers> subTasks = new ArrayList<>();

        for (int i = 1; i <= this.upperBound / granularity; i++) {
            int upper = i * granularity;
            int lower = (upper - granularity) + 1;
            subTasks.add(new PrimeNumbers(lower, upper, noOfPrimeNumbers));
        }
        return subTasks;
    }

    @Override
    protected void compute() {
        if (((upperBound + 1) - lowerBound) > granularity) {
            ForkJoinTask.invokeAll(subTasks());
        } else {
            findPrimeNumbers();
        }
    }

    void findPrimeNumbers() {
        for (int num = lowerBound; num <= upperBound; num++) {
            if (isPrime(num)) {
                noOfPrimeNumbers.getAndIncrement();
            }
        }
    }

    public int noOfPrimeNumbers() {
        return noOfPrimeNumbers.intValue();
    }
}

A few important things to note about this class:

It extends RecursiveAction, which allows us to implement the compute method used in computing tasks using a thread pool
It recursively breaks down tasks into sub-tasks based on the granularity value
The constructors take lower and upper bound values which control the range of numbers we want to determine prime numbers for
It enables us to determine prime numbers using either a work-stealing thread pool or a single thread

5.2. Solving the Problem Faster with Thread Pools

Let’s determine prime numbers in a single-threaded manner and also using work-stealing thread pools.

First, let’s see the single-threaded approach:

PrimeNumbers primes = new PrimeNumbers(10000);
primes.findPrimeNumbers();

And now, the ForkJoinPool.commonPool approach:

PrimeNumbers primes = new PrimeNumbers(10000);
ForkJoinPool pool = ForkJoinPool.commonPool();
pool.invoke(primes);
pool.shutdown();

Finally, we’ll have a look at the Executors.newWorkStealingPool approach:

PrimeNumbers primes = new PrimeNumbers(10000);
int parallelism = ForkJoinPool.getCommonPoolParallelism();
ForkJoinPool stealer = (ForkJoinPool) Executors.newWorkStealingPool(parallelism);
stealer.invoke(primes);
stealer.shutdown();

We use the invoke method of the ForkJoinPool class to pass tasks to the thread pool. This method takes in instances of sub-classes of RecursiveAction. Using Java Microbench Harness, we benchmark these different approaches against each other in terms of the average time per operation:

# Run complete. Total time: 00:04:50

Benchmark                                                      Mode  Cnt    Score   Error  Units
PrimeNumbersUnitTest.Benchmarker.commonPoolBenchmark           avgt   20  119.885 ± 9.917  ms/op
PrimeNumbersUnitTest.Benchmarker.newWorkStealingPoolBenchmark  avgt   20  119.791 ± 7.811  ms/op
PrimeNumbersUnitTest.Benchmarker.singleThread                  avgt   20  475.964 ± 7.929  ms/op

It is clear that both ForkJoinPool.commonPool and Executors.newWorkStealingPool allow us to determine prime numbers faster than with a single-threaded approach.

The fork/join pool framework lets us break down the task into sub-tasks. We broke down the collection of 10,000 integers into batches of 1-100, 101-200, 201-300 and so on. We then determined prime numbers for each batch and made the total number of prime numbers available with our noOfPrimeNumbers method.

5.3. Stealing Work to Compute

With a synchronous thread pool, ForkJoinPool.commonPool puts threads in the pool as long as the task is still in progress. As a result, the level of work stealing is not dependent on the level of task granularity.

The asynchronous Executors.newWorkStealingPool is more managed, allowing the level of work stealing to be dependent on the level of task granularity.

We get the level of work stealing using the getStealCount of the ForkJoinPool class:

long steals = forkJoinPool.getStealCount();

Determining the work-stealing count for Executors.newWorkStealingPool and ForkJoinPool.commonPool gives us dissimilar behavior:

Executors.newWorkStealingPool ->
Granularity: [1], Steals: [6564]
Granularity: [10], Steals: [572]
Granularity: [100], Steals: [56]
Granularity: [1000], Steals: [60]
Granularity: [10000], Steals: [1]

ForkJoinPool.commonPool ->
Granularity: [1], Steals: [6923]
Granularity: [10], Steals: [7540]
Granularity: [100], Steals: [7605]
Granularity: [1000], Steals: [7681]
Granularity: [10000], Steals: [7681]

When granularity changes from fine to coarse (1 to 10,000) for Executors.newWorkStealingPool, the level of work stealing decreases. Therefore, the steal count is one when the task is not broken down (granularity of 10,000).

The ForkJoinPool.commonPool has a different behavior. The level of work stealing is always high and not influenced much by the change in task granularity.

Technically speaking, our prime numbers example is one that supports asynchronous processing of event-style tasks. This is because our implementation does not enforce the joining of results.

A case can be made that Executors.newWorkStealingPool offers the best use of resources in solving the problem.

6. Conclusion

In this article, we looked at work stealing and how to apply it using the fork/join framework. We also looked at the examples of work stealing and how it can improve processing time and use of resources.

The code backing this article is available on GitHub. Once you're logged in as a Baeldung Pro Member, start learning and coding on the project.

Of course, Azure Container Apps has really solid support for our ecosystem, from a number of build options, managed Java components, native metrics, dynamic logger, and quite a bit more.

To learn more about Java features on Azure Container Apps, visit the documentation page.

You can also ask questions and leave feedback on the Azure Container Apps GitHub page.

Of course, Azure Container Apps has really solid support for our ecosystem, from a number of build options, managed Java components, native metrics, dynamic logger, and quite a bit more.

To learn more about Java features on Azure Container Apps, visit the documentation page.

You can also ask questions and leave feedback on the Azure Container Apps GitHub page.

Orkes is the leading workflow orchestration platform built to enable teams to transform the way they develop, connect, and deploy applications, microservices, AI agents, and more.

Try a 14-Day Free Trial of Orkes Conductor today.

Orkes is the leading workflow orchestration platform built to enable teams to transform the way they develop, connect, and deploy applications, microservices, AI agents, and more.

Try a 14-Day Free Trial of Orkes Conductor today.