DataLoader for Batch Processing

Last updated: October 2, 2025

Written by: Neetika Khandelwal

Reviewed by: Brandon Ward

1. Overview

It’s common to encounter performance issues when writing applications that make repeated database or API calls inside a loop. It’s a familiar pattern, but it’s pretty inefficient; each call adds delay and puts unnecessary load on the system.

To solve this, a simple and elegant approach uses DataLoader, which batches and caches our requests.

In this tutorial, we’ll walk through what DataLoader is, how it helps optimize data fetching tasks, and how to use it in Java with a complete, working example.

2. What Is DataLoader?

DataLoader is a utility created by Facebook to optimize GraphQL API calls. Instead of making multiple calls to the database, it combines various individual data requests into a single request.

While it was initially created for GraphQL, DataLoader works with REST APIs, service-to-service calls in microservices, or any situation where repeated data fetching can be optimized through batching.

To use it in our project, we need to include the java-dataloader library. For Maven, we’ll add the dependency to our pom.xml:

<dependency>
    <groupId>com.graphql-java</groupId>
    <artifactId>java-dataloader</artifactId>
    <version>3.2.0</version>
</dependency>

Because the library is generic, it fits any Java application that makes repeated backend calls. It improves performance without altering business logic; only the way data is retrieved changes.

Without DataLoader, a loop that loads users one by one issues a separate database query for each user, which hurts performance:

userRepository.findById(1);
userRepository.findById(2);
userRepository.findById(3);

Each line triggers its own SQL query, resulting in three database calls.

With DataLoader, we can replace those with load() and dispatch() calls:

dataLoader.load(1);
dataLoader.load(2);
dataLoader.load(3);
dataLoader.dispatch();

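Each load() call only queues a key and returns a CompletableFuture for its value. Calling dispatch() then hands all the queued keys to our batch function in one call, so the three lookups can be served by a single database query. Next, we’ll look at the specifics of setting up and using DataLoader.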
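To make the queue-then-dispatch idea concrete, here is a minimal sketch in plain Java. It is a simplified stand-in written for illustration, not java-dataloader's implementation: the MiniBatcher class and its batchCalls() counter are hypothetical names, it isn't thread-safe, and it assumes the batch function returns one value per key in key order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical, simplified batcher; NOT the java-dataloader implementation.
final class MiniBatcher<K, V> {
    private final Function<List<K>, List<V>> batchFn; // assumed to return values in key order
    private final List<K> pendingKeys = new ArrayList<>();
    private final List<CompletableFuture<V>> pendingFutures = new ArrayList<>();
    private int batchCalls = 0; // counts how often the batch function ran

    MiniBatcher(Function<List<K>, List<V>> batchFn) {
        this.batchFn = batchFn;
    }

    // Queue a key and return a future that stays incomplete until dispatch().
    CompletableFuture<V> load(K key) {
        CompletableFuture<V> future = new CompletableFuture<>();
        pendingKeys.add(key);
        pendingFutures.add(future);
        return future;
    }

    // Send every queued key to the batch function in a single call.
    void dispatch() {
        if (pendingKeys.isEmpty()) {
            return;
        }
        List<K> keys = new ArrayList<>(pendingKeys);
        List<CompletableFuture<V>> futures = new ArrayList<>(pendingFutures);
        pendingKeys.clear();
        pendingFutures.clear();
        batchCalls++;
        List<V> values = batchFn.apply(keys);
        for (int i = 0; i < futures.size(); i++) {
            futures.get(i).complete(values.get(i));
        }
    }

    int batchCalls() {
        return batchCalls;
    }

    public static void main(String[] args) {
        MiniBatcher<Integer, String> batcher = new MiniBatcher<>(keys -> {
            List<String> names = new ArrayList<>();
            for (Integer key : keys) {
                names.add("User_" + key); // stands in for one bulk query
            }
            return names;
        });
        CompletableFuture<String> first = batcher.load(1);
        batcher.load(2);
        batcher.load(3);
        batcher.dispatch();
        System.out.println(first.join() + ", batch calls: " + batcher.batchCalls());
    }
}
```

The three load() calls do no work on their own; the single dispatch() is what runs the batch function, which is the behavior we rely on in the rest of the article.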

3. Batch Processing Using DataLoader

Now that we understand why DataLoader exists, let’s see it in action by building a Java application that loads users from a real database. We’ll use H2, an in-memory SQL database, and set up a simple Spring Boot project with JPA.

3.1. Create a Simple User Entity Class

First, we need a simple, lightweight User class to represent our data. In our case, we’re loading users with an ID and a name:

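@Entity
@Table(name = "users")
@Getter
@NoArgsConstructor(access = AccessLevel.PROTECTED) // JPA needs a no-arg constructor
@AllArgsConstructor
public class User {
    @Id
    private String id;   // primary key, required by JPA
    private String name;
}

We’ve used Lombok here. The @Getter annotation automatically generates getter methods for both id and name, while @AllArgsConstructor creates a constructor that accepts both fields. JPA requires every entity to declare an identifier and a no-argument constructor, so we mark id with @Id and add a protected no-arg constructor through @NoArgsConstructor; for the same reason, the fields can't be final. Since the class exposes no setters, application code can't modify an instance after it's created, which keeps the model easy to reason about in concurrent or async scenarios like batch loading with DataLoader.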

3.2. Create a UserRepository Class

The UserRepository is a simple JPA repository that extends JpaRepository<User, String>. It inherits the method findAllById() that allows us to retrieve multiple user entities in a single query efficiently:

public interface UserRepository extends JpaRepository<User, String> {
}

UserRepository handles direct database access, while the UserService we create next manages the business logic and asynchronous execution. By using the built-in findAllById() method from Spring Data JPA, we enable efficient batching of database queries.

3.3. Create a UserService Class

Now, we need a way to fetch all our users at once, so we have a UserService class. It accepts a list of user IDs and fetches the corresponding user records asynchronously using Java’s CompletableFuture:

@Service
public class UserService {
    private final UserRepository userRepository;

    public UserService(UserRepository userRepository) {
        this.userRepository = userRepository;
    }

    public CompletableFuture<List<User>> getUsersByIds(List<String> ids) {
        return CompletableFuture.supplyAsync(() -> userRepository.findAllById(ids));
    }
}

The getUsersByIds() method runs the query on a separate thread (by default, supplyAsync() uses the common ForkJoinPool), so the database call doesn’t block the calling thread. It also returns a CompletableFuture, which is the shape a DataLoader batch function needs.

Under the hood, this service delegates the actual database interaction to the UserRepository.
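Since database calls block, one variation worth considering is to run them on a dedicated executor rather than the common pool. The sketch below shows only that idea using JDK classes; AsyncLookup, fetchAsync(), and fakeLookup() are hypothetical names, the lookup function stands in for userRepository.findAllById(), and the pool size is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

final class AsyncLookup {
    // Runs a blocking lookup on the supplied executor instead of the common pool.
    static <K, V> CompletableFuture<List<V>> fetchAsync(
            List<K> ids, Function<List<K>, List<V>> blockingLookup, Executor executor) {
        return CompletableFuture.supplyAsync(() -> blockingLookup.apply(ids), executor);
    }

    // Hypothetical stand-in for a repository call such as findAllById().
    static List<String> fakeLookup(List<String> ids) {
        List<String> names = new ArrayList<>();
        for (String id : ids) {
            names.add("User_" + id);
        }
        return names;
    }

    public static void main(String[] args) {
        ExecutorService dbPool = Executors.newFixedThreadPool(4); // size chosen arbitrarily
        try {
            List<String> names =
                    fetchAsync(List.of("101", "102"), AsyncLookup::fakeLookup, dbPool).join();
            System.out.println(names);
        } finally {
            dbPool.shutdown();
        }
    }
}
```

In a Spring application, the executor would typically be a managed bean rather than one created inline as it is here.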

3.4. DataLoader Configuration

Finally, we’re ready to use our service to load data. This configuration class wraps our UserService in a BatchLoader, which lets the DataLoader batch and resolve user requests efficiently.

Now, let’s look at the BatchLoader in more detail. BatchLoader is the core interface in the DataLoader library: it defines how to fetch the values for multiple keys in a single call. Our function receives a list of keys and must return a CompletionStage<List<V>> (a CompletableFuture qualifies) whose list has the same size and order as the incoming keys; this is critical for DataLoader to match results correctly:

@Component
public class UserDataLoader {

    private final UserService userService;

    public UserDataLoader(UserService userService) {
        this.userService = userService;
    }

    public DataLoader<String, User> createUserLoader() {
        BatchLoader<String, User> userBatchLoader = ids -> {
            return userService.getUsersByIds(ids)
              .thenApply(users -> {
                  Map<String, User> userMap = users.stream()
                    .collect(Collectors.toMap(User::getId, user -> user));
                  return ids.stream().map(userMap::get).collect(Collectors.toList());
            });
        };

        return DataLoaderFactory.newDataLoader(userBatchLoader);
    }
}

The UserDataLoader class accepts a UserService as a dependency and exposes the createUserLoader() method, which builds and returns a DataLoader object. Inside it, we define a BatchLoader<String, User>. This is the heart of the batching logic.

Whenever the DataLoader receives multiple calls to load(userId), it queues the keys until it is dispatched, and then calls the batch loader's load() method once with the List of ids. We pass the complete list of IDs to UserService.getUsersByIds() so that they're fetched in one go. Then we convert the list of User objects to a Map keyed by id so that we can return the users in the same order as the incoming list of ids. An id with no matching user produces a null entry at that position.
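The reordering step is easy to get wrong, so here it is as a small, library-free sketch. KeyOrderAligner and alignToKeys() are hypothetical names introduced for illustration, and plain strings stand in for User objects. The sketch covers the case where the backend returns rows in a different order than requested and omits one key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class KeyOrderAligner {
    // Re-orders fetched values so that result.get(i) belongs to keys.get(i).
    // Keys without a matching value become null, keeping both lists the same size.
    static <K, V> List<V> alignToKeys(List<K> keys, List<V> fetched, Function<V, K> keyOf) {
        Map<K, V> byKey = new HashMap<>();
        for (V value : fetched) {
            byKey.put(keyOf.apply(value), value);
        }
        List<V> aligned = new ArrayList<>(keys.size());
        for (K key : keys) {
            aligned.add(byKey.get(key)); // null when the key was not found
        }
        return aligned;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("103", "101", "999");
        List<String> fetched = List.of("User_101", "User_103"); // arbitrary backend order
        // The id is the text after the "User_" prefix.
        System.out.println(alignToKeys(keys, fetched, name -> name.substring(5)));
        // prints [User_103, User_101, null]
    }
}
```

Returning a list that is shorter than the key list, or in backend order, would break the position-based matching that a list-returning batch function relies on.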

Finally, we pass this batch loader to DataLoaderFactory.newDataLoader(), which gives us a fully functional DataLoader<String, User> instance. This enables batching, caching, and efficient user fetching behind the scenes with very little boilerplate.
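The caching half of that sentence can be pictured as a per-key map of futures: asking for the same key twice hands back the same future, so the key only needs to be fetched once. The sketch below illustrates that idea with JDK classes only; FutureCacheSketch and queuedKeys() are hypothetical names, and the library's own cache classes are not shown here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Illustrative only: a per-key cache of futures, not the library's cache implementation.
final class FutureCacheSketch<K, V> {
    private final Map<K, CompletableFuture<V>> cache = new HashMap<>();
    private int queuedKeys = 0;

    // Returns the cached future for a key, creating and counting it on first use.
    CompletableFuture<V> load(K key) {
        CompletableFuture<V> existing = cache.get(key);
        if (existing != null) {
            return existing;
        }
        CompletableFuture<V> created = new CompletableFuture<>();
        cache.put(key, created);
        queuedKeys++; // only first-time keys would be sent to the batch function
        return created;
    }

    int queuedKeys() {
        return queuedKeys;
    }

    public static void main(String[] args) {
        FutureCacheSketch<String, String> loader = new FutureCacheSketch<>();
        CompletableFuture<String> first = loader.load("101");
        CompletableFuture<String> again = loader.load("101");
        loader.load("102");
        System.out.println(first == again);      // true: same future instance
        System.out.println(loader.queuedKeys()); // 2: "101" is counted once
    }
}
```

A cache like this lives as long as the loader does, which is one reason a loader is often created per request rather than shared across the application.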

3.5. Additional Types of Batch Loaders

Besides the standard BatchLoader<K, V>, DataLoader also provides:

MappedBatchLoader<K, V> – returns a Map<K, V> instead of a list, allowing us to skip manual ordering
BatchLoaderWithContext<K, V> – receives an extra BatchLoaderEnvironment argument, useful if our batch function needs request-scoped context
MappedBatchLoaderWithContext<K, V> – combines both mapping and context features

In short, BatchLoader is the engine that powers DataLoader’s batching magic, and choosing the correct variant depends on how we want to structure our batch function output.
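To show what the map-based shape buys us, here is a sketch of a batch function that takes a set of keys and returns a key-to-value map, using JDK types only. MappedBatchSketch, loadNames(), and the in-memory USERS table are hypothetical. With java-dataloader on the classpath, we'd expect a method of this shape to be usable as a MappedBatchLoader<String, String>, for example through DataLoaderFactory.newMappedDataLoader(), but that wiring is not exercised here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

final class MappedBatchSketch {
    // Hypothetical in-memory table standing in for the users table.
    private static final Map<String, String> USERS =
            Map.of("101", "User_101", "102", "User_102");

    // Map-shaped batch function: a set of keys in, a key-to-value map out.
    static CompletionStage<Map<String, String>> loadNames(Set<String> ids) {
        return CompletableFuture.supplyAsync(() -> {
            Map<String, String> found = new HashMap<>();
            for (String id : ids) {
                String name = USERS.get(id);
                if (name != null) {
                    found.put(id, name); // unknown ids are simply left out
                }
            }
            return found;
        });
    }

    public static void main(String[] args) {
        Map<String, String> names =
                loadNames(Set.of("101", "999")).toCompletableFuture().join();
        System.out.println(names); // prints {101=User_101}
    }
}
```

Because results are matched by key rather than by position, there is no reordering step and no need to pad missing entries with null.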

4. Proving the Database Will Be Called Only Once

With the batch-processing code in place, we can finally use it to load a batch of users and verify that the service is hit only once.

We’ll inject our UserService as a spied bean:

@SpyBean
private UserService userService;

This allows us to connect to the live database while also being able to assert on function calls to our service.

Next, let’s pre-load three User entities into the database and create our DataLoader:

@BeforeEach
void setUp() {
    userRepository.deleteAll();
    User user1 = new User("101", "User_101");
    User user2 = new User("102", "User_102");
    User user3 = new User("103", "User_103");
    userRepository.saveAll(Arrays.asList(user1, user2, user3));

    // Build the loader through UserDataLoader so results are re-ordered to match the keys
    userDataLoader = new UserDataLoader(userService).createUserLoader();
    DataLoaderRegistry registry = new DataLoaderRegistry();
    registry.register("userDataLoader", userDataLoader);
}

Finally, we write the test that exercises the DataLoader:

@Test
void whenLoadingUsers_thenBatchLoaderIsInvokedAndResultsReturned() {
    CompletableFuture<User> userFuture1 = userDataLoader.load("101");
    CompletableFuture<User> userFuture2 = userDataLoader.load("102");
    CompletableFuture<User> userFuture3 = userDataLoader.load("103");

    userDataLoader.dispatchAndJoin();

    verify(userService, times(1)).getUsersByIds(anyList());

    assertThat(userFuture1.join().getName()).isEqualTo("User_101");
    assertThat(userFuture2.join().getName()).isEqualTo("User_102");
    assertThat(userFuture3.join().getName()).isEqualTo("User_103");
}

We queue up three load() calls, which do not execute immediately. Once we call dispatchAndJoin(), the DataLoader batches these calls together and triggers the underlying batch function with all our IDs.

We then call join() on each future to retrieve the user, which blocks until the result is ready.

To confirm the batching magic of DataLoader, we use verify() with times() to assert that getUsersByIds() is only called once. This proves that even though we requested three different users, the service method was invoked exactly once, meaning all requests were batched together.

Finally, we assert that the returned users match the pre-loaded entities and are received in the expected order.

5. Conclusion

In this article, we saw that making repeated database or service calls can quickly become a bottleneck. That’s where DataLoader shines. It helps batch and cache calls efficiently, reducing load, improving response times, and simplifying code. Whether we’re building GraphQL APIs, REST endpoints, or microservices, incorporating DataLoader can make a noticeable difference.

The code backing this article is available on GitHub.
