1. Overview
A common source of performance problems is an application that makes repeated database or API calls inside a loop. The pattern is familiar, but it's inefficient: every call adds another round trip of latency and puts unnecessary load on the backend.
One simple and elegant solution is DataLoader, which batches and caches our requests.
In this tutorial, we’ll walk through what DataLoader is, how it helps optimize data fetching tasks, and how to use it in Java with a complete, working example.
2. What Is DataLoader?
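DataLoader is a utility originally created by Facebook, as a JavaScript library, to optimize data fetching in GraphQL servers. java-dataloader is its Java port, maintained by the graphql-java project. Instead of sending a separate request to the database for every key, it combines many individual data requests into a single batched request.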
While it was initially created for GraphQL, DataLoader works with REST APIs, service-to-service calls in microservices, or any situation where repeated data fetching can be optimized through batching.
To embed it in our project, we need to include the java-dataloader library. For Maven, we’ll add the dependency to our pom.xml:
<dependency>
<groupId>com.graphql-java</groupId>
<artifactId>java-dataloader</artifactId>
<version>3.2.0</version>
</dependency>
As mentioned previously, it's completely generic and works well in any Java application that makes many repeated backend calls. It improves performance without altering business logic, simply by changing the way data is retrieved.
To recap the problem: without DataLoader, a loop that loads users one by one makes a separate database hit for each user, which hurts performance:
userRepository.findById(1);
userRepository.findById(2);
userRepository.findById(3);
Each line triggers its own SQL query, resulting in three database calls.
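To put numbers on the difference, here's a small JDK-only sketch that counts round trips. CountingUserStore is a hypothetical in-memory stand-in for a repository, not part of Spring Data. Each method call represents one query, so a per-ID loop costs one call per user, while a list-based lookup costs one call in total:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-memory stand-in for a repository: each method call
// represents one round trip to the database.
class CountingUserStore {
    private final Map<Integer, String> rows = Map.of(1, "User_1", 2, "User_2", 3, "User_3");
    final AtomicInteger calls = new AtomicInteger();

    Optional<String> findById(int id) {
        calls.incrementAndGet();                 // one "query" per ID
        return Optional.ofNullable(rows.get(id));
    }

    List<String> findAllById(List<Integer> ids) {
        calls.incrementAndGet();                 // one "query" for the whole list
        List<String> result = new ArrayList<>();
        for (int id : ids) {
            String name = rows.get(id);
            if (name != null) {
                result.add(name);                // missing IDs are simply skipped
            }
        }
        return result;
    }
}

class NPlusOneDemo {
    public static void main(String[] args) {
        List<Integer> ids = List.of(1, 2, 3);

        CountingUserStore perItem = new CountingUserStore();
        for (int id : ids) {
            perItem.findById(id);                // loop: one call per ID
        }

        CountingUserStore batched = new CountingUserStore();
        batched.findAllById(ids);                // one call for all IDs

        System.out.println("per-item calls: " + perItem.calls.get()); // 3
        System.out.println("batched calls: " + batched.calls.get());  // 1
    }
}
```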
With DataLoader, we can replace those three findById() calls with load() calls followed by a single dispatch():
dataLoader.load(1);
dataLoader.load(2);
dataLoader.load(3);
dataLoader.dispatch();
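Each load() call only queues its key and returns a CompletableFuture. When we call dispatch(), DataLoader passes all the queued keys to our batch function in one call, so the three lookups can be served by a single database query. Next, we'll look at the specifics of setting up and using DataLoader.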
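The queue-then-dispatch behavior is easier to see in a deliberately tiny, JDK-only sketch. TinyBatchLoader is a hypothetical class and not part of java-dataloader. It only mimics the load()/dispatch() contract described above, and it leaves out the library's caching, batch-size limits, and other options:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical class: a minimal illustration of queue-then-dispatch batching.
// It is not the java-dataloader implementation (no caching, no batch-size limits).
class TinyBatchLoader<K, V> {
    private final Function<List<K>, CompletableFuture<List<V>>> batchFunction;
    private final List<K> pendingKeys = new ArrayList<>();
    private final List<CompletableFuture<V>> pendingFutures = new ArrayList<>();

    TinyBatchLoader(Function<List<K>, CompletableFuture<List<V>>> batchFunction) {
        this.batchFunction = batchFunction;
    }

    // Queue the key and return a future that completes only after dispatch().
    CompletableFuture<V> load(K key) {
        CompletableFuture<V> future = new CompletableFuture<>();
        pendingKeys.add(key);
        pendingFutures.add(future);
        return future;
    }

    // Call the batch function once with every queued key, then complete each
    // future with the value at the same position in the returned list.
    CompletableFuture<List<V>> dispatch() {
        List<K> keys = new ArrayList<>(pendingKeys);
        List<CompletableFuture<V>> futures = new ArrayList<>(pendingFutures);
        pendingKeys.clear();
        pendingFutures.clear();
        if (keys.isEmpty()) {
            List<V> none = List.of();
            return CompletableFuture.completedFuture(none);
        }
        return batchFunction.apply(keys).whenComplete((values, error) -> {
            for (int i = 0; i < futures.size(); i++) {
                if (error != null) {
                    futures.get(i).completeExceptionally(error);
                } else {
                    futures.get(i).complete(values.get(i));
                }
            }
        });
    }
}

class TinyBatchLoaderDemo {
    public static void main(String[] args) {
        List<List<Integer>> batchCalls = new ArrayList<>();
        TinyBatchLoader<Integer, String> loader = new TinyBatchLoader<>(ids -> {
            batchCalls.add(ids);                         // record each batch invocation
            List<String> names = new ArrayList<>();
            for (Integer id : ids) {
                names.add("User_" + id);                 // one value per key, same order
            }
            return CompletableFuture.completedFuture(names);
        });

        CompletableFuture<String> f1 = loader.load(1);
        CompletableFuture<String> f2 = loader.load(2);
        CompletableFuture<String> f3 = loader.load(3);
        System.out.println(f1.isDone());                 // false: nothing fetched yet

        loader.dispatch().join();
        System.out.println(batchCalls);                  // [[1, 2, 3]]
        System.out.println(f1.join() + ", " + f2.join() + ", " + f3.join()); // User_1, User_2, User_3
    }
}
```

The positional matching in dispatch() is why a real batch function must return exactly one value per key, in key order.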
3. Batch Processing Using DataLoader
Now that we understand why DataLoader exists, let’s see it in action by building a Java application that loads users from a real database. We’ll use H2, an in-memory SQL database, and set up a simple Spring Boot project with JPA.
3.1. Create a Simple User Entity Class
First, we need a simple, lightweight User class to represent our data. In our case, we’re loading users with an ID and a name:
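@Entity
@Table(name = "users")
@Getter
@NoArgsConstructor(access = AccessLevel.PROTECTED) // JPA needs a no-arg constructor
@AllArgsConstructor
public class User {

    @Id // primary key, required by JPA
    private String id;

    private String name;
}
We've used Lombok here. The @Getter annotation generates getter methods for both id and name, and @AllArgsConstructor creates a constructor that accepts both fields. JPA also requires an @Id to mark the primary key and a no-argument constructor that it uses when it materializes rows. @NoArgsConstructor(access = AccessLevel.PROTECTED) provides one without exposing it to our own code. JPA-managed fields can't be final, but because the class exposes no setters, our code can only set them through the constructor. This keeps a loaded User effectively read-only, which is convenient when instances are passed between the asynchronous callbacks that batch loading with DataLoader relies on.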
3.2. Create a UserRepository Class
The UserRepository is a simple JPA repository that extends JpaRepository<User, String>. It inherits the method findAllById() that allows us to retrieve multiple user entities in a single query efficiently:
public interface UserRepository extends JpaRepository<User, String> {
}
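UserRepository handles direct database access, while UserService manages the business logic and asynchronous execution. Spring Data JPA's built-in findAllById() loads all requested entities with a single query, which is what turns a batch of IDs into one database call. However, it doesn't guarantee that results come back in the order of the IDs we pass in, and it simply omits IDs that don't exist. We'll handle both cases when we build the batch loader.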
3.3. Create a UserService Class
Next, we need a way to fetch a whole set of users at once, so we create a UserService class. It accepts a list of user IDs and fetches the corresponding user records asynchronously using Java's CompletableFuture:
@Service
public class UserService {

    private final UserRepository userRepository;

    public UserService(UserRepository userRepository) {
        this.userRepository = userRepository;
    }

    public CompletableFuture<List<User>> getUsersByIds(List<String> ids) {
        return CompletableFuture.supplyAsync(() -> userRepository.findAllById(ids));
    }
}
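Because getUsersByIds() wraps the query in CompletableFuture.supplyAsync(), the blocking findAllById() call runs on a thread from ForkJoinPool.commonPool() instead of the calling thread. Returning a CompletableFuture also fits DataLoader directly, because a batch loader must return a CompletionStage that holds its results.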
Under the hood, this service delegates the actual database interaction to the UserRepository.
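supplyAsync() also has an overload that takes an Executor. The following sketch is a hypothetical variant of getUsersByIds() that runs the blocking work on a dedicated pool instead of ForkJoinPool.commonPool(), which the whole JVM shares. The ID-to-name mapping is a stand-in for userRepository.findAllById(ids), and the pool size of 4 and the lastWorkerThread field are illustrative assumptions:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

// Hypothetical variant of UserService.getUsersByIds(): the ID-to-name mapping
// below stands in for userRepository.findAllById(ids).
class AsyncLookupSketch {
    // Dedicated pool for blocking lookups instead of ForkJoinPool.commonPool().
    private final ExecutorService dbPool = Executors.newFixedThreadPool(4);
    // Records which thread ran the last lookup, for demonstration only.
    volatile String lastWorkerThread;

    CompletableFuture<List<String>> getNamesByIds(List<String> ids) {
        return CompletableFuture.supplyAsync(() -> {
            lastWorkerThread = Thread.currentThread().getName();
            return ids.stream().map(id -> "User_" + id).collect(Collectors.toList());
        }, dbPool); // second argument: the Executor that runs the supplier
    }

    void shutdown() {
        dbPool.shutdown(); // pool threads are non-daemon, so release them
    }

    public static void main(String[] args) {
        AsyncLookupSketch service = new AsyncLookupSketch();
        CompletableFuture<List<String>> future = service.getNamesByIds(List.of("101", "102"));
        // The caller gets the future back immediately and only waits at join().
        System.out.println(future.join());            // [User_101, User_102]
        System.out.println(service.lastWorkerThread); // a dbPool thread, not "main"
        service.shutdown();
    }
}
```

Whether a dedicated pool is worth it depends on the application. The single-argument supplyAsync() in UserService is perfectly fine for a demo like ours.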
3.4. DataLoader Configuration
Finally, we're ready to use our service to load data. The UserDataLoader component below wraps our UserService in a BatchLoader, which lets the DataLoader batch and resolve user requests efficiently.
Now, let's look at the BatchLoader in more detail. BatchLoader is the core interface in the DataLoader library. It defines how to fetch the values for many keys in a single call. Our function receives a list of keys and must return a CompletionStage<List<V>> (a CompletableFuture works, since it implements CompletionStage). That list must contain exactly one value per key, in the same order as the incoming keys, with null for keys that have no value. This is critical, because DataLoader matches each result to the load() call that requested it by position:
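import java.util.Map;
import java.util.stream.Collectors;

import org.dataloader.BatchLoader;
import org.dataloader.DataLoader;
import org.dataloader.DataLoaderFactory;
import org.springframework.stereotype.Component;

@Component
public class UserDataLoader {

    private final UserService userService;

    public UserDataLoader(UserService userService) {
        this.userService = userService;
    }

    public DataLoader<String, User> createUserLoader() {
        BatchLoader<String, User> userBatchLoader = ids -> userService.getUsersByIds(ids)
          .thenApply(users -> {
              // Index the (possibly unordered) results by ID
              Map<String, User> userMap = users.stream()
                .collect(Collectors.toMap(User::getId, user -> user));
              // Return one entry per requested ID, in request order (null if not found)
              return ids.stream().map(userMap::get).collect(Collectors.toList());
          });
        return DataLoaderFactory.newDataLoader(userBatchLoader);
    }
}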
The UserDataLoader class accepts a UserService as a dependency and exposes a createUserLoader() method that builds and returns a DataLoader object. Inside that method, we define a BatchLoader<String, User>, which is the heart of the batching logic.
Whenever the DataLoader receives calls to load(userId), it queues the IDs and returns a future for each one. When dispatch() is called, it invokes this batch loader's load() method once with the full list of queued IDs. We pass that list to UserService.getUsersByIds() to fetch all the users in a single query. Because findAllById() doesn't guarantee the order of its results and skips IDs it can't find, we index the returned User objects in a Map by ID. We then build the result list by walking the incoming IDs, so each position matches its key and missing users come back as null.
Finally, we pass this batch loader to DataLoaderFactory.newDataLoader(), which gives us a fully functional DataLoader<String, User> instance. This enables batching, caching, and efficient user fetching behind the scenes with very little boilerplate.
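The caching works per key. By default, a DataLoader remembers the future it handed out for each key, so a second load() of the same ID returns that same future instead of fetching again. The following JDK-only sketch isolates just that idea. MemoizingLoader is a hypothetical class that leaves out batching entirely and is not java-dataloader's cache implementation:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch of DataLoader-style caching: the future for each key is
// stored, so repeated load() calls for the same key share a single fetch.
// This is not java-dataloader's implementation; it omits batching entirely.
class MemoizingLoader<K, V> {
    private final Map<K, CompletableFuture<V>> cache = new ConcurrentHashMap<>();
    private final Function<K, V> fetch;

    MemoizingLoader(Function<K, V> fetch) {
        this.fetch = fetch;
    }

    CompletableFuture<V> load(K key) {
        // Only the first call for a key starts a fetch; later calls get the same future.
        return cache.computeIfAbsent(key,
            k -> CompletableFuture.supplyAsync(() -> fetch.apply(k)));
    }

    // Drop a cached entry, e.g. after the underlying row has changed.
    void clear(K key) {
        cache.remove(key);
    }
}

class MemoizingLoaderDemo {
    public static void main(String[] args) {
        AtomicInteger fetches = new AtomicInteger();
        MemoizingLoader<String, String> loader = new MemoizingLoader<>(id -> {
            fetches.incrementAndGet();               // count real fetches
            return "User_" + id;
        });

        CompletableFuture<String> first = loader.load("101");
        CompletableFuture<String> second = loader.load("101");
        System.out.println(first == second);         // true: same cached future
        System.out.println(second.join());           // User_101
        System.out.println(fetches.get());           // 1
    }
}
```

Because such a cache lives as long as the loader, loaders are typically created per request rather than shared across requests, so that one request doesn't see another's stale values.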
3.5. Additional Types of Batch Loaders
Besides the standard BatchLoader<K, V>, DataLoader also provides:
- MappedBatchLoader<K, V> – returns a CompletionStage<Map<K, V>> instead of a list, so we don't have to order the results ourselves; keys missing from the map resolve to null
- BatchLoaderWithContext<K, V> – receives an extra BatchLoaderEnvironment argument that carries context, useful if our batch function needs request-scoped information
- MappedBatchLoaderWithContext<K, V> – combines the map-based result with the context argument
In short, BatchLoader is the engine that powers DataLoader’s batching magic, and choosing the correct variant depends on how we want to structure our batch function output.
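To see what a plain BatchLoader asks of us (and what MappedBatchLoader takes care of), here is the reordering step from UserDataLoader pulled out into a generic, JDK-only helper. KeyAligner and alignToKeys() are hypothetical names, and rows are plain "id:name" strings so the sketch stays self-contained:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper mirroring the thenApply() step in UserDataLoader: it turns
// an unordered, possibly incomplete result list into exactly one value per key,
// in key order, with null for keys that had no match.
final class KeyAligner {
    private KeyAligner() {
    }

    static <K, V> List<V> alignToKeys(List<K> keys, List<V> values, Function<V, K> keyOf) {
        Map<K, V> byKey = new HashMap<>();
        for (V value : values) {
            byKey.put(keyOf.apply(value), value); // a duplicate key just overwrites
        }
        List<V> aligned = new ArrayList<>(keys.size());
        for (K key : keys) {
            aligned.add(byKey.get(key));          // null when the key was not found
        }
        return aligned;
    }
}

class KeyAlignerDemo {
    public static void main(String[] args) {
        List<String> keys = List.of("101", "102", "104");
        // Unordered rows with no match for "104", as findAllById() may return.
        List<String> rows = List.of("102:User_102", "101:User_101");
        List<String> aligned = KeyAligner.alignToKeys(keys, rows, row -> row.split(":")[0]);
        System.out.println(aligned); // [101:User_101, 102:User_102, null]
    }
}
```

Filling a HashMap with put() instead of Collectors.toMap() is a small design choice: toMap() throws an IllegalStateException on duplicate keys. That can't happen with primary keys, but a generic helper shouldn't assume it.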
4. Proving the Database Will Be Called Only Once
With all the batching code in place, we can finally use it to load a batch of users and check how often the database is queried.
We’ll inject our UserService as a spied bean:
@SpyBean
private UserService userService;
The spy wraps the real UserService bean, so calls still reach the H2 database, while Mockito records each method call so that we can verify it.
Next, let’s pre-load three User entities into the database and create our DataLoader:
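@BeforeEach
void setUp() {
    userRepository.deleteAll();

    User user1 = new User("101", "User_101");
    User user2 = new User("102", "User_102");
    User user3 = new User("103", "User_103");
    userRepository.saveAll(Arrays.asList(user1, user2, user3));

    // Build the loader through our component so the test exercises its batch
    // function; the spied UserService records every call it receives
    userDataLoader = new UserDataLoader(userService).createUserLoader();

    // A registry lets GraphQL Java look loaders up by name; this test uses the loader directly
    DataLoaderRegistry registry = new DataLoaderRegistry();
    registry.register("userDataLoader", userDataLoader);
}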
Finally, we write the test that loads the users through the DataLoader:
@Test
void whenLoadingUsers_thenBatchLoaderIsInvokedAndResultsReturned() {
    CompletableFuture<User> userFuture1 = userDataLoader.load("101");
    CompletableFuture<User> userFuture2 = userDataLoader.load("102");
    CompletableFuture<User> userFuture3 = userDataLoader.load("103");

    userDataLoader.dispatchAndJoin();

    verify(userService, times(1)).getUsersByIds(anyList());

    assertThat(userFuture1.join().getName()).isEqualTo("User_101");
    assertThat(userFuture2.join().getName()).isEqualTo("User_102");
    assertThat(userFuture3.join().getName()).isEqualTo("User_103");
}
We queue up three load() calls, which do not execute immediately. Once we call dispatchAndJoin(), the DataLoader batches these calls together and triggers the underlying batch function with all our IDs.
We then call join() on each future to retrieve its user. Since dispatchAndJoin() has already waited for the batch to complete, these calls return right away instead of blocking.
To confirm the batching magic of DataLoader, we use verify() with times() to assert that getUsersByIds() is only called once. This proves that even though we requested three different users, the service method was invoked exactly once, meaning all requests were batched together.
Finally, we assert that the returned users match the pre-loaded entities and are received in the expected order.
5. Conclusion
In this article, we saw that making repeated database or service calls can quickly become a bottleneck. That’s where DataLoader shines. It helps batch and cache calls efficiently, reducing load, improving response times, and simplifying code. Whether we’re building GraphQL APIs, REST endpoints, or microservices, incorporating DataLoader can make a noticeable difference.
The code backing this article is available on GitHub.