Pagination With JDBC

Last updated: February 5, 2024

Written by: Suraj Mishra

Reviewed by: Eric Martin

1. Introduction

Large table reads can cause our application to run out of memory. They also add extra load to the database and require more bandwidth to execute. The recommended approach while reading a large table is to use paginated queries. Essentially, we read a subset (page) of data, process the data, and then move to the next page.

In this article, we’ll discuss and implement different strategies for pagination with JDBC.

2. Setup

First, we need to add the appropriate JDBC dependency based on our database in the pom.xml file so that we can connect to our database. For example, if our database is PostgreSQL, we need to add the PostgreSQL dependency:

<dependency>
    <groupId>org.postgresql</groupId>
    <artifactId>postgresql</artifactId>
    <version>42.6.0</version>
</dependency>

Second, we’ll need a large dataset to make a paginated query. Let’s create an employees table and insert one million records into it:

CREATE TABLE employees (
    id SERIAL PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    salary DECIMAL(10, 2)
);

INSERT INTO employees (first_name, last_name, salary)
SELECT
    'FirstName' || series_number,
    'LastName' || series_number,
    (random() * 100000)::DECIMAL(10, 2) -- Adjust the range as needed
FROM generate_series(1, 1000000) as series_number;

Lastly, we’ll create a connection object inside our sample app and configure it with our database connection:

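static Connection connect(String url, String user, String password) throws SQLException {
    // DriverManager throws an SQLException if the connection can't be established
    Connection connection = DriverManager.getConnection(url, user, password);
    if (connection != null) {
        System.out.println("Connected to database");
    }
    return connection;
}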

3. Pagination With JDBC

Our dataset contains about 1M records. Querying them all at once puts pressure not only on the database but also on network bandwidth, since more data has to be transferred in a single response. It also puts pressure on our application's memory, since more data needs to fit in RAM. When reading large datasets, it's therefore recommended to read and process the data in pages or batches.

JDBC doesn’t provide out-of-the-box methods to read in pages, but there are approaches that we can implement ourselves. We’ll discuss and implement two of them.

3.1. Using LIMIT and OFFSET

We can use LIMIT and OFFSET in our select query to return a result of a defined size. The LIMIT clause caps the number of rows returned, while the OFFSET clause skips the given number of rows from the start of the query result. We can then paginate our query by advancing the OFFSET position by one page size at a time.
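The arithmetic behind the OFFSET position can be sketched as follows. This is a minimal illustration rather than code from the article: PageMath, offsetFor(), and pageCount() are hypothetical names, and page indexes are assumed to be zero-based.

```java
// Hypothetical helper (not part of the article's code): page arithmetic for LIMIT/OFFSET queries.
final class PageMath {

    private PageMath() {
    }

    // OFFSET for a zero-based page index: page 0 skips nothing, page 1 skips one full page, and so on.
    static long offsetFor(int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize must be > 0");
        }
        return (long) pageIndex * pageSize;
    }

    // Number of pages needed to cover totalRows, rounding up for a partial last page.
    static long pageCount(long totalRows, int pageSize) {
        if (totalRows < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("totalRows must be >= 0 and pageSize must be > 0");
        }
        return (totalRows + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        System.out.println(offsetFor(3, 100_000));          // prints 300000
        System.out.println(pageCount(1_000_000, 100_000));  // prints 10
    }
}
```

With our one million rows and a page size of 100,000, this gives ten pages with offsets 0, 100,000, and so on up to 900,000.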

In the logic below, we pass pageSize as the LIMIT and offset as the start position for reading records. We also sort by id, because without an ORDER BY clause the database doesn't guarantee a stable row order between queries, so pages could overlap or skip rows:

static ResultSet readPageWithLimitAndOffset(Connection connection, int offset, int pageSize) throws SQLException {
    // ORDER BY id keeps the row order stable from one page to the next
    String sql = """
        SELECT * FROM employees
        ORDER BY id
        LIMIT ? OFFSET ?
    """;
    PreparedStatement preparedStatement = connection.prepareStatement(sql);
    preparedStatement.setInt(1, pageSize);
    preparedStatement.setInt(2, offset);

    return preparedStatement.executeQuery();
}

The query result is a single page of data. To read the entire table page by page, we fetch a page, process its records, and then move on to the next page until a query returns no more rows.
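The page-by-page iteration described above might look like the sketch below. To keep it runnable without a database, it assumes a hypothetical PageFetcher interface that stands in for a call to readPageWithLimitAndOffset() whose ResultSet has already been mapped to a list. OffsetPager and readAll() are illustrative names too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch (not part of the article's code): iterating over LIMIT/OFFSET pages.
final class OffsetPager {

    // Stands in for a query such as readPageWithLimitAndOffset(connection, offset, pageSize)
    // after its ResultSet has been mapped to a list of rows.
    interface PageFetcher<T> {
        List<T> fetch(int offset, int pageSize);
    }

    // Fetches and processes pages until the data runs out, and returns the number of pages processed.
    static <T> int readAll(PageFetcher<T> fetcher, int pageSize, Consumer<List<T>> processPage) {
        int offset = 0;
        int pages = 0;
        while (true) {
            List<T> page = fetcher.fetch(offset, pageSize);
            if (page.isEmpty()) {
                break; // nothing left to read
            }
            processPage.accept(page);
            pages++;
            if (page.size() < pageSize) {
                break; // a short page is the last page, so we can skip one extra query
            }
            offset += pageSize;
        }
        return pages;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the employees table, holding 25 ids
        List<Integer> table = new ArrayList<>();
        for (int id = 1; id <= 25; id++) {
            table.add(id);
        }

        PageFetcher<Integer> fetcher = (offset, size) ->
          table.subList(Math.min(offset, table.size()), Math.min(offset + size, table.size()));

        int pages = readAll(fetcher, 10, page -> System.out.println(page.size() + " rows"));
        System.out.println(pages + " pages");
    }
}
```

Here, 25 rows with a page size of 10 produce two full pages and one short page. In real code, the fetcher would run the JDBC query and close its statement and result set once the page has been mapped.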

3.2. Using a Sorted Key With LIMIT

We can also take advantage of the sorted key with LIMIT to read results in batches. For example, in our employees table, we have an ID column that is an auto-increment column and has an index on it. We’ll use this ID column to set a lower bound for our page, and LIMIT will help us to set an upper bound for the page:

static ResultSet readPageWithSortedKeys(Connection connection, int lastFetchedId, int pageSize) throws SQLException {
    // ORDER BY id guarantees that the page holds the next pageSize ids after lastFetchedId
    String sql = """
      SELECT * FROM employees
      WHERE id > ?
      ORDER BY id
      LIMIT ?
    """;
    PreparedStatement preparedStatement = connection.prepareStatement(sql);
    preparedStatement.setInt(1, lastFetchedId);
    preparedStatement.setInt(2, pageSize);

    return preparedStatement.executeQuery();
}

As we can see in the above logic, we pass lastFetchedId as the exclusive lower bound for the page, while LIMIT caps the page at pageSize rows. Sorting by id ensures that each page continues exactly where the previous one ended.
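To show how lastFetchedId moves forward from page to page, here's a small sketch that runs without a database. The names KeysetPager, PageFetcher, and fetchIdsAfter() are hypothetical; the fetcher stands in for readPageWithSortedKeys(), and we assume ids are positive and returned in ascending order.

```java
import java.util.List;
import java.util.TreeSet;
import java.util.function.Consumer;
import java.util.stream.Collectors;

// Hypothetical sketch (not part of the article's code): iterating with a sorted key and LIMIT.
final class KeysetPager {

    // Stands in for a query such as readPageWithSortedKeys(connection, lastFetchedId, pageSize):
    // returns up to pageSize ids greater than lastFetchedId, in ascending order.
    interface PageFetcher {
        List<Integer> fetchIdsAfter(int lastFetchedId, int pageSize);
    }

    // Fetches pages until one comes back empty and returns the number of pages processed.
    static int readAll(PageFetcher fetcher, int pageSize, Consumer<List<Integer>> processPage) {
        int lastFetchedId = 0; // assumption: ids are positive, so 0 sits below every row
        int pages = 0;
        while (true) {
            List<Integer> page = fetcher.fetchIdsAfter(lastFetchedId, pageSize);
            if (page.isEmpty()) {
                break;
            }
            processPage.accept(page);
            pages++;
            // The last id of this page becomes the lower bound of the next one
            lastFetchedId = page.get(page.size() - 1);
        }
        return pages;
    }

    public static void main(String[] args) {
        // In-memory stand-in for an indexed id column; gaps between ids are fine
        TreeSet<Integer> ids = new TreeSet<>(List.of(1, 2, 5, 8, 13, 21, 34));

        PageFetcher fetcher = (lastFetchedId, pageSize) -> ids.tailSet(lastFetchedId, false)
          .stream()
          .limit(pageSize)
          .collect(Collectors.toList());

        int pages = readAll(fetcher, 3, page -> System.out.println(page));
        System.out.println(pages + " pages");
    }
}
```

Because the next page is defined by the last id we saw rather than by a row count, gaps in the id sequence (for example, after deletes) don't cause rows to be skipped or read twice.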

4. Testing

Let’s test our logic by writing simple unit tests. For testing, we’ll set up a database and insert 1M records into the table. We’re running setup() and tearDown() methods once per test class for setting up test data and tearing it down:

@BeforeAll
public static void setup() throws Exception {
    connection = connect(JDBC_URL, USERNAME, PASSWORD);
    populateDB();
}

@AfterAll
public static void tearDown() throws SQLException {
    destroyDB();
}

The populateDB() method first creates an employees table and then inserts sample records for 1M employees:

private static void populateDB() throws SQLException {
    String createTable = """
        CREATE TABLE EMPLOYEES (
            id SERIAL PRIMARY KEY,
            first_name VARCHAR(50),
            last_name VARCHAR(50),
            salary DECIMAL(10, 2)
        );
        """;
    // Close the statement as soon as the table is created
    try (PreparedStatement preparedStatement = connection.prepareStatement(createTable)) {
        preparedStatement.execute();
    }

    String load = """
        INSERT INTO EMPLOYEES (first_name, last_name, salary)
        VALUES(?,?,?)
    """;
    // Prepare the insert once and reuse it for every row instead of leaking one statement per row
    try (PreparedStatement insert = connection.prepareStatement(load)) {
        for (int i = 1; i <= 1_000_000; i++) {
            insert.setString(1, "firstname" + i);
            insert.setString(2, "lastname" + i);
            // Random salary between 100,000 and 1,000,000
            insert.setDouble(3, 100_000 + (1_000_000 - 100_000) * Math.random());

            insert.execute();
        }
    }
}

The destroyDB() method, which we call from tearDown(), drops the employees table:

private static void destroyDB() throws SQLException {
    String destroy = """
        DROP table EMPLOYEES;
    """;
    connection
      .prepareStatement(destroy)
      .execute();
}

Once we’ve set up the test data, we can write a simple unit test for the LIMIT and OFFSET approach to verify the page size:

@Test
void givenDBPopulated_WhenReadPageWithLimitAndOffset_ThenReturnsPaginatedResult() throws SQLException {
    int offset = 0;
    int pageSize = 100_000;
    int totalPages = 0;
    while (true) {
        ResultSet resultSet = PaginationLogic.readPageWithLimitAndOffset(connection, offset, pageSize);
        if (!resultSet.next()) {
            break;
        }

        List<String> resultPage = new ArrayList<>();
        do {
            resultPage.add(resultSet.getString("first_name"));
        } while (resultSet.next());

        assertEquals("firstname" + (resultPage.size() * (totalPages + 1)), resultPage.get(resultPage.size() - 1));
        offset += pageSize;
        totalPages++;
    }
    assertEquals(10, totalPages);
}

As we can see above, we loop until we’ve read all the database records in pages, and for each page, we verify the last record read. At the end, we verify the total number of pages.

Similarly, we can write another test for pagination with sorted keys using the ID column:

@Test
void givenDBPopulated_WhenReadPageWithSortedKeys_ThenReturnsPaginatedResult() throws SQLException {
    PreparedStatement preparedStatement = connection.prepareStatement("SELECT min(id) as min_id, max(id) as max_id FROM employees");
    ResultSet resultSet = preparedStatement.executeQuery();
    resultSet.next();

    int minId = resultSet.getInt("min_id");
    int maxId = resultSet.getInt("max_id");
    int lastFetchedId = minId - 1; // start just below minId so the first page includes it

    int pageSize = 100_000;
    int totalPages = 0;

    while ((lastFetchedId + pageSize) <= maxId) {
        resultSet = PaginationLogic.readPageWithSortedKeys(connection, lastFetchedId, pageSize);
        if (!resultSet.next()) {
            break;
        }

        List<String> resultPage = new ArrayList<>();
        do {
            resultPage.add(resultSet.getString("first_name"));
            lastFetchedId = resultSet.getInt("id");
        } while (resultSet.next());

        assertEquals("firstname" + (resultPage.size() * (totalPages + 1)), resultPage.get(resultPage.size() - 1));
        totalPages++;
    }
    assertEquals(10, totalPages);
}

As we can see above, we’re looping over the entire table to read all the data, one page at a time. We query minId and maxId to define the iteration window for the loop. Then, for each page, we assert the last record read, and finally, we assert the total number of pages.

5. Conclusion

In this article, we discussed reading large datasets in batches instead of reading them all in one query. We implemented two approaches and wrote unit tests to verify that they work.

The LIMIT and OFFSET approach can become inefficient for large datasets because the database still has to read and discard all the rows skipped by OFFSET before it returns the page. In contrast, the sorted key approach stays efficient because it uses the indexed key to seek directly to the rows of the next page.
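To put rough numbers on this difference, here's a back-of-the-envelope sketch. It rests on a deliberately simplified cost model that is our own assumption, not a measurement: a LIMIT/OFFSET page touches every skipped row plus the returned rows, while a sorted key page touches only the rows it returns. PaginationCost and its methods are hypothetical names, and real costs depend on the database's query planner.

```java
// Hypothetical cost model (an assumption for illustration, not a benchmark).
final class PaginationCost {

    private PaginationCost() {
    }

    // Total rows touched when reading the whole table with LIMIT/OFFSET:
    // each page walks past `offset` skipped rows and then returns up to pageSize rows.
    static long rowsTouchedWithOffset(long totalRows, int pageSize) {
        long touched = 0;
        for (long offset = 0; offset < totalRows; offset += pageSize) {
            long returned = Math.min(pageSize, totalRows - offset);
            touched += offset + returned;
        }
        return touched;
    }

    // Total rows touched with a sorted key: every row is read exactly once.
    static long rowsTouchedWithSortedKey(long totalRows) {
        return totalRows;
    }

    public static void main(String[] args) {
        System.out.println(rowsTouchedWithOffset(1_000_000, 100_000)); // prints 5500000
        System.out.println(rowsTouchedWithSortedKey(1_000_000));       // prints 1000000
    }
}
```

Under this model, reading our 1M rows in pages of 100,000 touches 5.5M rows with LIMIT and OFFSET but only 1M rows with the sorted key, and the gap widens as pages get smaller.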

The code backing this article is available on GitHub.