Persistence top

Get started with Spring Data JPA through the reference Learn Spring Data JPA course:

>> CHECK OUT THE COURSE
Authors Top

If you have a few years of experience in the Java ecosystem, and you’d like to share that with the community, have a look at our Contribution Guidelines.

1. Overview

In this tutorial, we'll learn how to effectively insert a vast amount of data into our target RDBMS using Spring JDBC Batch support, and we'll compare the performance of using a batch insert versus multiple single inserts.

2. Understanding Batch Processing

Once our application establishes a connection to a database, we can execute multiple SQL statements in one go instead of sending each statement one by one. Thus, we significantly decrease the communication overhead.

One option to achieve this is using the Spring JDBC API, which is the focus of the following sections.

2.1. Supporting Databases

Even though the JDBC API provides the batch functionality, it's not guaranteed that the underlying JDBC driver we are using has actually implemented these APIs and supports this functionality.

Spring provides a utility method called JdbcUtils.supportsBatchUpdates() that takes a JDBC Connection as a parameter, and simply returns true or false. However, in most cases with the JdbcTemplate API, Spring already checks it for us and otherwise falls back to regular behavior.

2.2. Factors That May Affect the Overall Performance

There are a few aspects we should consider when inserting a significant amount of data:

  • the number of connections we create to talk to the database server
  • the table we are inserting
  • the number of database requests we make to execute a single logical task

Usually, to overcome the first point, we use connection pooling. This helps by reusing already existing connections instead of creating new ones.

Another significant point is the target table. To be precise, the more indexed columns we have, the worse the performance will be, because the database server needs to adjust the indexes after each new row.

Lastly, we can use batch support to decrease the number of roundtrips to insert a lot of entries.

However, we should be aware that not all JDBC drivers/database servers provide the same efficiency level for batch operations even though they support it. For example, while database servers such as Oracle, Postgres, SQL Server, and DB2 provide a significant gain, MySQL provides poorer gain without any additional configuration.

3. Spring JDBC Batch Inserts

In this example, we'll use Postgres 14 as our database server. So, we need to add the corresponding postgresql JDBC driver to our dependencies:

<dependency>
    <groupId>org.postgresql</groupId>
    <artifactId>postgresql</artifactId>
    <scope>runtime</scope>
</dependency>

Then, in order to use Spring's JDBC abstraction, let's add the spring-boot-starter-jdbc dependency as well:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-jdbc</artifactId>
</dependency>

For demonstration purposes, we'll explore 2 different methods: first, we'll do regular inserts for each record, and then we'll try to take advantage of the batch support. In either case, we'll use a single transaction.

Let's get started with our simple Product table first:

CREATE TABLE product (
    id              SERIAL PRIMARY KEY,
    title           VARCHAR(40),
    created_ts      timestamp without time zone,
    price           numeric
);

And here's the corresponding model Product class:

public class Product {
    private long id;
    private String title;
    private LocalDateTime createdTs;
    private BigDecimal price;
  
    // standard setters and getters
}

3.1. Configuring the Data Source

By adding the below configuration into our application.properties Spring Boot creates a DataSource and a JdbcTemplate bean for us:

spring.datasource.url=jdbc:postgresql://localhost:5432/sample-baeldung-db
spring.datasource.username=postgres
spring.datasource.password=root
spring.datasource.driver-class-name=org.postgresql.Driver

3.2. Preparing Regular Inserts

We start by creating a simple repository interface to save the list of products:

public interface ProductRepository {
    void saveAll(List<Product> products);
}

Then the first implementation simply iterates over the products and inserts them one by one in the same transaction:

@Repository
public class SimpleProductRepository implements ProductRepository {

    private JdbcTemplate jdbcTemplate;

    public SimpleProductRepository(JdbcTemplate jdbcTemplate) {
      this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    @Transactional
    public void saveAll(List<Product> products) {
      for (Product product : products) {
        jdbcTemplate.update("INSERT INTO PRODUCT (TITLE, CREATED_TS, PRICE) " +
          "VALUES (?, ?, ?)",
          product.getTitle(),
          Timestamp.valueOf(product.getCreatedTs()),
          product.getPrice());
      }
    }

}

Now, we need a service class ProductService that generates the given number of Product objects and starts the insertion process. First, we have a method to generate the given number of Product instances in a randomized fashion using some predefined values:

public class ProductService {

    private ProductRepository productRepository;
    private Random random;
    private Clock clock;

    // constructor for the dependencies

    private List<Product> generate(int count) {
        final String[] titles = { "car", "plane", "house", "yacht" };
        final BigDecimal[] prices = {
          new BigDecimal("12483.12"),
          new BigDecimal("8539.99"),
          new BigDecimal("88894"),
          new BigDecimal("458694")
        };

        final List<Product> products = new ArrayList<>(count);

        for (int i = 0; i < count; i++) {
            Product product = new Product();
            product.setCreatedTs(LocalDateTime.now(clock));
            product.setPrice(prices[random.nextInt(4)]);
            product.setTitle(titles[random.nextInt(4)]);
            products.add(product);
        }
        return products;
    }
}

Second, we add another method into ProductService class that takes generated Product instances and inserts them:

@Transactional
public long createProducts(int count) {
  List<Product> products = generate(count);
  long startTime = clock.millis();
  productRepository.saveAll(products);
  return clock.millis() - startTime;
}

To make the ProductService a Spring bean, let's add the below configuration as well:

@Configuration
public class AppConfig {

    @Bean
    public ProductService simpleProductService(SimpleProductRepository simpleProductRepository) {
      return new ProductService(simpleProductRepository, new Random(), Clock.systemUTC());
    }
}

As we can see, this ProductService bean uses the SimpleProductRepository to perform regular inserts.

3.3. Preparing Batch Inserts

Now, it's time to see Spring JDBC batch support in action. First of all, let's start creating another batch implementation of our ProductRepository class:

@Repository
public class BatchProductRepository implements ProductRepository {

    private JdbcTemplate jdbcTemplate;

    public BatchProductRepository(JdbcTemplate jdbcTemplate) {
      this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    @Transactional
    public void saveAll(List<Product> products) {
      jdbcTemplate.batchUpdate("INSERT INTO PRODUCT (TITLE, CREATED_TS, PRICE) " +
        "VALUES (?, ?, ?)",
        products,
        100,
        (PreparedStatement ps, Product product) -> {
          ps.setString(1, product.getTitle());
          ps.setTimestamp(2, Timestamp.valueOf(product.getCreatedTs()));
          ps.setBigDecimal(3, product.getPrice());
        });
     }
}

It's important to note here that for this example we use the batch size 100. It means Spring will batch every 100 inserts and send them separately. In other words, it'll help us to decrease the number of roundtrips 100 times.

Usually, the recommended batch size is 50-100, but it highly depends on our database server configurations and the size of each batch package.

For example, MySQL Server has the configuration property called max_allowed_packet with a 64MB limit for each network package. While setting the batch size, we need to be careful not to exceed our database server limits.

Now, we add an additional ProductService bean configuration in the AppConfig class:

@Bean
public ProductService batchProductService(BatchProductRepository batchProductRepository) {
  return new ProductService(batchProductRepository, new Random(), Clock.systemUTC());
}

4. Performance Comparisons

It's time to run our example and take a look at the benchmarking. For the sake of simplicity, we prepare a Command-Line Spring Boot application by implementing the CommandLineRunner interface provided by Spring. We run our example multiple times for both the approaches:

@SpringBootApplication
public class SpringJdbcBatchPerformanceApplication implements CommandLineRunner {

    @Autowired
    @Qualifier("batchProductService")
    private ProductService batchProductService;
    @Autowired
    @Qualifier("simpleProductService")
    private ProductService simpleProductService;

    public static void main(String[] args) {
      SpringApplication.run(SpringJdbcBatchPerformanceApplication.class, args);
    }

    @Override
    public void run(String... args) throws Exception {
      int[] recordCounts = {1, 10, 100, 1000, 10_000, 100_000, 1000_000};

      for (int recordCount : recordCounts) {
        long regularElapsedTime = simpleProductService.createProducts(recordCount);
        long batchElapsedTime = batchProductService.createProducts(recordCount);

        System.out.println(String.join("", Collections.nCopies(50, "-")));
        System.out.format("%-20s%-5s%-10s%-5s%8sms\n", "Regular inserts", "|", recordCount, "|", regularElapsedTime);
        System.out.format("%-20s%-5s%-10s%-5s%8sms\n", "Batch inserts", "|", recordCount, "|", batchElapsedTime);
        System.out.printf("Total gain: %d %s\n", calculateGainInPercent(regularElapsedTime, batchElapsedTime), "%");
      }

    }

    int calculateGainInPercent(long before, long after) {
      return (int) Math.floor(100D * (before - after) / before);
    }
}

And here's our benchmarking result:

--------------------------------------------------
Regular inserts     |    1         |          14ms
Batch inserts       |    1         |           8ms
Total gain: 42 %
--------------------------------------------------
Regular inserts     |    10        |           4ms
Batch inserts       |    10        |           1ms
Total gain: 75 %
--------------------------------------------------
Regular inserts     |    100       |          29ms
Batch inserts       |    100       |           6ms
Total gain: 79 %
--------------------------------------------------
Regular inserts     |    1000      |         175ms
Batch inserts       |    1000      |          24ms
Total gain: 86 %
--------------------------------------------------
Regular inserts     |    10000     |         861ms
Batch inserts       |    10000     |         128ms
Total gain: 85 %
--------------------------------------------------
Regular inserts     |    100000    |        5098ms
Batch inserts       |    100000    |        1126ms
Total gain: 77 %
--------------------------------------------------
Regular inserts     |    1000000   |       47738ms
Batch inserts       |    1000000   |       13066ms
Total gain: 72 %
--------------------------------------------------

The results look quite promising.

However, that's not all. Some databases such as Postgres, MySQL, and SQL Server support multi-value inserts. It helps to decrease the overall size of insert statements. Let's see how this works in general:

-- REGULAR INSERTS TO INSERT 4 RECORDS
INSERT INTO PRODUCT
(TITLE, CREATED_TS, PRICE)
VALUES
 ('test1', LOCALTIMESTAMP, 100.10);

INSERT INTO PRODUCT
(TITLE, CREATED_TS, PRICE)
VALUES
 ('test2', LOCALTIMESTAMP, 101.10);
 
INSERT INTO PRODUCT
(TITLE, CREATED_TS, PRICE)
VALUES
 ('test3', LOCALTIMESTAMP, 102.10);

INSERT INTO PRODUCT
(TITLE, CREATED_TS, PRICE)
VALUES
 ('test4', LOCALTIMESTAMP, 103.10);

-- EQUIVALENT MULTI-VALUE INSERT
INSERT INTO PRODUCT
(TITLE, CREATED_TS, PRICE)
VALUES
 ('test1', LOCALTIMESTAMP, 100.10),
 ('test2', LOCALTIMESTAMP, 101.10),
 ('test3', LOCALTIMESTAMP, 102.10),
 ('test4', LOCALTIMESTAMP, 104.10);

To take advantage of this feature with a Postgres database, it's enough to set spring.datasource.hikari.data-source-properties.reWriteBatchedInserts=true in our application.properties file. The underlying JDBC driver starts rewriting our regular insert statements into multi-value ones for batched inserts.

This configuration is specific to Postgres. Other supporting databases might have different configuration requirements.

Let's re-run our application with this feature enabled and see the difference:

--------------------------------------------------
Regular inserts     |    1         |          15ms
Batch inserts       |    1         |          10ms
Total gain: 33 %
--------------------------------------------------
Regular inserts     |    10        |           3ms
Batch inserts       |    10        |           2ms
Total gain: 33 %
--------------------------------------------------
Regular inserts     |    100       |          42ms
Batch inserts       |    100       |          10ms
Total gain: 76 %
--------------------------------------------------
Regular inserts     |    1000      |         141ms
Batch inserts       |    1000      |          19ms
Total gain: 86 %
--------------------------------------------------
Regular inserts     |    10000     |         827ms
Batch inserts       |    10000     |         104ms
Total gain: 87 %
--------------------------------------------------
Regular inserts     |    100000    |        5093ms
Batch inserts       |    100000    |         981ms
Total gain: 80 %
--------------------------------------------------
Regular inserts     |    1000000   |       50482ms
Batch inserts       |    1000000   |        9821ms
Total gain: 80 %
--------------------------------------------------

We can see that enabling this feature increases the overall performance when we have a relatively large data set.

5. Conclusion

In this article, we created a simple example to show how we can benefit from Spring JDBC batch support for inserts. We compared regular inserts against the batched ones and got around 80-90 % of performance gain. Certainly, while using batch functionality, we also need to consider the support of our JDBC driver and its efficiency.

Additionally, we learned that some databases/drivers provide the multi-value insert capability to boost the performance even more and we saw how to utilize it in the case of Postgres.

As always, the source code for the example is available over on GitHub.

Persistence bottom
Get started with Spring Data JPA through the reference Learn Spring Data JPA course: >> CHECK OUT THE COURSE
Persistence footer banner
guest
0 Comments
Inline Feedbacks
View all comments