Persistence top

I just announced the new Learn Spring course, focused on the fundamentals of Spring 5 and Spring Boot 2:

>> CHECK OUT THE COURSE

1. Overview

In this tutorial, we'll look at how we can batch insert or update entities using Hibernate/JPA.

Batching allows us to send a group of SQL statements to the database in a single network call. This way, we can optimize the network and memory usage of our application.

2. Setup

2.1. Sample Data Model

Let's look at our sample data model that we'll use in the examples.

Firstly, we'll create a School entity:

@Entity
public class School {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    private long id;

    private String name;

    @OneToMany(mappedBy = "school")
    private List<Student> students;

    // Getters and setters...
}

Each School will have zero or more Students:

@Entity
public class Student {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    private long id;

    private String name;

    @ManyToOne
    private School school;

    // Getters and setters...
}

2.2. Tracing SQL Queries

When running our examples, we'll need to verify that insert/update statements are indeed sent in batches. Unfortunately, we can't understand from Hibernate log statements whether SQL statements are batched or not. Because of this, we'll be using a data source proxy to trace Hibernate/JPA SQL statements:

private static class ProxyDataSourceInterceptor implements MethodInterceptor {
    private final DataSource dataSource;
    public ProxyDataSourceInterceptor(final DataSource dataSource) {
        this.dataSource = ProxyDataSourceBuilder.create(dataSource)
            .name("Batch-Insert-Logger")
            .asJson().countQuery().logQueryToSysOut().build();
    }
    
    // Other methods...
}

3. Default Behaviour

Hibernate doesn't enable batching by default. This means that it'll send a separate SQL statement for each insert/update operation:

@Transactional
@Test
public void whenNotConfigured_ThenSendsInsertsSeparately() {
    for (int i = 0; i < 10; i++) {
        School school = createSchool(i);
        entityManager.persist(school);
    }
    entityManager.flush();
}

Here, we've persisted 10 School entities. If we look at the query logs, we can see that Hibernate sends each insert statement separately:

"querySize":1, "batchSize":0, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School1","1"]]
"querySize":1, "batchSize":0, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School2","2"]]
"querySize":1, "batchSize":0, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School3","3"]]
"querySize":1, "batchSize":0, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School4","4"]]
"querySize":1, "batchSize":0, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School5","5"]]
"querySize":1, "batchSize":0, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School6","6"]]
"querySize":1, "batchSize":0, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School7","7"]]
"querySize":1, "batchSize":0, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School8","8"]]
"querySize":1, "batchSize":0, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School9","9"]]
"querySize":1, "batchSize":0, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School10","10"]]

Hence we should configure Hibernate to enable batching. For this purpose, we should set hibernate.jdbc.batch_size property to a number bigger than 0.

If we're creating EntityManager manually, we should add hibernate.jdbc.batch_size to the Hibernate properties:

public Properties hibernateProperties() {
    Properties properties = new Properties();
    properties.put("hibernate.jdbc.batch_size", "5");
    
    // Other properties...
    return properties;
}

If we're using Spring Boot, we can define it as an application property:

spring.jpa.properties.hibernate.jdbc.batch_size=5

4. Batch Insert for Single Table

4.1. Batch Insert without Explicit Flush

Let's first look at how we can use batch inserts when we're dealing with only one entity type.

We'll use the previous code sample, but this time batching is enabled:

@Transactional
@Test
public void whenInsertingSingleTypeOfEntity_thenCreatesSingleBatch() {
    for (int i = 0; i < 10; i++) {
        School school = createSchool(i);
        entityManager.persist(school);
    }
}

Here we've persisted 10 School entities. When we look at the logs, we can verify that Hibernate sends insert statements in batches:

"batch":true, "querySize":1, "batchSize":5, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School1","1"],["School2","2"],["School3","3"],["School4","4"],["School5","5"]]
"batch":true, "querySize":1, "batchSize":5, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School6","6"],["School7","7"],["School8","8"],["School9","9"],["School10","10"]]

One important thing to mention here is the memory consumption. When we persist an entity, Hibernate stores it in the persistence context. For example, if we persist 100,000 entities in one transaction, we'll end up having 100,000 entity instances in memory, possibly causing an OutOfMemoryException.

4.2. Batch Insert with Explicit Flush

Now, we'll look at how we can optimize memory usage during batching operations. Let's dig deep into the persistence context's role.

First of all, the persistence context stores newly created entities and also the modified ones in memory. Hibernate sends these changes to the database when the transaction is synchronized. This generally happens at the end of a transaction. However, calling EntityManager.flush() also triggers a transaction synchronization.

Secondly, the persistence context serves as an entity cache, thus also referred to as the first level cache. To clear entities in the persistence context, we can call EntityManager.clear().

So, to reduce the memory load during batching, we can call EntityManager.flush() and EntityManager.clear() on our application code, whenever batch size is reached:

@Transactional
@Test
public void whenFlushingAfterBatch_ThenClearsMemory() {
    for (int i = 0; i < 10; i++) {
        if (i > 0 && i % BATCH_SIZE == 0) {
            entityManager.flush();
            entityManager.clear();
        }
        School school = createSchool(i);
        entityManager.persist(school);
    }
}

Here we're flushing the entities in the persistence context thus making Hibernate send queries to the database. Furthermore, by clearing the persistence context, we're removing the School entities from memory. Batching behavior will remain the same.

5. Batch Insert for Multiple Tables

Now let's see how we can configure batch inserts when dealing with multiple entity types in one transaction.

When we want to persist the entities of several types, Hibernate creates a different batch for each entity type. This is because there can be only one type of entity in a single batch.

Additionally, as Hibernate collects insert statements, whenever it encounters an entity type different from the one in the current batch, it creates a new batch. This is the case even though there is already a batch for that entity type:

@Transactional
@Test
public void whenThereAreMultipleEntities_ThenCreatesNewBatch() {
    for (int i = 0; i < 10; i++) {
        if (i > 0 && i % BATCH_SIZE == 0) {
            entityManager.flush();
            entityManager.clear();
        }
        School school = createSchool(i);
        entityManager.persist(school);
        Student firstStudent = createStudent(school);
        Student secondStudent = createStudent(school);
        entityManager.persist(firstStudent);
        entityManager.persist(secondStudent);
    }
}

Here, we're inserting a School and assigning it two Students and repeating this process 10 times.

In the logs, we see that Hibernate sends School insert statements in several batches of size 1 while we were expecting only 2 batches of size 5. Moreover, Student insert statements are also sent in several batches of size 2 instead of 4 batches of size 5:

"batch":true, "querySize":1, "batchSize":1, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School1","1"]]
"batch":true, "querySize":1, "batchSize":2, "query":["insert into student (name, school_id, id) 
  values (?, ?, ?)"], "params":[["Student-School1","1","2"],["Student-School1","1","3"]]
"batch":true, "querySize":1, "batchSize":1, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School2","4"]]
"batch":true, "querySize":1, "batchSize":2, "query":["insert into student (name, school_id, id) 
  values (?, ?, ?)"], "params":[["Student-School2","4","5"],["Student-School2","4","6"]]
"batch":true, "querySize":1, "batchSize":1, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School3","7"]]
"batch":true, "querySize":1, "batchSize":2, "query":["insert into student (name, school_id, id) 
  values (?, ?, ?)"], "params":[["Student-School3","7","8"],["Student-School3","7","9"]]
Other log lines...

To batch all insert statements of the same entity type, we should configure the hibernate.order_inserts property.

We can configure the Hibernate property manually using EntityManagerFactory:

public Properties hibernateProperties() {
    Properties properties = new Properties();
    properties.put("hibernate.order_inserts", "true");
    
    // Other properties...
    return properties;
}

If we're using Spring Boot, we can configure the property in application.properties:

spring.jpa.properties.hibernate.order_inserts=true

After adding this property, we'll have 1 batch for School inserts and 2 batches for Student inserts:

"batch":true, "querySize":1, "batchSize":5, "query":["insert into school (name, id) values (?, ?)"], 
  "params":[["School6","16"],["School7","19"],["School8","22"],["School9","25"],["School10","28"]]
"batch":true, "querySize":1, "batchSize":5, "query":["insert into student (name, school_id, id) 
  values (?, ?, ?)"], "params":[["Student-School6","16","17"],["Student-School6","16","18"],
  ["Student-School7","19","20"],["Student-School7","19","21"],["Student-School8","22","23"]]
"batch":true, "querySize":1, "batchSize":5, "query":["insert into student (name, school_id, id) 
  values (?, ?, ?)"], "params":[["Student-School8","22","24"],["Student-School9","25","26"],
  ["Student-School9","25","27"],["Student-School10","28","29"],["Student-School10","28","30"]]

6. Batch Update

Now, let's move on to batch updates. Similar to batch inserts, we can group several update statements and send them to the database in one go.

To enable this, we'll configure hibernate.order_updates and hibernate.jdbc.batch_versioned_data properties.

If we're creating our EntityManagerFactory manually, we can set the properties programmatically:

public Properties hibernateProperties() {
    Properties properties = new Properties();
    properties.put("hibernate.order_updates", "true");
    properties.put("hibernate.batch_versioned_data", "true");
    
    // Other properties...
    return properties;
}

And if we're using Spring Boot, we'll just add them to application.properties:

spring.jpa.properties.hibernate.order_updates=true
spring.jpa.properties.hibernate.batch_versioned_data=true

After configuring these properties, Hibernate should group update statements in batches:

@Transactional
@Test
public void whenUpdatingEntities_thenCreatesBatch() {
    TypedQuery<School> schoolQuery = 
      entityManager.createQuery("SELECT s from School s", School.class);
    List<School> allSchools = schoolQuery.getResultList();
    for (School school : allSchools) {
        school.setName("Updated_" + school.getName());
    }
}

Here we've updated school entities and Hibernate sends SQL statements in 2 batches of size 5:

"batch":true, "querySize":1, "batchSize":5, "query":["update school set name=? where id=?"], 
  "params":[["Updated_School1","1"],["Updated_School2","2"],["Updated_School3","3"],
  ["Updated_School4","4"],["Updated_School5","5"]]
"batch":true, "querySize":1, "batchSize":5, "query":["update school set name=? where id=?"], 
  "params":[["Updated_School6","6"],["Updated_School7","7"],["Updated_School8","8"],
  ["Updated_School9","9"],["Updated_School10","10"]]

7. @Id Generation Strategy

When we want to use batching for inserts/updates, we should be aware of the primary key generation strategy. If our entities use GenerationType.IDENTITY identifier generator, Hibernate will silently disable batch inserts/updates.

Since entities in our examples use GenerationType.SEQUENCE identifier generator, Hibernate enables batch operations:

@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE)
private long id;

8. Summary

In this article, we looked at batch inserts and updates using Hibernate/JPA.

Check out the code samples for this article over on Github.

Persistence bottom

I just announced the new Learn Spring course, focused on the fundamentals of Spring 5 and Spring Boot 2:

>> CHECK OUT THE COURSE
Comments are closed on this article!