Persistence top

Get started with Spring Data JPA through the reference Learn Spring Data JPA course:

>> CHECK OUT THE COURSE

1. Overview

In this article, we'll discuss distinct HQL queries and how to avoid adding the distinct keyword in the SQL queries when it's not necessary.

2. Understanding the Problem

First, let's look at our data model and identify what we're trying to accomplish.

We'll use the Post and Comment entity objects, which share a one-to-many relationship. We want to retrieve a list of posts with all their associated comments.

Let's start by trying out the following HQL query:

String hql = "SELECT p FROM Post p LEFT JOIN FETCH p.comments";
List<Post> posts = session.createQuery(hql, Post.class).getResultList();

This will generate the following SQL query:

select
    post0_.id as id1_3_0_,
    comment2_.id as id1_0_1_,
    post0_.title as title2_3_0_,
    comment2_.text as text2_0_1_,
    comments1_.Post_id as Post_id1_4_0__,
    comments1_.comments_id as comments2_4_0__
from
    Post post0_
left outer join
    Post_Comment comments1_
        on post0_.id=comments1_.Post_id
left outer join
    Comment comment2_
        on comments1_.comments_id=comment2_.id

The result will contain duplicates. A Post will be shown as many times as there are associated Commentsa Post with three Comments will show up in the result list three times.

3. Using distinct in the HQL Query

We'll need to use the distinct keyword in our HQL query to eliminate the duplicates:

String hql = "SELECT DISTINCT p FROM Post p LEFT JOIN FETCH p.comments";
List<Post> posts = session.createQuery(hql, Post.class).getResultList();

Now, we're getting the correct result: There are no more duplicated Post objects. Let's take a look at the SQL statement generated by Hibernate:

select
    distinct post0_.id as id1_3_0_,
    comment2_.id as id1_0_1_,
    post0_.title as title2_3_0_,
    comment2_.text as text2_0_1_,
    comments1_.Post_id as Post_id1_4_0__,
    comments1_.comments_id as comments2_4_0__
from
    Post post0_
left outer join
    Post_Comment comments1_
        on post0_.id=comments1_.Post_id
left outer join
    Comment comment2_
        on comments1_.comments_id=comment2_.id

We can notice that the distinct keyword was not only used by Hibernate but also included in the SQL query. We should avoid this because it's unnecessary and will cause performance issues.

4. Using QueryHint to Stop Passing the distinct Keyword

Starting with Hibernate 5.2, we can leverage the pass-distinct-through mechanism to no longer pass the SQL statement's HQL/JPQL distinct clause.

To disable pass-distinct-through, we need to add to the query the hint QueryHint.PASS_DISTINCT_THROUGH, with value false:

String hql = "SELECT DISTINCT p FROM Post p LEFT JOIN FETCH p.comments";
List<Post> posts = session.createQuery(hql, Post.class)
  .setHint(QueryHints.PASS_DISTINCT_THROUGH, false)
  .getResultList();

If we check the result, we'll see there are no duplicated entities. Furthermore, the distinct clause wasn't used in the SQL statement:

select
    post0_.id as id1_3_0_,
    comment2_.id as id1_0_1_,
    post0_.title as title2_3_0_,
    comment2_.text as text2_0_1_,
    comments1_.Post_id as Post_id1_4_0__,
    comments1_.comments_id as comments2_4_0__ 
from
    Post post0_ 
left outer join
    Post_Comment comments1_ 
        on post0_.id=comments1_.Post_id 
left outer join
    Comment comment2_ 
        on comments1_.comments_id=comment2_.id

5. Conclusion

In this article, we've discovered that the presence of the distinct keyword in the SQL query can be unnecessary, and it has performance implications. After that, we've learned how to use the PASS_DISTINCT_THROUGH query hint to avoid this behavior.

As always, the source code is available over on GitHub.

Persistence bottom
Get started with Spring Data JPA through the reference Learn Spring Data JPA course: >> CHECK OUT THE COURSE
Persistence footer banner
Comments are closed on this article!