1. Overview

In this article, we’ll discuss distinct HQL queries and how to avoid adding the distinct keyword in SQL queries when it’s not necessary.

2. Understanding the Problem

First, let’s look at our data model and identify what we’re trying to accomplish.

We’ll use the Post and Comment entity objects, which share a one-to-many relationship. We want to retrieve a list of posts with all their associated comments.

Let’s start by trying out the following HQL query:

String hql = "SELECT p FROM Post p LEFT JOIN FETCH p.comments";
List<Post> posts = session.createQuery(hql, Post.class).getResultList();

This will generate the following SQL query:

select 
     p1_0.id,c1_0.Post_id,c1_1.id,c1_1.text,p1_0.title 
from
     Post p1_0 
left join (Post_Comment c1_0 join Comment c1_1 on c1_1.id=c1_0.comments_id) on p1_0.id=c1_0.Post_id

The result will contain duplicates. A Post will be shown as many times as there are associated Commentsa Post with three Comments will show up in the result list three times.

3. Using distinct in the HQL Query

We’ll need to use the distinct keyword in our HQL query to eliminate the duplicates:

String hql = "SELECT DISTINCT p FROM Post p LEFT JOIN FETCH p.comments";
List<Post> posts = session.createQuery(hql, Post.class).getResultList();

Now, we’re getting the correct result: There are no more duplicated Post objects. Let’s take a look at the SQL statement generated by Hibernate:

select 
     distinct p1_0.id,c1_0.Post_id,c1_1.id,c1_1.text,p1_0.title 
from 
     Post p1_0 
left join 
     (Post_Comment c1_0 join Comment c1_1 on c1_1.id=c1_0.comments_id) on p1_0.id=c1_0.Post_id

We can notice that the distinct keyword was not only used by Hibernate but also included in the SQL query. We should avoid this because it’s unnecessary and will cause performance issues.

4. Using QueryHint for distinct Keyword

Starting with Hibernate 6, distinct is always passed to the SQL query and the flag QueryHints#HINT_PASS_DISTINCT_THROUGH has been removed.

5. Conclusion

In this article, we’ve discovered that the presence of the distinct keyword in the SQL query can be unnecessary and has performance implications. After that, we learned how to use the PASS_DISTINCT_THROUGH query hint to avoid this behaviour.

As always, the source code is available on GitHub.

Course – LSD (cat=Persistence)

Get started with Spring Data JPA through the reference Learn Spring Data JPA course:

>> CHECK OUT THE COURSE
res – Persistence (eBook) (cat=Persistence)
Comments are open for 30 days after publishing a post. For any issues past this date, use the Contact form on the site.