If you have a few years of experience in the Java ecosystem and you'd like to share that with the community, have a look at our Contribution Guidelines.
Distinct Queries in HQL
Last modified: April 12, 2022
1. Overview
In this article, we'll discuss distinct HQL queries and how to avoid adding the distinct keyword in the SQL queries when it's not necessary.
2. Understanding the Problem
First, let's look at our data model and identify what we're trying to accomplish.
We'll use the Post and Comment entity objects, which share a one-to-many relationship. We want to retrieve a list of posts with all their associated comments.
Let's start by trying out the following HQL query:
String hql = "SELECT p FROM Post p LEFT JOIN FETCH p.comments";
List<Post> posts = session.createQuery(hql, Post.class).getResultList();
This will generate the following SQL query:
select
post0_.id as id1_3_0_,
comment2_.id as id1_0_1_,
post0_.title as title2_3_0_,
comment2_.text as text2_0_1_,
comments1_.Post_id as Post_id1_4_0__,
comments1_.comments_id as comments2_4_0__
from
Post post0_
left outer join
Post_Comment comments1_
on post0_.id=comments1_.Post_id
left outer join
Comment comment2_
on comments1_.comments_id=comment2_.id
The result will contain duplicates. A Post will be shown as many times as there are associated Comments – a Post with three Comments will show up in the result list three times.
3. Using distinct in the HQL Query
We'll need to use the distinct keyword in our HQL query to eliminate the duplicates:
String hql = "SELECT DISTINCT p FROM Post p LEFT JOIN FETCH p.comments";
List<Post> posts = session.createQuery(hql, Post.class).getResultList();
Now, we're getting the correct result: There are no more duplicated Post objects. Let's take a look at the SQL statement generated by Hibernate:
select
distinct post0_.id as id1_3_0_,
comment2_.id as id1_0_1_,
post0_.title as title2_3_0_,
comment2_.text as text2_0_1_,
comments1_.Post_id as Post_id1_4_0__,
comments1_.comments_id as comments2_4_0__
from
Post post0_
left outer join
Post_Comment comments1_
on post0_.id=comments1_.Post_id
left outer join
Comment comment2_
on comments1_.comments_id=comment2_.id
We can notice that the distinct keyword was not only used by Hibernate but also included in the SQL query. We should avoid this because it's unnecessary and will cause performance issues.
4. Using QueryHint to Stop Passing the distinct Keyword
Starting with Hibernate 5.2, we can leverage the pass-distinct-through mechanism to no longer pass the SQL statement's HQL/JPQL distinct clause.
To disable pass-distinct-through, we need to add to the query the hint QueryHint.PASS_DISTINCT_THROUGH, with value false:
String hql = "SELECT DISTINCT p FROM Post p LEFT JOIN FETCH p.comments";
List<Post> posts = session.createQuery(hql, Post.class)
.setHint(QueryHints.PASS_DISTINCT_THROUGH, false)
.getResultList();
If we check the result, we'll see there are no duplicated entities. Furthermore, the distinct clause wasn't used in the SQL statement:
select
post0_.id as id1_3_0_,
comment2_.id as id1_0_1_,
post0_.title as title2_3_0_,
comment2_.text as text2_0_1_,
comments1_.Post_id as Post_id1_4_0__,
comments1_.comments_id as comments2_4_0__
from
Post post0_
left outer join
Post_Comment comments1_
on post0_.id=comments1_.Post_id
left outer join
Comment comment2_
on comments1_.comments_id=comment2_.id
5. Conclusion
In this article, we've discovered that the presence of the distinct keyword in the SQL query can be unnecessary, and it has performance implications. After that, we've learned how to use the PASS_DISTINCT_THROUGH query hint to avoid this behavior.
As always, the source code is available over on GitHub.