Introduction to Hibernate Search

Last updated: January 8, 2024

Written by: Markus Gulden

Reviewed by: Carsten

NoSQL

Refactor Java code safely — and automatically — with OpenRewrite.

Refactoring big codebases by hand is slow, risky, and easy to put off. That’s where OpenRewrite comes in. The open-source framework for large-scale, automated code transformations helps teams modernize safely and consistently.

Each month, the creators and maintainers of OpenRewrite at Moderne run live, hands-on training sessions — one for newcomers and one for experienced users. You’ll see how recipes work, how to apply them across projects, and how to modernize code with confidence.

Join the next session, bring your questions, and learn how to automate the kind of work that usually eats your sprint time.

Regression testing is an important step in the release process, to ensure that new code doesn't break the existing functionality. As the codebase evolves, we want to run these tests frequently to help catch any issues early on.

The best way to ensure these tests run frequently on an automated basis is, of course, to include them in the CI/CD pipeline. This way, the regression tests will execute automatically whenever we commit code to the repository.

In this tutorial, we'll see how to create regression tests using Selenium, and then include them in our pipeline using GitHub Actions:, to be run on the LambdaTest cloud grid:

>> How to Run Selenium Regression Tests With GitHub Actions

1. Overview

In this article, we’ll discuss the basics of Hibernate Search, how to configure it, and we’ll implement some simple queries.

2. Basics of Hibernate Search

Whenever we have to implement full-text search functionality, using tools we’re already well-versed with is always a plus.

In case we’re already using Hibernate and JPA for ORM, we’re only one step away from Hibernate Search.

Hibernate Search integrates Apache Lucene, a high-performance and extensible full-text search-engine library written in Java. This combines the power of Lucene with the simplicity of Hibernate and JPA.

Simply put, we just have to add some additional annotations to our domain classes, and the tool will take care of the things like database/index synchronization.

Hibernate Search also provides an Elasticsearch integration; however, as it’s still in an experimental stage, we’ll focus on Lucene here.

3. Configurations

3.1. Maven Dependencies

Before getting started, we first need to add the necessary dependencies to our pom.xml:

<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-search-orm</artifactId>
    <version>5.8.2.Final</version>
</dependency>

For the sake of simplicity, we’ll use H2 as our database:

<dependency>
    <groupId>com.h2database</groupId> 
    <artifactId>h2</artifactId>
    <version>1.4.196</version>
</dependency>

3.2. Configurations

We also have to specify where Lucene should store the index.

This can be done via the property hibernate.search.default.directory_provider.

We’ll choose filesystem, which is the most straightforward option for our use case. More options are listed in the official documentation. Filesystem-master/filesystem-slave and infinispan are noteworthy for clustered applications, where the index has to be synchronized between nodes.

We also have to define a default base directory where indexes will be stored:

hibernate.search.default.directory_provider = filesystem
hibernate.search.default.indexBase = /data/index/default

4. The Model Classes

After the configuration, we’re now ready to specify our model.

On top of the JPA annotations @Entity and @Table, we have to add an @Indexed annotation. It tells Hibernate Search that the entity Product shall be indexed.

After that, we have to define the required attributes as searchable by adding a @Field annotation:

@Entity
@Indexed
@Table(name = "product")
public class Product {

    @Id
    private int id;

    @Field(termVector = TermVector.YES)
    private String productName;

    @Field(termVector = TermVector.YES)
    private String description;

    @Field
    private int memory;

    // getters, setters, and constructors
}

The termVector = TermVector.YES attribute will be required for the “More Like This” query later.

5. Building the Lucene Index

Before starting the actual queries, we have to trigger Lucene to build the index initially:

FullTextEntityManager fullTextEntityManager 
  = Search.getFullTextEntityManager(entityManager);
fullTextEntityManager.createIndexer().startAndWait();

After this initial build, Hibernate Search will take care of keeping the index up to date. I. e. we can create, manipulate and delete entities via the EntityManager as usual.

Note: we have to make sure that entities are fully committed to the database before they can be discovered and indexed by Lucene (by the way, this also the reason why the initial test data import in our example code test cases comes in a dedicated JUnit test case, annotated with @Commit).

6. Building and Executing Queries

Now, we’re ready for creating our first query.

In the following section, we’ll show the general workflow for preparing and executing a query.

After that, we’ll create some example queries for the most important query types.

6.1. General Workflow for Creating and Executing a Query

Preparing and executing a query in general consists of four steps:

In step 1, we have to get a JPA FullTextEntityManager and from that a QueryBuilder:

FullTextEntityManager fullTextEntityManager 
  = Search.getFullTextEntityManager(entityManager);

QueryBuilder queryBuilder = fullTextEntityManager.getSearchFactory() 
  .buildQueryBuilder()
  .forEntity(Product.class)
  .get();

In step 2, we will create a Lucene query via the Hibernate query DSL:

org.apache.lucene.search.Query query = queryBuilder
  .keyword()
  .onField("productName")
  .matching("iphone")
  .createQuery();

In step 3, we’ll wrap the Lucene query into a Hibernate query:

org.hibernate.search.jpa.FullTextQuery jpaQuery
  = fullTextEntityManager.createFullTextQuery(query, Product.class);

Finally, in step 4 we’ll execute the query:

List<Product> results = jpaQuery.getResultList();

Note: by default, Lucene sorts the results by relevance.

Steps 1, 3 and 4 are the same for all query types.

In the following, we will focus on step 2, i. e. how to create different types of queries.

6.2. Keyword Queries

The most basic use-case is searching for a specific word.

This is what we actually did already in the previous section:

Query keywordQuery = queryBuilder
  .keyword()
  .onField("productName")
  .matching("iphone")
  .createQuery();

Here, keyword() specifies that we are looking for one specific word, onField() tells Lucene where to look and matching() what to look for.

6.3. Fuzzy Queries

Fuzzy queries are working like keyword queries, except that we can define a limit of “fuzziness”, above which Lucene shall accept the two terms as matching.

By withEditDistanceUpTo(), we can define how much a term may deviate from the other. It can be set to 0, 1, and 2, whereby the default value is 2 (note: this limitation is coming from the Lucene’s implementation).

By withPrefixLength(), we can define the length of the prefix which shall be ignored by the fuzziness:

Query fuzzyQuery = queryBuilder
  .keyword()
  .fuzzy()
  .withEditDistanceUpTo(2)
  .withPrefixLength(0)
  .onField("productName")
  .matching("iPhaen")
  .createQuery();

6.4. Wildcard Queries

Hibernate Search also enables us to execute wildcard queries, i. e. queries for which a part of a word is unknown.

For this, we can use “?” for a single character, and “*” for any character sequence:

Query wildcardQuery = queryBuilder
  .keyword()
  .wildcard()
  .onField("productName")
  .matching("Z*")
  .createQuery();

6.5. Phrase Queries

If we want to search for more than one word, we can use phrase queries. We can either look for exact or for approximate sentences, using phrase() and withSlop(), if necessary. The slop factor defines the number of other words permitted in the sentence:

Query phraseQuery = queryBuilder
  .phrase()
  .withSlop(1)
  .onField("description")
  .sentence("with wireless charging")
  .createQuery();

6.6. Simple Query String Queries

With the previous query types, we had to specify the query type explicitly.

If we want to give some more power to the user, we can use simple query string queries: by that, he can define his own queries at runtime.

The following query types are supported:

boolean (AND using “+”, OR using “|”, NOT using “-“)
prefix (prefix*)
phrase (“some phrase”)
precedence (using parentheses)
fuzzy (fuzy~2)
near operator for phrase queries (“some phrase”~3)

The following example would combine fuzzy, phrase and boolean queries:

Query simpleQueryStringQuery = queryBuilder
  .simpleQueryString()
  .onFields("productName", "description")
  .matching("Aple~2 + \"iPhone X\" + (256 | 128)")
  .createQuery();

6.7. Range Queries

Range queries search for a value in between given boundaries. This can be applied to numbers, dates, timestamps, and strings:

Query rangeQuery = queryBuilder
  .range()
  .onField("memory")
  .from(64).to(256)
  .createQuery();

6.8. More Like This Queries

Our last query type is the “More Like This” – query. For this, we provide an entity, and Hibernate Search returns a list with similar entities, each with a similarity score.

As mentioned before, the termVector = TermVector.YES attribute in our model class is required for this case: it tells Lucene to store the frequency for each term during indexing.

Based on this, the similarity will be calculated at query execution time:

Query moreLikeThisQuery = queryBuilder
  .moreLikeThis()
  .comparingField("productName").boostedTo(10f)
  .andField("description").boostedTo(1f)
  .toEntity(entity)
  .createQuery();
List<Object[]> results = (List<Object[]>) fullTextEntityManager
  .createFullTextQuery(moreLikeThisQuery, Product.class)
  .setProjection(ProjectionConstants.THIS, ProjectionConstants.SCORE)
  .getResultList();

6.9. Searching More Than One Field

Until now, we only created queries for searching one attribute, using onField().

Depending on the use case, we can also search two or more attributes:

Query luceneQuery = queryBuilder
  .keyword()
  .onFields("productName", "description")
  .matching(text)
  .createQuery();

Moreover, we can specify each attribute to be searched separately, e. g. if we want to define a boost for one attribute:

Query moreLikeThisQuery = queryBuilder
  .moreLikeThis()
  .comparingField("productName").boostedTo(10f)
  .andField("description").boostedTo(1f)
  .toEntity(entity)
  .createQuery();

6.10. Combining Queries

Finally, Hibernate Search also supports combining queries using various strategies:

SHOULD: the query should contain the matching elements of the subquery
MUST: the query must contain the matching elements of the subquery
MUST NOT: the query must not contain the matching elements of the subquery

The aggregations are similar to the boolean ones AND, OR and NOT. However, the names are different to emphasize that they also have an impact on the relevance.

For example, a SHOULD between two queries is similar to boolean OR: if one of the two queries has a match, this match will be returned.

However, if both queries match, the match will have a higher relevance compared to if only one query matches:

Query combinedQuery = queryBuilder
  .bool()
  .must(queryBuilder.keyword()
    .onField("productName").matching("apple")
    .createQuery())
  .must(queryBuilder.range()
    .onField("memory").from(64).to(256)
    .createQuery())
  .should(queryBuilder.phrase()
    .onField("description").sentence("face id")
    .createQuery())
  .must(queryBuilder.keyword()
    .onField("productName").matching("samsung")
    .createQuery())
  .not()
  .createQuery();

7. Conclusion

In this article, we discussed the basics of Hibernate Search and showed how to implement the most important query types. More advanced topics can be found it the official documentation.

The code backing this article is available on GitHub. Once you're logged in as a Baeldung Pro Member, start learning and coding on the project.

REST with Spring Boot

Learn Spring Security

Learn Spring

Learn Spring Data JPA

View All Courses

Learn JUnit

Learn Maven

Learn Hibernate JPA

Learn Mockito

Learn Java Streams and Lambdas

Learn Java Basics

Full Archive

Baeldung Ebooks

About Baeldung