
I just announced the new Learn Spring course, focused on the fundamentals of Spring 5 and Spring Boot 2:

>> CHECK OUT THE COURSE

1. Overview

In this article, we're going to dive into some key concepts related to full-text search engines, with a special focus on Elasticsearch.

As this is a Java-oriented article, we're not going to give a detailed step-by-step tutorial on how to set up Elasticsearch or show how it works under the hood. Instead, we're going to target the Java client and how to use its main features, such as index, delete, get and search.

2. Setup

For the sake of simplicity, we'll use a Docker image for our Elasticsearch instance, though any Elasticsearch instance listening on port 9200 will do.

We start by firing up our Elasticsearch instance:

docker run -d --name es762 -p 9200:9200 -e "discovery.type=single-node" elasticsearch:7.6.2

By default, Elasticsearch listens on port 9200 for incoming HTTP requests. We can verify that it launched successfully by opening http://localhost:9200/ in a browser:

{
  "name" : "M4ojISw",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "CNnjvDZzRqeVP-B04D3CmA",
  "version" : {
    "number" : "7.6.2",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "2f4c224",
    "build_date" : "2020-03-18T23:22:18.622755Z",
    "build_snapshot" : false,
    "lucene_version" : "8.4.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.8.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

3. Maven Configuration

Now that we have our basic Elasticsearch cluster up and running, let's jump straight to the Java client. First of all, we need the Elasticsearch high-level REST client declared in our pom.xml file, with the same version as our node:

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.6.2</version>
</dependency>

You can always check for the latest versions on Maven Central.

4. Java API

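Before we jump straight to the main Java API features, we need to initialize a RestHighLevelClient. Here we build it with the ClientConfiguration and RestClients helpers from Spring Data Elasticsearch (package org.springframework.data.elasticsearch.client), so that library needs to be on the classpath as well: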

ClientConfiguration clientConfiguration =
    ClientConfiguration.builder().connectedTo("localhost:9200").build();
RestHighLevelClient client = RestClients.create(clientConfiguration).rest();
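If we'd rather not depend on Spring Data Elasticsearch, the RestHighLevelClient can also be created directly from the low-level RestClient builder. The following is a minimal sketch that assumes the elasticsearch-rest-high-level-client 7.6.2 artifact is on the classpath. That artifact pulls in the low-level client and Apache HttpCore, which provides HttpHost. The PlainClientFactory class and its create() method are hypothetical names:

```java
import java.io.IOException;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

public class PlainClientFactory {

    // Hypothetical helper: wraps a low-level RestClient for the given node
    // in a RestHighLevelClient, with no Spring Data Elasticsearch involved.
    public static RestHighLevelClient create(String host, int port) {
        return new RestHighLevelClient(
            RestClient.builder(new HttpHost(host, port, "http")));
    }

    public static void main(String[] args) throws IOException {
        // try-with-resources closes the client and its HTTP connection pool
        try (RestHighLevelClient client = create("localhost", 9200)) {
            // ping() sends a HEAD / request; it throws an IOException
            // if no node is listening on localhost:9200
            System.out.println("Cluster reachable: " + client.ping(RequestOptions.DEFAULT));
        }
    }
}
```

Since RestHighLevelClient implements Closeable, closing it releases its connection pool. In a real application we'd typically create one client and share it.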

4.1. Indexing Documents

The index() method lets us store an arbitrary JSON document in an index and make it searchable:

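import static org.junit.Assert.assertEquals;

import java.io.IOException;
import org.elasticsearch.action.DocWriteResponse.Result;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.common.xcontent.XContentType;
import org.junit.Test;

// client is the RestHighLevelClient initialized above
@Test
public void givenJsonString_whenJavaObject_thenIndexDocument() throws IOException {
  String jsonObject = "{\"age\":10,\"dateOfBirth\":1471466076564,"
    + "\"fullName\":\"John Doe\"}";
  IndexRequest request = new IndexRequest("people");
  request.source(jsonObject, XContentType.JSON);

  IndexResponse response = client.index(request, RequestOptions.DEFAULT);
  String index = response.getIndex();
  long version = response.getVersion();

  // a newly created document with an auto-generated id starts at version 1
  assertEquals(Result.CREATED, response.getResult());
  assertEquals(1, version);
  assertEquals("people", index);
}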
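In the test above, Elasticsearch generates the document id. The new document only becomes visible to search() after the next index refresh, which happens every second by default. The following is a minimal sketch for the case where we want a caller-chosen id and immediate visibility. PersonIndexRequests, personRequest() and the "john-doe" id are hypothetical:

```java
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.support.WriteRequest;
import org.elasticsearch.common.xcontent.XContentType;

public class PersonIndexRequests {

    // Hypothetical helper: stores the JSON under an explicit id and asks
    // Elasticsearch to refresh the index as part of the request, so the
    // document is searchable as soon as client.index() returns.
    public static IndexRequest personRequest(String id, String json) {
        IndexRequest request = new IndexRequest("people");
        request.id(id);
        request.source(json, XContentType.JSON);
        request.setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE);
        return request;
    }

    public static void main(String[] args) {
        IndexRequest request = personRequest("john-doe",
            "{\"age\":10,\"dateOfBirth\":1471466076564,\"fullName\":\"John Doe\"}");
        // send it with client.index(request, RequestOptions.DEFAULT)
        System.out.println(request.index() + "/" + request.id()); // prints people/john-doe
    }
}
```

Indexing the same id again replaces the stored document and increments its version. In that case the result is UPDATED instead of CREATED. Forcing a refresh on every write is expensive, so RefreshPolicy.WAIT_UNTIL, or no policy at all, is usually preferable outside of tests.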

Note that we can use any Java JSON library to create and process our documents. If we're not familiar with any of these, we can use Elasticsearch's own XContentBuilder helpers to generate JSON documents:

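import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;

XContentBuilder builder = XContentFactory.jsonBuilder()
  .startObject()
  .field("fullName", "Test")
  // epoch milliseconds, like the first document: dynamic mapping has
  // already mapped dateOfBirth as a long, so a date string would be rejected
  .field("dateOfBirth", System.currentTimeMillis())
  .field("age", 10)
  .endObject();

IndexRequest indexRequest = new IndexRequest("people");
indexRequest.source(builder);

IndexResponse response = client.index(indexRequest, RequestOptions.DEFAULT);
assertEquals(Result.CREATED, response.getResult());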
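A third option, when no JSON library is at hand, is to describe the document as a plain Map and let IndexRequest serialize it. The sketch below assumes the same people index. MapSourceExample and personFromMap() are hypothetical names:

```java
import java.util.HashMap;
import java.util.Map;
import org.elasticsearch.action.index.IndexRequest;

public class MapSourceExample {

    // Hypothetical helper: IndexRequest.source(Map) serializes the map to JSON,
    // so neither a JSON library nor XContentBuilder is needed.
    public static IndexRequest personFromMap(String fullName, int age, long dateOfBirthMillis) {
        Map<String, Object> person = new HashMap<>();
        person.put("fullName", fullName);
        person.put("age", age);
        person.put("dateOfBirth", dateOfBirthMillis);
        return new IndexRequest("people").source(person);
    }

    public static void main(String[] args) {
        IndexRequest request = personFromMap("Jane Doe", 12, 1471466076564L);
        // send it with client.index(request, RequestOptions.DEFAULT)
        System.out.println(request.sourceAsMap()); // the source as parsed back from JSON
    }
}
```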

4.2. Querying Indexed Documents

Now that we have some JSON documents indexed, we can search them using the search() method:

SearchRequest searchRequest = new SearchRequest();
SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
SearchHit[] searchHits = response.getHits().getHits();
List<Person> results = 
  Arrays.stream(searchHits)
    .map(hit -> JSON.parseObject(hit.getSourceAsString(), Person.class))
    .collect(Collectors.toList());

The results returned by the search() method are called hits. Each hit refers to a JSON document matching the search request.

Since this request has no query, it matches every document in the cluster, but only the first 10 hits are returned by default. Note that in this example we're using the FastJson library to convert JSON strings to Java objects.
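To get more than the default 10 hits, or to page through them, we can set from and size on a SearchSourceBuilder. The following is a minimal sketch that assumes zero-based page numbers. PagedSearch and page() are hypothetical names:

```java
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class PagedSearch {

    // Hypothetical helper: match-all search over "people" that returns the
    // given zero-based page, pageSize hits at a time.
    public static SearchRequest page(int page, int pageSize) {
        SearchSourceBuilder source = new SearchSourceBuilder()
            .query(QueryBuilders.matchAllQuery())
            .from(page * pageSize) // number of hits to skip
            .size(pageSize);       // number of hits to return
        return new SearchRequest("people").source(source);
    }

    public static void main(String[] args) {
        // third page of 20 hits: skips the first 40
        SearchRequest request = page(2, 20);
        // send it with client.search(request, RequestOptions.DEFAULT)
        System.out.println(request.source());
    }
}
```

By default, from + size cannot exceed 10,000, which is the index.max_result_window setting. For deeper result sets, Elasticsearch offers search_after or the scroll API. The total number of matches is available through response.getHits().getTotalHits().value.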

We can customize the request with additional parameters, such as a query built with the QueryBuilders methods:

SearchSourceBuilder builder = new SearchSourceBuilder()
  .postFilter(QueryBuilders.rangeQuery("age").from(5).to(15));

SearchRequest searchRequest = new SearchRequest();
searchRequest.searchType(SearchType.DFS_QUERY_THEN_FETCH);
searchRequest.source(builder);

SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
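A post filter is applied to the hits only after any aggregations have been computed, which is mainly useful for faceted navigation. When we simply want to restrict the results, the same range can go into the main query as a filter clause. A filter clause skips scoring and can be cached. The following is a sketch under those assumptions. AgeFilterSearch and ageBetween() are hypothetical names:

```java
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class AgeFilterSearch {

    // Hypothetical helper: keeps only people whose age is within [min, max].
    // gte/lte match the inclusive defaults of from()/to().
    public static SearchSourceBuilder ageBetween(int min, int max) {
        BoolQueryBuilder query = QueryBuilders.boolQuery()
            .filter(QueryBuilders.rangeQuery("age").gte(min).lte(max));
        return new SearchSourceBuilder().query(query);
    }

    public static void main(String[] args) {
        // pass the result to searchRequest.source(...) as in the example above
        System.out.println(ageBetween(5, 15));
    }
}
```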

4.3. Retrieving and Deleting Documents

The get() and delete() methods let us retrieve or delete a JSON document from the cluster using its id:

// id is the _id of an existing document, e.g. response.getId() after index()
GetRequest getRequest = new GetRequest("people");
getRequest.id(id);

GetResponse getResponse = client.get(getRequest, RequestOptions.DEFAULT);
// process fields

DeleteRequest deleteRequest = new DeleteRequest("people");
deleteRequest.id(id);

DeleteResponse deleteResponse = client.delete(deleteRequest, RequestOptions.DEFAULT);

The syntax is straightforward: we just specify the index alongside the document's id.
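The responses also tell us whether a document with that id existed. The following is a minimal sketch that assumes the people index already exists, because a GetRequest against a missing index fails with an exception rather than returning an empty response. It also assumes _source is enabled, which is the default. PersonLookup, findPerson() and deletePerson() are hypothetical names:

```java
import java.io.IOException;
import java.util.Map;
import java.util.Optional;
import org.elasticsearch.action.DocWriteResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;

public class PersonLookup {

    // Returns the stored source, or Optional.empty() when no document has this id.
    public static Optional<Map<String, Object>> findPerson(RestHighLevelClient client, String id)
            throws IOException {
        GetResponse response = client.get(new GetRequest("people", id), RequestOptions.DEFAULT);
        return response.isExists() ? Optional.of(response.getSourceAsMap()) : Optional.empty();
    }

    // Returns true if a document was deleted, false if none had this id
    // (the response result is then NOT_FOUND instead of DELETED).
    public static boolean deletePerson(RestHighLevelClient client, String id) throws IOException {
        DeleteResponse response = client.delete(new DeleteRequest("people", id), RequestOptions.DEFAULT);
        return response.getResult() == DocWriteResponse.Result.DELETED;
    }
}
```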

5. QueryBuilders Examples

The QueryBuilders class provides a variety of static methods used as dynamic matchers to find specific entries in the cluster. While using the search() method to look for specific JSON documents in the cluster, we can use query builders to customize the search results.

Here's a list of the most common uses of the QueryBuilders API.

The matchAllQuery() method returns a QueryBuilder object that matches all documents in the cluster:

QueryBuilder matchAllQuery = QueryBuilders.matchAllQuery();

The rangeQuery() method matches documents where a field's value falls within a given range (both bounds are inclusive by default):

QueryBuilder matchDocumentsWithinRange = QueryBuilders
  .rangeQuery("price").from(15).to(100);
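When one bound should be exclusive, the range can be expressed with gte()/gt() and lte()/lt() instead. The following is a short sketch reusing the price field from the example above. PriceRange and its method name are hypothetical:

```java
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.RangeQueryBuilder;

public class PriceRange {

    // Hypothetical helper: 15 <= price < 100, i.e. the lower bound is
    // inclusive and the upper bound exclusive.
    public static RangeQueryBuilder from15Below100() {
        return QueryBuilders.rangeQuery("price").gte(15).lt(100);
    }

    public static void main(String[] args) {
        System.out.println(from15Below100()); // prints the range query as JSON
    }
}
```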

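Given a field name (e.g. fullName) and a value (e.g. John Doe), the matchQuery() method performs a full-text match. The value is analyzed into terms, and by default a document matches if its field contains any of them:

QueryBuilder matchSpecificFieldQuery = QueryBuilders
  .matchQuery("fullName", "John Doe");

For exact-value matching, termQuery() on a non-analyzed (keyword) field is the usual choice.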
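To contrast the two behaviors, the following sketch shows a match query that requires every analyzed term, next to an exact term query. It assumes fullName was mapped dynamically. In that case Elasticsearch 7 adds a fullName.keyword sub-field holding the unanalyzed value. NameQueries and its methods are hypothetical names:

```java
import org.elasticsearch.index.query.MatchQueryBuilder;
import org.elasticsearch.index.query.Operator;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;

public class NameQueries {

    // Analyzed full-text match that requires all terms (e.g. "john" AND "doe")
    // instead of the default OR between terms.
    public static MatchQueryBuilder allNameTerms(String name) {
        return QueryBuilders.matchQuery("fullName", name).operator(Operator.AND);
    }

    // Exact, case-sensitive match on the dynamically mapped keyword sub-field.
    public static TermQueryBuilder exactName(String name) {
        return QueryBuilders.termQuery("fullName.keyword", name);
    }

    public static void main(String[] args) {
        System.out.println(allNameTerms("John Doe"));
        System.out.println(exactName("John Doe"));
    }
}
```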

We can also use the multiMatchQuery() method to build a multi-field version of the match query:

QueryBuilder multiMatchQuery = QueryBuilders.multiMatchQuery(
  "Text I am looking for", "field_1", "field_2^3", "*_field_wildcard");

We can use the caret symbol (^) to boost specific fields.

In our example, field_2 has its boost set to 3, making it more important than the other fields. Note that it's possible to use wildcards and regex queries, but beware of memory consumption and response times when dealing with wildcards. A pattern with a leading wildcard, such as *_apples, can have a huge impact on performance.

The boost affects each hit's relevance score, which in turn determines the order of the hits returned by the search() method.
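Field boosts can also be set through the builder API instead of the caret syntax. The following is a short sketch reusing field_1 and field_2 from the example above. BoostedMultiMatch and boosted() are hypothetical names:

```java
import org.elasticsearch.index.query.MultiMatchQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;

public class BoostedMultiMatch {

    // Same intent as "field_2^3": field_2 gets a boost of 3,
    // while field_1 keeps the default boost of 1.
    public static MultiMatchQueryBuilder boosted(String text) {
        return QueryBuilders.multiMatchQuery(text)
            .field("field_1")
            .field("field_2", 3.0f);
    }

    public static void main(String[] args) {
        System.out.println(boosted("Text I am looking for"));
    }
}
```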

If you are familiar with Lucene's query syntax, you can use the queryStringQuery() method to customize search queries:

QueryBuilder queryStringQuery = QueryBuilders
  .queryStringQuery("+John -Doe OR Janette");

As you can probably guess, this query uses Lucene's Query Parser syntax to build simple, yet powerful queries. Here are some basic operators that can be used alongside the AND/OR/NOT operators to build search queries:

  • The required operator (+): requires that a specific piece of text exists somewhere in the fields of a document.
  • The prohibit operator (-): excludes all documents that contain the keyword declared after the (-) symbol.

The related simpleQueryStringQuery() method accepts a reduced syntax (+ for AND, | for OR, - for NOT, with no AND/OR/NOT keywords). It ignores invalid parts of the query instead of throwing an error, which makes it a safer choice for raw user input.
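When the query text comes from users, simpleQueryStringQuery() can also be limited to specific fields and given a different default operator. The following is a minimal sketch that assumes we only want to search fullName and want bare terms joined with AND. NameSearch and namesQuery() are hypothetical names:

```java
import org.elasticsearch.index.query.Operator;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.SimpleQueryStringBuilder;

public class NameSearch {

    // Searches only fullName; with AND as the default operator,
    // "John -Janette" is read as: contains John AND NOT Janette.
    public static SimpleQueryStringBuilder namesQuery(String userInput) {
        return QueryBuilders.simpleQueryStringQuery(userInput)
            .field("fullName")
            .defaultOperator(Operator.AND);
    }

    public static void main(String[] args) {
        System.out.println(namesQuery("John -Janette"));
    }
}
```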

6. Conclusion

In this quick article, we've seen how to use Elasticsearch's Java API to perform some of the common operations of a full-text search engine.

You can check out the example provided in this article in the GitHub project.
