Simple Full-Text Search with ElasticSearch

Azure Spring Apps is a fully managed service from Microsoft (built in collaboration with VMware), focused on building and deploying Spring Boot applications on Azure Cloud without worrying about Kubernetes.

And, the Enterprise plan comes with some interesting features, such as commercial Spring runtime support, a 99.95% SLA and some deep discounts (up to 47%) when you are ready for production.

>> Learn more and deploy your first Spring Boot app to Azure.

You can also ask questions and leave feedback on the Azure Spring Apps GitHub page.

Slow MySQL query performance is all too common. Of course it is. A good way to go is, naturally, a dedicated profiler that actually understands the ins and outs of MySQL.

The Jet Profiler was built for MySQL only, so it can do things like real-time query performance, focus on most used tables or most frequent queries, quickly identify performance issues and basically help you optimize your queries.

Critically, it has very minimal impact on your server's performance, with most of the profiling work done separately - so it needs no server changes, agents or separate services.

Basically, you install the desktop application, connect to your MySQL server, hit the record button, and you'll have results within minutes:

>> Try out the Profiler

Accelerate Your Jakarta EE Development with Payara Server!

With best-in-class guides and documentation, Payara essentially simplifies deployment to diverse infrastructures.

Beyond that, it provides intelligent insights and actions to optimize Jakarta EE applications.

The goal is to apply an opinionated approach to get to what's essential for mission-critical applications - really solid scalability, availability, security, and long-term support:

>> Download and Explore the Guide (to learn more)

The AI Assistant to boost Boost your productivity writing unit tests - Machinet AI.

AI is all the rage these days, but for very good reason. The highly practical coding companion, you'll get the power of AI-assisted coding and automated unit test generation.
Machinet's Unit Test AI Agent utilizes your own project context to create meaningful unit tests that intelligently aligns with the behavior of the code.
And, the AI Chat crafts code and fixes errors with ease, like a helpful sidekick.

Simplify Your Coding Journey with Machinet AI:

>> Install Machinet AI in your IntelliJ

Looking for the ideal Linux distro for running modern Spring apps in the cloud?

Meet Alpaquita Linux: lightweight, secure, and powerful enough to handle heavy workloads.

This distro is specifically designed for running Java apps. It builds upon Alpine and features significant enhancements to excel in high-density container environments while meeting enterprise-grade security standards.

Specifically, the container image size is ~30% smaller than standard options, and it consumes up to 30% less RAM:

>> Try Alpaquita Containers now.

DbSchema is a super-flexible database designer, which can take you from designing the DB with your team all the way to safely deploying the schema.

The way it does all of that is by using a design model, a database-independent image of the schema, which can be shared in a team using GIT and compared or deployed on to any database.

And, of course, it can be heavily visual, allowing you to interact with the database using diagrams, visually compose queries, explore the data, generate random data, import data or build HTML5 database reports.

>> Take a look at DBSchema

Slow MySQL query performance is all too common. Of course it is. A good way to go is, naturally, a dedicated profiler that actually understands the ins and outs of MySQL.

Critically, it has very minimal impact on your server's performance, with most of the profiling work done separately - so it needs no server changes, agents or separate services.

Basically, you install the desktop application, connect to your MySQL server, hit the record button, and you'll have results within minutes:

>> Try out the Profiler

1. Overview

Full-text search queries and performs linguistic searches against documents. It includes single or multiple words or phrases and returns documents that match search condition.

ElasticSearch is a search engine based on Apache Lucene, a free and open-source information retrieval software library. It provides a distributed, full-text search engine with an HTTP web interface and schema-free JSON documents.

This article examines ElasticSearch REST API and demonstrates basic operations using HTTP requests only.

2. Setup

In order to install ElasticSearch on your machine, please refer to the official setup guide.

RESTfull API runs on port 9200. Let us test if it is running properly using the following curl command:

curl -XGET 'http://localhost:9200/'

If you observe the following response, the instance is properly running:

{
  "name": "NaIlQWU",
  "cluster_name": "elasticsearch",
  "cluster_uuid": "enkBkWqqQrS0vp_NXmjQMQ",
  "version": {
    "number": "5.1.2",
    "build_hash": "c8c4c16",
    "build_date": "2017-01-11T20:18:39.146Z",
    "build_snapshot": false,
    "lucene_version": "6.3.0"
  },
  "tagline": "You Know, for Search"
}

3. Indexing Documents

ElasticSearch is document oriented. It stores and indexes documents. Indexing creates or updates documents. After indexing, you can search, sort, and filter complete documents—not rows of columnar data. This is a fundamentally different way of thinking about data and is one of the reasons ElasticSearch can perform a complex full-text search.

Documents are represented as JSON objects. JSON serialization is supported by most programming languages and has become the standard format used by the NoSQL movement. It is simple, concise, and easy to read.

We are going to use the following random entries to perform our full-text search:

{
  "title": "He went",
  "random_text": "He went such dare good fact. The small own seven saved man age."
}

{
  "title": "He oppose",
  "random_text": 
    "He oppose at thrown desire of no. \
      Announcing impression unaffected day his are unreserved indulgence."
}

{
  "title": "Repulsive questions",
  "random_text": "Repulsive questions contented him few extensive supported."
}

{
  "title": "Old education",
  "random_text": "Old education him departure any arranging one prevailed."
}

Before we can index a document, we need to decide where to store it. It’s possible to have multiple indexes, which in turn contain multiple types. These types hold multiple documents, and each document has multiple fields.

We are going to store our documents using the following scheme:

text: The index name.
article: The type name.
id: The ID of this particular example text-entry.

To add a document we are going to run the following command:

curl -XPUT 'localhost:9200/text/article/1?pretty'
  -H 'Content-Type: application/json' -d '
{
  "title": "He went",
  "random_text": 
    "He went such dare good fact. The small own seven saved man age."
}'

Here we are using id=1, we can add other entries using the same command and incremented id.

4. Retrieving Documents

After we add all our documents we can check how many documents, using the following command, we have in the cluster :

curl -XGET 'http://localhost:9200/_count?pretty' -d '
{
  "query": {
    "match_all": {}
  }
}'

Also, we can get a document using its id with the following command:

curl -XGET 'localhost:9200/text/article/1?pretty'

And we should get the following answer from elastic search:

{
  "_index": "text",
  "_type": "article",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "title": "He went",
    "random_text": 
      "He went such dare good fact. The small own seven saved man age."
  }
}

As we can see this answer correspond with the entry added using the id 1.

5. Querying Documents

OK let’s perform a full-text search with the following command:

curl -XGET 'localhost:9200/text/article/_search?pretty' 
  -H 'Content-Type: application/json' -d '
{
  "query": {
    "match": {
      "random_text": "him departure"
    }
  }
}'

And we get the following result:

{
  "took": 32,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1.4513469,
    "hits": [
      {
        "_index": "text",
        "_type": "article",
        "_id": "4",
        "_score": 1.4513469,
        "_source": {
          "title": "Old education",
          "random_text": "Old education him departure any arranging one prevailed."
        }
      },
      {
        "_index": "text",
        "_type": "article",
        "_id": "3",
        "_score": 0.28582606,
        "_source": {
          "title": "Repulsive questions",
          "random_text": "Repulsive questions contented him few extensive supported."
        }
      }
    ]
  }
}

As we can see we are looking for “him departure” and we get two results with different scores. The first result is obvious because the text have the performed search inside of it and as we can see we have the score of 1.4513469.

The second result is retrieved because the target document contains the word “him”.

By default, ElasticSearch sorts matching results by their relevance score, that is, by how well each document matches the query. Note, that the score of the second result is small relative to the first hit, indicating lower relevance.

6. Fuzzy Search

Fuzzy matching treats two words that are “fuzzily” similar as if they were the same word. First, we need to define what we mean by fuzziness.

Elasticsearch supports a maximum edit distance, specified with the fuzziness parameter, of 2. The fuzziness parameter can be set to AUTO, which results in the following maximum edit distances:

0 for strings of one or two characters
1 for strings of three, four, or five characters
2 for strings of more than five characters

you may find that an edit distance of 2 returns results that don’t appear to be related.

You may get better results, and better performance, with a maximum fuzziness of 1. Distance refers to the Levenshtein distance that is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits.

OK let’s perform our search with fuzziness:

curl -XGET 'localhost:9200/text/article/_search?pretty' -H 'Content-Type: application/json' -d' 
{ 
  "query": 
  { 
    "match": 
    { 
      "random_text": 
      {
        "query": "him departure",
        "fuzziness": "2"
      }
    } 
  } 
}'

And here’s the result:

{
  "took": 88,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 1.5834423,
    "hits": [
      {
        "_index": "text",
        "_type": "article",
        "_id": "4",
        "_score": 1.4513469,
        "_source": {
          "title": "Old education",
          "random_text": "Old education him departure any arranging one prevailed."
        }
      },
      {
        "_index": "text",
        "_type": "article",
        "_id": "2",
        "_score": 0.41093433,
        "_source": {
          "title": "He oppose",
          "random_text":
            "He oppose at thrown desire of no. 
              \ Announcing impression unaffected day his are unreserved indulgence."
        }
      },
      {
        "_index": "text",
        "_type": "article",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "title": "Repulsive questions",
          "random_text": "Repulsive questions contented him few extensive supported."
        }
      },
      {
        "_index": "text",
        "_type": "article",
        "_id": "1",
        "_score": 0.0,
        "_source": {
          "title": "He went",
          "random_text": "He went such dare good fact. The small own seven saved man age."
        }
      }
    ]
  }
}'

As we can see the fuzziness give us more results.

We need to use fuzziness carefully because it tends to retrieve results that look unrelated.

7. Conclusion

In this quick tutorial we focused on indexing documents and querying Elasticsearch for full-text search, directly via it’s REST API.

We, of course, have APIs available for multiple programming languages when we need to – but the API is still quite convenient and language agnostic.

Quick Intro to Full-Text Search with ElasticSearch

**Get started with Spring and Spring Boot, through the reference Learn Spring course:**

1. Overview

2. Setup

3. Indexing Documents

4. Retrieving Documents

5. Querying Documents

6. Fuzzy Search

7. Conclusion

**Get started with Spring and Spring Boot, through the Learn Spring course :**

Get started with Spring Data JPA through the reference Learn Spring Data JPA course:

REST with Spring

Learn Spring Security ▼▲

Learn Spring Security Core

Learn Spring Security OAuth

Learn Spring

Learn Spring Data JPA

Persistence

REST

Security

Full Archive

Baeldung Ebooks

About Baeldung

Write for Baeldung

Get started with Spring and Spring Boot, through the reference Learn Spring course:

1. Overview

2. Setup

3. Indexing Documents

4. Retrieving Documents

5. Querying Documents

6. Fuzzy Search

7. Conclusion

Get started with Spring and Spring Boot, through the Learn Spring course :

Get started with Spring Data JPA through the reference Learn Spring Data JPA course:

**Get started with Spring and Spring Boot, through the reference Learn Spring course:**

**Get started with Spring and Spring Boot, through the Learn Spring course :**