Elasticsearch Query With “Not Contains”

Last updated: September 24, 2025

Written by: Kostiantyn Ivanov

Reviewed by: David Martinez

1. Overview

When working with Elasticsearch, we often need to return only the documents whose field does not contain a specific substring. Elasticsearch doesn’t have a direct ‘not contains’ operator, but we can combine the must_not clause of a bool query with several kinds of queries to get this behavior. In this article, we’ll explore these approaches and their trade-offs.

2. Index Setup

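Before we start, we need a running Elasticsearch instance; the examples below assume it listens on http://localhost:9200. Next, let’s create the index to store our transaction logs. The message field is mapped as text, with a keyword sub-field that keeps the original, unanalyzed string: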

curl -X PUT "http://localhost:9200/transaction-logs" -H "Content-Type: application/json" -d'
{
  "mappings": {
    "properties": {
      "message": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }    
    }
  }
}'

Finally, let’s prepare a few documents with user transactions:

curl -X POST "http://localhost:9200/transaction-logs/_doc/1" \
  -H "Content-Type: application/json" \
  -d '{ "message": "User1 deposited 1000 AP1 points" }'

curl -X POST "http://localhost:9200/transaction-logs/_doc/2" \
  -H "Content-Type: application/json" \
  -d '{ "message": "User1 deposited 1000 AP2 points" }'

curl -X POST "http://localhost:9200/transaction-logs/_doc/3" \
  -H "Content-Type: application/json" \
  -d '{ "message": "User1 deposited 1000 AP3 points" }'

curl -X POST "http://localhost:9200/transaction-logs/_doc/4" \
  -H "Content-Type: application/json" \
  -d '{ "message": "User1 deposited 1000 PP1 points" }'

Now, we’ve created an index with documents, and we can start exploring the different approaches to filter them.
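Before moving on, it can help to confirm that all four documents are searchable. The following is a small sketch, assuming the same local node. It calls the _refresh API so that newly indexed documents become visible to search right away, and then the _count API, which should report a count of 4 if every request above succeeded. The ES_URL and INDEX variables are only conveniences introduced for this example:

```shell
# Assumption: the node from this article is listening on localhost:9200.
ES_URL="http://localhost:9200"
INDEX="transaction-logs"

# Make recently indexed documents visible to search without waiting
# for the periodic refresh.
curl -s -X POST "$ES_URL/$INDEX/_refresh" \
  || echo "Elasticsearch is not reachable at $ES_URL"

# Count the documents in the index; the "count" field of the response
# holds the number.
curl -s -X GET "$ES_URL/$INDEX/_count" \
  || echo "Elasticsearch is not reachable at $ES_URL"
```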

3. Using Regexp With must_not

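Regular expressions give us flexible pattern matching for complex exclusion cases. Let’s query our transaction-logs index and return only the log messages that do not contain any of the values from AP2 to AP9. We run the pattern against the message.keyword sub-field, so it’s matched against the whole original string: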

curl -X GET "http://localhost:9200/transaction-logs/_search" -H "Content-Type: application/json" -d'
{
  "query": {
    "bool": {
      "must_not": [
         { "regexp": { "message.keyword": ".*AP[2-9].*" } }
      ]
    }
  }
}'

We’ve used the regexp query to find all such messages and the must_not clause to invert the match. Since must_not runs in filter context, the remaining hits all get a score of 0.0. In the response, we’ll see:

{
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 0.0,
    "hits": [
      {
        "_index": "transaction-logs",
        "_id": "1",
        "_score": 0.0,
        "_source": {
          "message": "User1 deposited 1000 AP1 points"
        }
      },
      {
        "_index": "transaction-logs",
        "_id": "4",
        "_score": 0.0,
        "_source": {
          "message": "User1 deposited 1000 PP1 points"
        }
      }
    ]
  }
}

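We should keep in mind that regular expressions are expensive: a pattern that starts with .* can’t use the sorted term dictionary to narrow the search, so Elasticsearch has to test it against every term in the field. They’re only suitable when we have no other choice.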
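The leading and trailing .* in the pattern matter because the regexp query has to match the entire field value; it doesn’t search for the pattern inside the value. To make that concrete without a cluster, here is a rough local approximation using grep: -x forces a whole-line match (standing in for the whole-value match on message.keyword) and -v inverts the result (standing in for must_not). grep’s extended regular expressions are not Lucene’s regexp syntax, so this only illustrates the anchoring and the inversion, not the engine itself:

```shell
# Local approximation only: grep's ERE is not Lucene's regexp engine.
messages='User1 deposited 1000 AP1 points
User1 deposited 1000 AP2 points
User1 deposited 1000 AP3 points
User1 deposited 1000 PP1 points'

# -E: extended regex, -x: the pattern must match the whole line,
# -v: keep only the lines that do NOT match.
kept=$(printf '%s\n' "$messages" | grep -Evx '.*AP[2-9].*')
printf '%s\n' "$kept"

# Without the surrounding .*, no complete message can match the pattern,
# so the inverted match keeps every line. -c prints the number of kept lines.
unanchored_kept=$(printf '%s\n' "$messages" | grep -Evxc 'AP[2-9]')
printf 'kept without wildcards: %s\n' "$unanchored_kept"
```

The first command prints the AP1 and PP1 messages, mirroring the response above. The second shows why a bare AP[2-9] pattern would exclude nothing: all four lines are kept.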

4. Using Wildcard With must_not

We can use the wildcard query as a simpler and usually cheaper method for substring exclusion. It supports only two special characters, * for any sequence of characters and ? for any single character, so we can’t use the full regular expression syntax. However, that’s enough to exclude a substring from our results. Let’s query our index and exclude all transactions that contain the AP substring:

curl -X GET "http://localhost:9200/transaction-logs/_search" -H "Content-Type: application/json" -d'
{
  "query": {
    "bool": {
      "must_not": [
        { "wildcard": { "message.keyword": "*AP*" } }
      ]
    }
  }
}'

Here, we again use the must_not clause to invert the wildcard match. As a result, we get:

{
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.0,
    "hits": [
      {
        "_index": "transaction-logs",
        "_id": "4",
        "_score": 0.0,
        "_source": {
          "message": "User1 deposited 1000 PP1 points"
        }
      }
    ]
  }
}

As expected, all the AP transactions were filtered out.
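One detail worth keeping in mind is that message.keyword stores the original string, so the pattern is case-sensitive: *ap* wouldn’t exclude anything here. As a sketch, assuming Elasticsearch 7.10 or later, where the wildcard query accepts a case_insensitive parameter, the same exclusion could be written with a lowercase pattern. The ES_URL and query variables are names introduced only for this example:

```shell
# Assumption: Elasticsearch 7.10+ on localhost:9200 with the index from this article.
ES_URL="http://localhost:9200"

# Long form of the wildcard query, which allows extra parameters
# next to the pattern itself.
query='{
  "query": {
    "bool": {
      "must_not": [
        { "wildcard": { "message.keyword": { "value": "*ap*", "case_insensitive": true } } }
      ]
    }
  }
}'

curl -s -X GET "$ES_URL/transaction-logs/_search" \
  -H "Content-Type: application/json" \
  -d "$query" \
  || echo "Elasticsearch is not reachable at $ES_URL"
```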

5. Using Query String With must_not

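We can also use the query string syntax with a wildcard, which gives us a more compact request. Note that this query targets the analyzed message field rather than message.keyword, so the pattern is matched against the individual lowercased tokens (such as ap1) instead of the whole string, and the wildcard term itself is normalized to lowercase before matching. Let’s run the query to filter out the same AP transactions: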

curl -X GET "http://localhost:9200/transaction-logs/_search" -H "Content-Type: application/json" -d'
{
  "query": {
    "bool": {
      "must_not": [
       { "query_string": { "query": "message:*AP*"} }
      ]
    }
  }
}'

Here, we’ve used the query_string query inside the must_not clause. As a result, we’ll see the same single PP transaction log:

{
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.0,
    "hits": [
      {
        "_index": "transaction-logs",
        "_id": "4",
        "_score": 0.0,
        "_source": {
          "message": "User1 deposited 1000 PP1 points"
        }
      }
    ]
  }
}

Wildcards are generally faster than regular expressions, but a pattern with a leading * still has to be tested against every term in the field, so these queries may run slowly on large indices.

6. Using Match With must_not and Customized Analyzer

If we know in advance which substrings we’ll need to exclude, we can achieve the not-contains behavior most efficiently by doing the work at index time. When creating the index, we can specify a custom analyzer and, inside its definition, add a word_delimiter token filter or even define a custom tokenizer, so that the substrings of interest become separate tokens.

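Let’s recreate our transaction-logs index with a delimiter specified. An index can’t be created again under a name that already exists, so we delete the old one first:

curl -X DELETE "http://localhost:9200/transaction-logs"

curl -X PUT "http://localhost:9200/transaction-logs" \
  -H "Content-Type: application/json" \
  -d '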
{
  "settings": {
    "analysis": {
      "analyzer": {
        "message_analyzer": {
          "tokenizer": "whitespace",
          "filter": ["lowercase", "word_delimiter"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "message": {
        "type": "text",
        "analyzer": "message_analyzer"
      }
    }
  }
}'
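Once the index exists, the _analyze API lets us see exactly which tokens the custom analyzer emits for a given input, which is useful for checking that the substring we plan to exclude really becomes a token of its own. This is a small sketch, assuming the same local node; ES_URL and analyze_body are names introduced only for this example:

```shell
# Assumption: the recreated transaction-logs index exists on localhost:9200.
ES_URL="http://localhost:9200"

# Ask the index to run its message_analyzer on a sample word.
analyze_body='{ "analyzer": "message_analyzer", "text": "AP1" }'

curl -s -X POST "$ES_URL/transaction-logs/_analyze" \
  -H "Content-Type: application/json" \
  -d "$analyze_body" \
  || echo "Elasticsearch is not reachable at $ES_URL"
```

The tokens array in the response lists each emitted token together with its position and offsets.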

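With this configuration, the lowercase filter first turns AP1 into ap1, and the word_delimiter filter then splits it at the letter-to-digit boundary into the ap and 1 tokens. With the filter’s default settings, the original ap1 token isn’t kept; we’d need to enable preserve_original for that. After indexing the four documents from the Index Setup section again, we can query our index using must_not with a match query: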

curl -X GET "http://localhost:9200/transaction-logs/_search" \
  -H "Content-Type: application/json" \
  -d '{
  "query": {
    "bool": {
      "must_not": [
         { "match": { "message": "AP" } }
      ]
    }
  }
}'

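In the response, we’ll see that the same AP transactions are filtered out. This query is much more efficient than regexp or wildcard, since match looks up the analyzed term directly in the inverted index. However, we should consider the trade-offs: the analysis configuration becomes more complex, the index stores more tokens, and we can only exclude substrings that the analyzer emits as separate tokens.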

7. Conclusion

In this article, we’ve reviewed different approaches to achieve the not contains behavior in Elasticsearch. All of them rely on the must_not clause, which inverts the matching criteria. Each approach is a compromise between the required capabilities and performance.

We can use regexp to build the most flexible queries when performance is not a concern. On the other hand, we can make the tokenization process more complex and rely only on substrings known in advance, but in return, we’ll get much faster queries.

The code backing this article is available on GitHub.