
1. Overview

Elasticsearch relies on shards to distribute data across nodes in a cluster. Shards are fundamental to Elasticsearch's scalability and have a direct effect on its performance. But what happens when shards become unassigned? How does this impact the cluster, and what can we do about it?

In this tutorial, we’ll explore what unassigned shards are, understand their impact, and learn how to correctly diagnose and resolve problems stemming from such shards.

Notably, we assume commands run on the Elasticsearch deployment machine with the default ports.

2. What Are Unassigned Shards?

Simply put, an unassigned shard is a shard that isn't allocated to any node in the cluster. Consequently, when a shard is unassigned, we can't access the data stored in that shard, whether for search or indexing operations.

2.1. Why Do Shards Become Unassigned?

Several reasons can lead to shards becoming unassigned in an Elasticsearch cluster. Firstly, node failure is one of the most common reasons. When a node in the cluster goes down or becomes unreachable, the shards hosted on that node become unassigned until Elasticsearch can reallocate them to other nodes.

Additionally, Elasticsearch constantly monitors the distribution of shards across nodes. An imbalance may trigger a cluster rebalancing process, during which some shards might temporarily become unassigned as Elasticsearch moves them around to achieve a more even distribution.

Furthermore, if a node in the cluster starts running out of storage space, Elasticsearch may refuse to allocate any more shards to that node. This can even result in already assigned shards becoming unassigned until we free up some space.
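
To check whether disk space is the culprit, we can use the _cat/allocation API, which shows how many shards each node holds alongside its disk usage:

```shell
# List per-node shard counts and disk usage (used, available, total, percent)
$ curl -X GET "localhost:9200/_cat/allocation?v"
```

A node close to the disk watermarks in this output is a likely reason for allocation refusals.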

2.2. What Impacts Do Unassigned Shards Have on Cluster Performance?

When shards are unassigned, the data they hold becomes inaccessible. This means searches may return incomplete results, and indexing operations into the affected shards may fail. Additionally, the cluster may experience increased load as it attempts to reallocate the unassigned shards, potentially affecting performance.
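
To quickly see which indices are affected, we can filter the _cat/indices output by health status:

```shell
# Show only indices whose health is red, i.e., with unassigned primary shards
$ curl -X GET "localhost:9200/_cat/indices?v&health=red"
```

Replacing red with yellow lists indices whose replicas, but not primaries, are unassigned.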

3. Diagnosing Unassigned Shards

Notably, Elasticsearch provides APIs that can help in diagnosing unassigned shards.

3.1. _cat/shards API

The _cat/shards API gives a quick overview of all the shards and includes their state and allocation status in the cluster:

$ curl -X GET "localhost:9200/_cat/shards?v"
index                                                         shard prirep state      docs  store dataset ip        node
customers_v2                                                  0     p      STARTED      10  7.9kb   7.9kb 127.0.0.1 node-1
products_new                                                  0     p      STARTED       2  5.5kb   5.5kb 127.0.0.1 node-1
test-index                                                    0     p      UNASSIGNED                               
test-index                                                    1     p      UNASSIGNED                               
test-index                                                    2     p      UNASSIGNED                               
products_old                                                  0     p      STARTED       2  5.2kb   5.2kb 127.0.0.1 node-1

Looking at the output from the command, shards with the state UNASSIGNED are the unassigned shards.
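
To find out why a particular shard is unassigned, we can also query the _cluster/allocation/explain API. Without a request body, it explains the first unassigned shard it finds; here, we ask about a specific shard, using the index and shard number from our example output:

```shell
# Explain why shard 0 of test-index isn't allocated to any node
$ curl -X GET "localhost:9200/_cluster/allocation/explain?pretty" \
  -H 'Content-Type: application/json' \
  -d '{ "index": "test-index", "shard": 0, "primary": true }'
```

The response includes an unassigned reason and a per-node decision list, which often points directly at the root cause.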

3.2. _cluster/health API

The _cluster/health API offers insights into the overall health of the Elasticsearch cluster, including the status of shard allocation:

$ curl -X GET "localhost:9200/_cluster/health?pretty"
{
  "cluster_name" : "my-cluster",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 8,
  "active_shards" : 8,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 3,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 72.72727272727273
}

A value of unassigned_shards greater than zero indicates unassigned shards. Additionally, the cluster status reflects this: yellow means some replica shards are unassigned, while red means at least one primary shard is unassigned.

4. Resolving Unassigned Shards

Now that we’ve diagnosed the presence of unassigned shards and identified potential root causes, let’s explore some steps we can take to get the cluster back to a healthy state.

4.1. Restarting Failed Nodes

If unassigned shards are caused by node failure, the first step is to restart the failed nodes. Once the nodes are back online, Elasticsearch automatically attempts to reallocate the unassigned shards to the available nodes. We still have to check the cluster health and shard allocation status to ensure that Elasticsearch has successfully reassigned the shards.
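
Instead of polling the health endpoint manually after the restart, we can block until the cluster reaches a given status using the wait_for_status parameter:

```shell
# Wait up to 30 seconds for the cluster to turn green after restarting nodes
$ curl -X GET "localhost:9200/_cluster/health?wait_for_status=green&timeout=30s&pretty"
```

If the timeout elapses before the cluster turns green, the response comes back with timed_out set to true, signaling that further investigation is needed.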

4.2. Allocating Shards Manually

In some cases, we may need to manually allocate unassigned shards to specific nodes using the _cluster/reroute API. We use the allocate_empty_primary command if the shard we want to allocate is a primary shard; otherwise, we use allocate_replica for replica shards.

Since the shard we want to allocate here is a primary shard, let's use allocate_empty_primary:

$ curl -XPOST 'localhost:9200/_cluster/reroute' -H 'Content-Type: application/json' -d '{
    "commands": [{
        "allocate_empty_primary": {
            "index": "test-index",
            "shard": 1,
            "node": "node-1",
            "accept_data_loss": true
        }
    }]
}'
{"acknowledged":true,"state":{"cluster_uuid":"0W8o4rxdSniXsf_grVvxvQ","version":299,"state_uuid":"rHRyobYiSZ2VIOlxep0jxw","master_node":"nIzM4TPDQuS0WDHkSjEN1w","blocks":{}...

Notably, we include the “accept_data_loss”: true parameter to acknowledge that allocating an empty primary shard discards any data previously stored in that shard.

We can replicate the above command for any shard allocation task by replacing the relevant fields:

  • test-index with the actual index name
  • 1 with the desired shard number
  • node-1 with the name of the target node where we want to allocate the shard

If the problem we’re facing is a storage space issue on a certain node, we can configure shard allocation to nodes with sufficient free space. To achieve this, we use the index.routing.allocation settings when creating or updating an index:

$ curl -X PUT "http://localhost:9200/test-index/_settings" \
-H 'Content-Type: application/json' \
-d '{ "index.routing.allocation.require._tag": "data_node" }'
{"acknowledged":true}

As a result, this setting ensures that Elasticsearch allocates shards for the test-index index to nodes with the data_node tag.

4.3. Adjusting Cluster Settings

Elasticsearch has some cluster-level settings that can help resolve unassigned shards and improve the cluster’s health.

The cluster.routing.allocation.enable setting controls the shard allocation behavior in the cluster. Its default value is all, which means Elasticsearch tries to recover and reassign unassigned shards automatically. Consequently, the presence of unassigned shards might be caused by a previous change to this setting that prevents that behavior.

Thus, we can reset the value to its default:

$ curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.enable": null
  }
}'
{"acknowledged":true}

In this case, setting cluster.routing.allocation.enable to null clears any previously configured value. As a result, Elasticsearch reverts to the default value all.
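
To verify the effective value after the reset, we can read the setting back, including defaults, and narrow the response with filter_path:

```shell
# Show only the allocation.enable setting, falling back to its default value
$ curl -X GET "localhost:9200/_cluster/settings?include_defaults=true&filter_path=*.cluster.routing.allocation.enable&pretty"
```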

Furthermore, the cluster.routing.allocation.node_concurrent_recoveries setting determines how many concurrent shard recoveries each node can handle. Increasing this value can speed up the recovery process when dealing with a large number of unassigned shards:

$ curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "cluster.routing.allocation.node_concurrent_recoveries": 5
  }
}'
{"acknowledged":true}

By sending this request, we update the cluster settings to allow five shard recoveries to happen at the same time on each node. In turn, this may speed up recovery operations in the Elasticsearch cluster, especially when many shards need reassignment.

4.4. Setting Number of Replicas

In case of a node failure, we can ensure that each shard has a sufficient number of replicas across different nodes thereby reducing the risk of unassigned shards:

$ curl -X PUT "localhost:9200/test-index/_settings" -H 'Content-Type: application/json' -d'
{
  "index.number_of_replicas": 2
}'
{"acknowledged":true}

Setting the number of replicas to 2 means that each primary shard has two replica shards. However, there's a caveat: if the cluster doesn't have enough nodes to allocate all replicas, Elasticsearch marks the extra replicas as unassigned. In this case, we need at least three nodes to fully allocate each primary shard and its two replicas across different nodes.
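
We can confirm the current replica count for the index by reading its settings back:

```shell
# Read back number_of_replicas for test-index
$ curl -X GET "localhost:9200/test-index/_settings?filter_path=*.settings.index.number_of_replicas&pretty"
```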

5. Conclusion

In this article, we explored the different reasons shards can become unassigned. Further, we learned how to use Elasticsearch APIs to track down unassigned shards. Finally, we looked into the various strategies for getting those shards assigned and the cluster back in action.

Ultimately, careful cluster management is key to both resolving and preventing unassigned shards.