Last updated: September 17, 2024
Elasticsearch relies on shards to distribute data across nodes in a cluster. Shards are central to how Elasticsearch scales and can significantly affect its performance. But what happens when shards become unassigned? How does this impact the cluster, and what can we do about it?
In this tutorial, we’ll explore what unassigned shards are, understand their impact, and learn how to correctly diagnose and resolve problems stemming from such shards.
Notably, we assume all commands run on the machine hosting the Elasticsearch deployment, using the default port 9200.
Simply put, an unassigned shard is a shard that isn’t allocated to any node in the cluster. Consequently, the data stored in that shard is unavailable for both search and indexing operations.
Several reasons can lead to shards becoming unassigned in an Elasticsearch cluster. Firstly, node failure is one of the most common reasons. When a node in the cluster goes down or becomes unreachable, the shards hosted on that node become unassigned until Elasticsearch can reallocate them to other nodes.
Additionally, Elasticsearch continuously monitors the distribution of shards across nodes. An imbalance may trigger a cluster rebalancing process, during which some shards might temporarily become unassigned as they’re moved around to achieve a more even distribution.
Furthermore, if a node in the cluster starts running out of storage space, Elasticsearch may refuse to allocate any more shards to that node. This can result in already assigned shards becoming unassigned until we free up some space.
When shards are unassigned, the data they hold becomes inaccessible. This means searches may return incomplete results, and indexing operations can be hindered as well. Additionally, the cluster may experience increased load as it attempts to reallocate the unassigned shards, potentially affecting performance.
Notably, Elasticsearch provides APIs that can help in diagnosing unassigned shards.
The _cat/shards API gives a quick overview of all the shards and includes their state and allocation status in the cluster:
$ curl -X GET "localhost:9200/_cat/shards?v"
index shard prirep state docs store dataset ip node
customers_v2 0 p STARTED 10 7.9kb 7.9kb 127.0.0.1 node-1
products_new 0 p STARTED 2 5.5kb 5.5kb 127.0.0.1 node-1
test-index 0 p UNASSIGNED
test-index 1 p UNASSIGNED
test-index 2 p UNASSIGNED
products_old 0 p STARTED 2 5.2kb 5.2kb 127.0.0.1 node-1
Looking at the output from the command, shards with the state UNASSIGNED are the unassigned shards.
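On clusters with many indices, it helps to narrow the output to the columns we care about and filter for unassigned shards. As an example, assuming a Unix-like shell with grep available, the h parameter selects the columns, including the unassigned.reason column that hints at why each shard is unassigned:
$ curl -s "localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" | grep UNASSIGNED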
The _cluster/health API offers insight into the overall health of the Elasticsearch cluster, including the status of shard allocation:
$ curl -X GET "localhost:9200/_cluster/health?pretty"
{
"cluster_name" : "my-cluster",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 8,
"active_shards" : 8,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 3,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 72.72727272727273
}
An unassigned_shards value greater than zero indicates unassigned shards. Additionally, a cluster status of yellow means some replica shards are unassigned, while red means at least one primary shard is unassigned.
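For a closer look at why a specific shard is unassigned, we can also query the _cluster/allocation/explain API. Called without a request body, it returns an explanation for an arbitrary unassigned shard:
$ curl -X GET "localhost:9200/_cluster/allocation/explain?pretty"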
Now that we’ve diagnosed the presence of unassigned shards and identified potential root causes, let’s explore some steps we can take to get the cluster back to a healthy state.
If unassigned shards are caused by node failure, the first step is to restart the failed nodes. Once the nodes are back online, Elasticsearch automatically attempts to reallocate the unassigned shards to the available nodes. We still have to check the cluster health and shard allocation status to ensure that Elasticsearch has successfully reassigned the shards.
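One way to do this is to make the health call block until the cluster reaches a given state. The wait_for_status and timeout parameters return as soon as that status is reached or the timeout expires (yellow here, since replicas may still be initializing):
$ curl -X GET "localhost:9200/_cluster/health?wait_for_status=yellow&timeout=30s&pretty"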
In some cases, we may need to manually allocate unassigned shards to specific nodes using the _cluster/reroute API. If the shard we want to allocate is a primary shard, we use the allocate_empty_primary command; otherwise, we use allocate_replica for replica shards.
Since the shard we want to allocate here is a primary shard, let’s use allocate_empty_primary:
$ curl -XPOST 'localhost:9200/_cluster/reroute' -H 'Content-Type: application/json' -d '{
"commands": [{
"allocate_empty_primary": {
"index": "test-index",
"shard": 1,
"node": "node-1",
"accept_data_loss": true
}
}]
}'
{"acknowledged":true,"state":{"cluster_uuid":"0W8o4rxdSniXsf_grVvxvQ","version":299,"state_uuid":"rHRyobYiSZ2VIOlxep0jxw","master_node":"nIzM4TPDQuS0WDHkSjEN1w","blocks":{}...
Notably, we include the “accept_data_loss”: true parameter to acknowledge that allocating an empty primary shard discards any data the shard previously held.
We can adapt the above command for other shard allocation tasks by replacing the relevant fields.
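For instance, allocating an unassigned replica follows the same pattern with the allocate_replica command. Here’s a minimal sketch, assuming the cluster has a second node named node-2 that can host the replica:
$ curl -XPOST 'localhost:9200/_cluster/reroute' -H 'Content-Type: application/json' -d '{
"commands": [{
"allocate_replica": {
"index": "test-index",
"shard": 1,
"node": "node-2"
}
}]
}'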
If the problem we’re facing is a storage space issue on a certain node, we can configure shard allocation to nodes with sufficient free space. To achieve this, we use the index.routing.allocation settings when creating or updating an index:
$ curl -X PUT "http://localhost:9200/test_index/_settings" \
-H 'Content-Type: application/json' \
-d '{ "index.routing.allocation.require._tag": "data_node" }'
{"acknowledged":true}
As a result, this setting ensures that Elasticsearch allocates shards for the test-index index only to nodes whose _tag attribute is set to data_node.
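Notably, this only has an effect if at least one node actually exposes a matching attribute (custom node attributes are set via node.attr entries in elasticsearch.yml). We can list the attributes the nodes currently advertise with the _cat/nodeattrs API:
$ curl -X GET "localhost:9200/_cat/nodeattrs?v"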
Elasticsearch has some cluster-level settings that can help resolve unassigned shards and improve the cluster’s health.
The cluster.routing.allocation.enable setting controls shard allocation behavior in the cluster. Its default value is all, meaning Elasticsearch automatically tries to recover and reassign unassigned shards. If shards remain unassigned, the setting may have been changed at some point to restrict this behavior.
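To check whether that’s the case, we can inspect the explicitly configured cluster settings; if cluster.routing.allocation.enable doesn’t appear in the output, it’s still at its default:
$ curl -X GET "localhost:9200/_cluster/settings?flat_settings&pretty"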
If the setting was changed, we can reset it to its default by clearing the value:
$ curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
"persistent": {
"cluster.routing.allocation.enable": null
}
}'
{"acknowledged":true}
In this case, setting cluster.routing.allocation.enable to null clears any previously configured value. As a result, Elasticsearch reverts to the default value all.
Furthermore, the cluster.routing.allocation.node_concurrent_recoveries setting determines how many concurrent shard recoveries each node can handle. Increasing this value can speed up the recovery process when dealing with a large number of unassigned shards:
$ curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
"persistent": {
"cluster.routing.allocation.node_concurrent_recoveries": 5
}
}'
{"acknowledged":true}
By sending this request, we’re updating the cluster’s settings to allow 5 shard recoveries to happen at the same time on each node. In return, this may speed up recovery operations in the Elasticsearch cluster, especially when many shards need to be recovered at once.
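We can then keep an eye on ongoing recoveries with the _cat/recovery API, restricting the output to active ones:
$ curl -X GET "localhost:9200/_cat/recovery?v&active_only=true"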
To prepare for node failure, we can ensure that each shard has a sufficient number of replicas spread across different nodes, thereby reducing the risk of losing access to data when shards become unassigned:
$ curl -X PUT "localhost:9200/test-index" -H 'Content-Type: application/json' -d'
{
"settings": {
"index.number_of_replicas": 2
}
}'
{"acknowledged":true}
Setting the number of replicas to 2 means that each primary shard has 2 replica shards. However, there’s a caveat: if the cluster doesn’t have enough nodes to allocate all replicas, Elasticsearch marks the extra replicas as unassigned. In this case, we need at least three nodes to fully allocate each primary shard and its 2 replicas, since Elasticsearch never places a replica on the same node as its primary.
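Therefore, before raising the replica count, it’s worth checking how many nodes are actually available, for example with the _cat/nodes API:
$ curl -X GET "localhost:9200/_cat/nodes?v"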
In this article, we explored the different reasons shards can become unassigned. Further, we learned how to use Elasticsearch APIs to track down unassigned shards. Finally, we looked into the various strategies for getting those shards assigned and the cluster back in action.
In conclusion, careful cluster management is key to both resolving and preventing unassigned shards.