A method to reduce the time cost to update cluster state

ES_VERSION: 5.6.8 
JVM version : JDK1.8.0_112
OS version : linux
Description of the problem including expected versus actual behavior:
As is well known, updating the cluster state on the master node can take a long time, which seriously limits the size and stability of a cluster. In our production cluster (50 nodes, 3,000 indices, 60,000 shards), each cluster state update takes more than 15s, so creating or deleting an index is very slow.

To find out why updating the cluster state takes so long, I captured the stack of the `updateTask` thread:
```
"elasticsearch[node1][clusterService#updateTask][T#1]" #32 daemon prio=5 os_prio=0 tid=0x00007f5d703a2800 nid=0x8252 runnable [0x00007f5c22b71000]
   java.lang.Thread.State: RUNNABLE
        at java.util.Collections$UnmodifiableCollection$1.hasNext(Collections.java:1041)
        at org.elasticsearch.cluster.routing.RoutingNode.shardsWithState(RoutingNode.java:148)
        at org.elasticsearch.cluster.routing.allocation.decider.DiskThresholdDecider.sizeOfRelocatingShards(DiskThresholdDecider.java:90)
        at org.elasticsearch.cluster.routing.allocation.decider.DiskThresholdDecider.getDiskUsage(DiskThresholdDecider.java:320)
        at org.elasticsearch.cluster.routing.allocation.decider.DiskThresholdDecider.canRemain(DiskThresholdDecider.java:265)
        at org.elasticsearch.cluster.routing.allocation.decider.AllocationDeciders.canRemain(AllocationDeciders.java:105)
        at org.elasticsearch.cluster.routing.allocation.allocator.BalancedShardsAllocator$Balancer.decideMove(BalancedShardsAllocator.java:687)
        at org.elasticsearch.cluster.routing.allocation.allocator.BalancedShardsAllocator$Balancer.moveShards(BalancedShardsAllocator.java:648)
        at org.elasticsearch.cluster.routing.allocation.allocator.BalancedShardsAllocator.allocate(BalancedShardsAllocator.java:123)
        at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:329)
        at org.elasticsearch.cluster.routing.allocation.AllocationService.applyStartedShards(AllocationService.java:100)
```
I sampled the stack several times and got the same trace each time. It seems that `DiskThresholdDecider.sizeOfRelocatingShards` accounts for most of the time. The code is as follows:
```
static long sizeOfRelocatingShards(RoutingNode node, RoutingAllocation allocation,
                                   boolean subtractShardsMovingAway, String dataPath) {
    ClusterInfo clusterInfo = allocation.clusterInfo();
    long totalSize = 0;
    for (ShardRouting routing : node.shardsWithState(ShardRoutingState.RELOCATING,
                                                     ShardRoutingState.INITIALIZING)) {
        ......
    }
    ......
}
```
In other words, to decide whether a shard can remain on its node, we compute the size of the relocating shards on that node. That means iterating over all shards on the node (about 6,000 in our case) and checking whether each one is `RELOCATING` or `INITIALIZING`. And that is the cost for a single shard. With 60,000 shards to check, this adds up to roughly 60,000 * 6,000 state checks, which takes far too long.
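To make the arithmetic concrete, here is a minimal sketch of the access pattern described above. It is not Elasticsearch code: `ScanCostSketch`, `estimatedChecks` and `countChecks` are hypothetical names used only for illustration, and it assumes exactly one full scan of the node's shards per shard decision.

```java
final class ScanCostSketch {
    enum State { STARTED, INITIALIZING, RELOCATING }

    // Estimated number of per-shard state checks when every shard decision
    // triggers one full scan of the shards on its node (assumption).
    static long estimatedChecks(long shardsToDecide, long shardsPerNode) {
        return shardsToDecide * shardsPerNode;
    }

    // Simulates the pattern on a single node: one decision per shard,
    // and each decision scans all shards of the node to filter by state.
    static long countChecks(State[] nodeShards) {
        long checks = 0;
        long matched = 0;
        for (int decided = 0; decided < nodeShards.length; decided++) {
            for (State s : nodeShards) {
                checks++;
                if (s == State.RELOCATING || s == State.INITIALIZING) {
                    matched++; // the real code would add the shard size here
                }
            }
        }
        return checks;
    }

    public static void main(String[] args) {
        // Numbers taken from the cluster described in this issue.
        System.out.println(estimatedChecks(60_000, 6_000));
    }
}
```

With the numbers from this cluster, the estimate is 360,000,000 state checks per reroute, even when almost no shard is actually relocating or initializing.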
I found that a setting can skip this check: `"cluster.routing.allocation.disk.include_relocations": "false"`. After I set it to `false`, the time to update the cluster state dropped from 15s to 3s.
Could `cluster.routing.allocation.disk.include_relocations` default to `false`? Most users never change the default, so they would all benefit. Alternatively, the cluster state could keep track of the relocating and initializing shards of each node, so that we do not have to scan every shard on the node to find them on each cluster state update.
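The second idea could look roughly like the sketch below. It is a simplified illustration under my own assumptions, not the actual `RoutingNode` implementation: `IndexedNodeSketch`, its `put`/`remove` methods and the string shard ids are hypothetical. The point is that the node maintains one set of shards per state, so `shardsWithState` only touches the shards that are in the requested states.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class IndexedNodeSketch {
    enum State { INITIALIZING, STARTED, RELOCATING }

    // Current state of every shard on this node.
    private final Map<String, State> stateByShard = new HashMap<>();
    // Per-state index, kept in sync on every add, update and remove.
    private final Map<State, Set<String>> shardsByState = new EnumMap<>(State.class);

    IndexedNodeSketch() {
        for (State s : State.values()) {
            shardsByState.put(s, new LinkedHashSet<>());
        }
    }

    // Adds a shard or changes its state, updating the index.
    void put(String shardId, State state) {
        State previous = stateByShard.put(shardId, state);
        if (previous != null) {
            shardsByState.get(previous).remove(shardId);
        }
        shardsByState.get(state).add(shardId);
    }

    // Removes a shard from the node and from the index.
    void remove(String shardId) {
        State previous = stateByShard.remove(shardId);
        if (previous != null) {
            shardsByState.get(previous).remove(shardId);
        }
    }

    // Cost is proportional to the number of matching shards,
    // not to the total number of shards on the node.
    List<String> shardsWithState(State... states) {
        List<String> result = new ArrayList<>();
        for (State s : states) {
            result.addAll(shardsByState.get(s));
        }
        return result;
    }

    public static void main(String[] args) {
        IndexedNodeSketch node = new IndexedNodeSketch();
        node.put("idx-0", State.STARTED);
        node.put("idx-1", State.RELOCATING);
        node.put("idx-2", State.INITIALIZING);
        System.out.println(node.shardsWithState(State.RELOCATING, State.INITIALIZING));
    }
}
```

The trade-off is a little extra bookkeeping on every shard state change in exchange for a cheap lookup during allocation decisions, where on a stable cluster the relocating and initializing sets are usually small or empty.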

A method to reduce the time cost to update cluster state #46941
