When a data node finishes recovering a shard, it notifies the master so that the master moves the shard to state `STARTED`. Today we repeat this request every time we receive a cluster state in which the shard state has not yet been updated:

    if (shardRouting.initializing() && (state == IndexShardState.STARTED || state == IndexShardState.POST_RECOVERY)) {
        // the master thinks we are initializing, but we are already started or on POST_RECOVERY and waiting
        // for master to confirm a shard started message (either master failover, or a cluster event before
        // we managed to tell the master we started), mark us as started
        if (logger.isTraceEnabled()) {
            logger.trace("{} master marked shard as initializing, but shard has state [{}], resending shard started to {}",
                shardRouting.shardId(), state, nodes.getMasterNode());
        }
        if (nodes.getMasterNode() != null) {
            shardStateAction.shardStarted(
                shardRouting,
                primaryTerm,
                "master " + nodes.getMasterNode() + " marked shard as initializing, but shard state is [" + state +
                    "], mark shard as started",
                shard.getTimestampRange(),
                SHARD_STATE_ACTION_LISTENER,
                clusterState);
        }
    }

This behaviour means that if the master is busy processing other `URGENT` tasks (potentially thousands of them), we submit the same task repeatedly (potentially thousands of times). The behaviour dates back a long time but is no longer necessary: we can trust that the master will process our original request first, or that we will be notified if it failed. We should stop sending these unnecessary retries.
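
To illustrate the idea of not resending while a request is outstanding, here is a minimal sketch of one possible approach. It is not the existing Elasticsearch code and not a proposed patch: the class `InFlightShardStarts`, its methods, and the use of a plain string as the allocation key are all hypothetical names made up for this example. It assumes the node always learns the outcome of the original request (success or failure), as described above.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/**
 * Hypothetical helper (not an Elasticsearch class): remembers which
 * shard-started notifications are still waiting for a response from the master.
 */
final class InFlightShardStarts {
    // Keys of shard copies for which a request has been sent but not yet answered.
    private final Set<String> inFlight = ConcurrentHashMap.newKeySet();

    /**
     * Sends the notification unless one is already outstanding for this key.
     *
     * @param allocationKey hypothetical identifier of the shard copy
     * @param sender        callback that actually sends the request to the master
     * @return true if a request was sent, false if one was already in flight
     */
    boolean notifyOnce(String allocationKey, Consumer<String> sender) {
        if (inFlight.add(allocationKey) == false) {
            // A request is already outstanding; a new cluster state is not a reason to resend.
            return false;
        }
        try {
            sender.accept(allocationKey);
        } catch (RuntimeException e) {
            // Sending failed synchronously, so nothing is in flight; allow a later attempt.
            inFlight.remove(allocationKey);
            throw e;
        }
        return true;
    }

    /** Called when the master has answered (success or failure), so a later attempt is possible. */
    void onResponse(String allocationKey) {
        inFlight.remove(allocationKey);
    }
}
```

With something like this, the cluster state applier could call `notifyOnce` on every cluster state without flooding the master, and the response listener would call `onResponse` so that a genuine failure can still be followed by a fresh attempt.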
Relates #77466
The quoted `if (shardRouting.initializing() ...)` block is from `elasticsearch/server/src/main/java/org/elasticsearch/indices/cluster/IndicesClusterStateService.java`, lines 601 to 619 in b6fbf5a.