Today when each node receives a cluster state update it compares the states of its shards in the new routing table to their expected states, and triggers a `shard-started` or `shard-failed` transition if they don't match. We then capture the transition and suppress it if a duplicate request is already in flight (#31313 for `shard-failed` transitions, #82089 for `shard-started` ones).
This is pretty ugly. These transitions may be a long way down the master's queue, so we may trigger (and then suppress) many duplicate requests. I think this mechanism dates back to a time when cluster state updates could occasionally be lost, but those problems are fixed today. We should move to a system that triggers the state update request only at the shard state transition and then relies on the fact that this request will eventually complete (possibly unsuccessfully, requiring a retry).
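
To make the cost of the current behaviour concrete, here is a minimal sketch of the "trigger on every cluster state update, then suppress duplicates" pattern. It is not the actual Elasticsearch code: `InFlightDeduplicator`, `maybeSend`, `onCompleted` and the string request key are all hypothetical names chosen for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not Elasticsearch code: suppresses a request while an
// identical one is still waiting for the master to process it.
final class InFlightDeduplicator {
    private final Set<String> inFlight = new HashSet<>();
    private int sent = 0;
    private int suppressed = 0;

    // Called on every cluster state update in which a shard's state in the
    // routing table differs from the state the node expects.
    // Returns true if the request was actually sent.
    synchronized boolean maybeSend(String requestKey) {
        if (inFlight.add(requestKey)) {
            sent++;
            return true;
        }
        suppressed++; // an identical request is already in flight
        return false;
    }

    // Called once the master has responded, successfully or not.
    synchronized void onCompleted(String requestKey) {
        inFlight.remove(requestKey);
    }

    synchronized int sent() {
        return sent;
    }

    synchronized int suppressed() {
        return suppressed;
    }
}
```

If five cluster state updates arrive while the same request is still queued on the master, this sketch sends one request and suppresses four. Triggering the request only at the shard state transition would avoid generating those four in the first place.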