Shard state transitions should be edge-triggered rather than level-triggered

Today, whenever a node receives a cluster state update, it compares the state of each of its shards in the new routing table with the state it expects, and triggers a `shard-started` or `shard-failed` transition if they don't match. We then capture the transition and suppress it if a duplicate request is already in flight (#31313 for `shard-failed` transitions, #82089 for `shard-started` ones).
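To make the current behaviour concrete, here is a minimal Python sketch of a level-triggered node with in-flight deduplication. It is a toy model, not the actual Elasticsearch code: the class `LevelTriggeredNode`, its fields, and the state names `STARTED` and `INITIALIZING` as used here are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LevelTriggeredNode:
    """Hypothetical model of a node that re-checks every shard on every update."""

    # shard id -> state this node expects the routing table to show
    local_states: dict
    # (shard, transition) pairs already sent and awaiting the master
    in_flight: set = field(default_factory=set)
    # every request actually sent to the master, in order
    sent: list = field(default_factory=list)
    # how many duplicate requests were triggered and then dropped
    suppressed: int = 0

    def apply_cluster_state(self, routing_table: dict) -> None:
        """Compare levels: any mismatch triggers a transition request."""
        for shard, expected in self.local_states.items():
            if routing_table.get(shard) == expected:
                continue
            transition = "shard-started" if expected == "STARTED" else "shard-failed"
            key = (shard, transition)
            if key in self.in_flight:
                self.suppressed += 1  # duplicate: triggered, then suppressed
            else:
                self.in_flight.add(key)
                self.sent.append(key)

    def on_master_response(self, shard: str, transition: str) -> None:
        """The master has processed the request; stop deduplicating it."""
        self.in_flight.discard((shard, transition))


node = LevelTriggeredNode({"s0": "STARTED"})
for _ in range(3):  # three updates arrive before the master catches up
    node.apply_cluster_state({"s0": "INITIALIZING"})
print(node.sent, node.suppressed)
```

In this model every cluster state update that arrives while the request is still queued on the master triggers another transition, which is then thrown away by the deduplication step.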

This is pretty ugly. These transitions may be a long way down the master's queue, so we may trigger (and then suppress) many duplicate requests. I think this mechanism dates back to a time when cluster state updates could occasionally be lost, but those problems are fixed today. We should therefore move to a system that triggers the state update request only at the shard state transition itself, and then relies on the fact that this request will eventually complete (possibly unsuccessfully, requiring a retry).
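One way the proposed edge-triggered approach could look is sketched below in Python. This is an illustration under stated assumptions rather than a design for the real change: `EdgeTriggeredNode`, the `send` callback, the `TRANSITIONS` mapping, and the bounded `max_attempts` retry policy are all hypothetical.

```python
# Hypothetical mapping from a shard's new local state to the request it triggers.
TRANSITIONS = {"STARTED": "shard-started", "FAILED": "shard-failed"}


class EdgeTriggeredNode:
    """Toy model: a request is sent only when a shard's local state changes."""

    def __init__(self, send, max_attempts=3):
        self.send = send  # assumed callable(shard, transition) -> bool (True on success)
        self.max_attempts = max_attempts  # assumed retry bound, for illustration only
        self.states = {}  # shard id -> last known local state
        self.requests = 0  # total requests sent to the master

    def on_shard_state_change(self, shard, new_state):
        """Called at the transition itself (the edge), not on every update."""
        old_state = self.states.get(shard)
        self.states[shard] = new_state
        if old_state == new_state or new_state not in TRANSITIONS:
            return False  # no edge, or nothing to report
        return self._notify_master(shard, TRANSITIONS[new_state])

    def _notify_master(self, shard, transition):
        # The request always completes; an unsuccessful completion is retried.
        for _ in range(self.max_attempts):
            self.requests += 1
            if self.send(shard, transition):
                return True
        return False

    def apply_cluster_state(self, routing_table):
        """Cluster state updates no longer trigger any comparison or request."""
        return None


outcomes = iter([False, True])  # first attempt fails, the retry succeeds
node = EdgeTriggeredNode(lambda shard, transition: next(outcomes))
node.on_shard_state_change("s0", "STARTED")
for _ in range(5):
    node.apply_cluster_state({"s0": "INITIALIZING"})
print(node.requests)
```

Here the number of requests depends only on the number of transitions and failed attempts, not on how many cluster state updates arrive while the master works through its queue, so there is nothing left to deduplicate.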

Shard state transitions should be edge-triggered rather than level-triggered #82185
