Skip to content

Shard state transitions should be edge-triggered rather than level-triggered #82185

@DaveCTurner

Description

@DaveCTurner

Today when each node receives a cluster state update it compares the states of its shards in the new routing table to their expected states, and triggers a shard-started or shard-failed transition if they don't match. We then capture the transition and suppress it if a duplicate request is already in flight (#31313 for shard-failed transitions, #82089 for shard-started ones).

This is pretty ugly. These transitions may be a long way down the master's queue so we may trigger (and then suppress) many duplicate requests. I think the reasons for this mechanism date back to a time when cluster state updates could occasionally be lost, but these problems are fixed today so we should move to a system that triggers the state update request only at the shard state transition and then relies on the fact that this request will eventually complete (possibly unsuccessfully, requiring a retry).

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions