Shard relocations are retried indefinitely #79445
Closed
Labels
:Distributed/Allocation (all issues relating to the decision making around placing a shard, both master logic & on the nodes), >bug, Team:Distributed (meta label for distributed team)
We keep count of how many times we've tried to allocate a shard, and if we fail too many times in a row then we give up (via the `MaxRetryAllocationDecider`) rather than repeatedly failing forever. Today the failure count is attached to the `ShardRouting`, which makes sense for unassigned shards. But when a shard is already assigned and being relocated, the relocation target is represented by a temporary `ShardRouting` that only exists while the relocation is ongoing. If the relocation fails then the target `ShardRouting` disappears, so we lose track of the failure, which means there's nothing stopping us from attempting the same relocation forever.
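
To make the difference concrete, here is a minimal toy model of the behaviour described above. It is a sketch, not Elasticsearch code: the `ShardRouting` dataclass, `can_allocate`, the two `simulate_*` functions, and the `MAX_RETRIES` value are all hypothetical stand-ins chosen for illustration, and every attempt is assumed to fail.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed retry limit for this sketch only.
MAX_RETRIES = 5


@dataclass
class ShardRouting:
    """Toy stand-in for a routing entry; NOT the real Elasticsearch class."""
    shard_id: int
    node: Optional[str]
    failed_allocations: int = 0


def can_allocate(routing: ShardRouting) -> bool:
    """Toy retry limit: refuse once this routing has failed too often."""
    return routing.failed_allocations < MAX_RETRIES


def simulate_unassigned(rounds: int) -> int:
    """Unassigned shard: the same routing object survives every failure."""
    routing = ShardRouting(shard_id=0, node=None)
    attempts = 0
    for _ in range(rounds):
        if not can_allocate(routing):
            break  # the failure count reached the limit, so we give up
        attempts += 1
        routing.failed_allocations += 1  # failure is remembered
    return attempts


def simulate_relocation(rounds: int) -> int:
    """Relocation: a fresh temporary target routing is created per attempt."""
    attempts = 0
    for _ in range(rounds):
        # The target exists only while the relocation is ongoing,
        # so its failure count always starts at zero.
        target = ShardRouting(shard_id=0, node="node-b")
        if not can_allocate(target):
            break  # never reached in this model
        attempts += 1
        target.failed_allocations += 1
        del target  # relocation failed: the target and its count vanish
    return attempts


if __name__ == "__main__":
    print("unassigned attempts:", simulate_unassigned(100))
    print("relocation attempts:", simulate_relocation(100))
```

In this model the unassigned shard stops after `MAX_RETRIES` attempts, while the relocation is attempted on every round, because the only object carrying the failure count is thrown away after each failure.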