Shard relocations are retried indefinitely #79445

@DaveCTurner

Description

We keep count of how many times we've tried to allocate a shard, and if we fail too many times in a row we give up (via the MaxRetryAllocationDecider) rather than failing repeatedly forever. Today the failure count is attached to the ShardRouting, which makes sense for unassigned shards. However, when a shard is already assigned and is being relocated, the relocation target is represented by a temporary ShardRouting that only exists while the relocation is ongoing. If the relocation fails, the target ShardRouting disappears and we lose track of the failure, so nothing stops us from attempting the same relocation forever.
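To make the failure mode concrete, here is a minimal, hypothetical Java sketch. The names Routing, retryAllowed, MAX_RETRIES, attemptsForUnassignedShard and attemptsForRelocation are illustrative only and are not Elasticsearch's real ShardRouting or MaxRetryAllocationDecider code. The limit of 5 is an assumed value. The sketch models nothing except where the failure counter lives. An unassigned shard keeps the same routing entry across failures, so its counter reaches the limit. Each relocation attempt, by contrast, builds a fresh target entry that starts at zero and is discarded when the attempt fails.

```java
/**
 * Hypothetical, heavily simplified model of the bug described above.
 * None of these types are the real Elasticsearch classes; they only
 * mimic where the failure counter is stored.
 */
public class RelocationRetrySketch {

    // Assumed retry limit, standing in for the MaxRetryAllocationDecider threshold.
    static final int MAX_RETRIES = 5;

    // Hypothetical stand-in for ShardRouting: the failure counter lives on the routing entry itself.
    static final class Routing {
        final String nodeId;
        int failedAttempts = 0;

        Routing(String nodeId) {
            this.nodeId = nodeId;
        }
    }

    // Mimics the decider: refuse further attempts once the counter reaches the limit.
    static boolean retryAllowed(Routing routing) {
        return routing.failedAttempts < MAX_RETRIES;
    }

    /** Unassigned shard: the same routing survives each failure, so the count accumulates. */
    static int attemptsForUnassignedShard(int maxLoops) {
        Routing unassigned = new Routing(null);
        int attempts = 0;
        while (attempts < maxLoops && retryAllowed(unassigned)) {
            attempts++;
            unassigned.failedAttempts++; // allocation fails; the surviving routing remembers it
        }
        return attempts;
    }

    /** Relocation: each attempt creates a fresh temporary target routing that is dropped on failure. */
    static int attemptsForRelocation(int maxLoops) {
        int attempts = 0;
        while (attempts < maxLoops) {
            Routing target = new Routing("node-2"); // temporary relocation target, counter starts at 0
            if (!retryAllowed(target)) {
                break; // never taken: every new target starts with failedAttempts == 0
            }
            attempts++;
            target.failedAttempts++; // relocation fails; the count is recorded on the target,
            // which goes out of scope at the end of this iteration, so the count is lost
        }
        return attempts; // bounded only by maxLoops, never by MAX_RETRIES
    }

    public static void main(String[] args) {
        System.out.println("unassigned shard attempts: " + attemptsForUnassignedShard(100));
        System.out.println("relocation attempts: " + attemptsForRelocation(100));
    }
}
```

Running main prints 5 attempts for the unassigned shard and 100 for the relocation. The 100 is only the artificial loop cap passed in; the retry limit never stops the relocation loop.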

Metadata

Labels

:Distributed/Allocation: All issues relating to the decision making around placing a shard (both master logic & on the nodes)
>bug
Team:Distributed: Meta label for distributed team.
