RestoreInProgressAllocationDecider doesn't really explain how to investigate why the shard was not restored #100233

@DaveCTurner

The explanation for a shard whose allocation is blocked by the RestoreInProgressAllocationDecider describes the situation accurately, but not in a way that helps users understand the problem:

            "shard has failed to be restored from the snapshot [%s] - manually close or delete the index [%s] in order to retry "
                + "to restore the snapshot again or use the reroute API to force the allocation of an empty primary shard. Details: [%s]",
            source.snapshot(),
            shardRouting.getIndexName(),
            shardRouting.unassignedInfo().getDetails()

If we simply could not assign the shard because other allocation deciders blocked it, then shardRouting.unassignedInfo().getNumFailedAllocations() will be zero and shardRouting.unassignedInfo().getDetails() may not contain any useful information. In that case we should explain that allocation of the shard was prevented on all nodes by other allocation deciders, rather than simply saying the shard has "failed to be restored from the snapshot".

If we assigned the shard and the recovery failed then shardRouting.unassignedInfo().getNumFailedAllocations() will be positive and I expect there'll be an exception in this message, and also in the logs, so this case is a little clearer. I think we could still say that the recovery started and then failed, and maybe guide users towards the extra detail in the logs.
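
To make the two cases concrete, here is a rough, standalone sketch of how the message could branch on the failed-allocation count. It is not the actual Elasticsearch change: the class `RestoreExplanationSketch`, the method `explain`, its plain-value parameters (standing in for `source.snapshot()`, `shardRouting.getIndexName()`, `getNumFailedAllocations()` and `getDetails()`), and the exact wording of both messages are all hypothetical.

```java
import java.util.Locale;

// Hypothetical, self-contained illustration; not part of Elasticsearch.
final class RestoreExplanationSketch {

    private RestoreExplanationSketch() {}

    /**
     * Builds an explanation that distinguishes "never allocated" from
     * "allocated, then recovery failed". All message wording is an assumption.
     */
    static String explain(String snapshot, String indexName, int numFailedAllocations, String details) {
        if (numFailedAllocations == 0) {
            // No allocation attempt has failed, so the shard was never assigned:
            // other deciders must have rejected it on every node.
            return String.format(
                Locale.ROOT,
                "shard of index [%s] has not been restored from the snapshot [%s] because allocation "
                    + "was prevented on all nodes by other allocation deciders. Details: [%s]",
                indexName,
                snapshot,
                details
            );
        }
        // At least one allocation attempt failed: the recovery started and then
        // failed, so point the user at the logs and keep the original remediation.
        return String.format(
            Locale.ROOT,
            "restore of shard from the snapshot [%s] started and then failed [%d] time(s), see the logs "
                + "for the full exception - manually close or delete the index [%s] in order to retry "
                + "to restore the snapshot again or use the reroute API to force the allocation of an "
                + "empty primary shard. Details: [%s]",
            snapshot,
            numFailedAllocations,
            indexName,
            details
        );
    }
}
```

With these assumptions, `explain("repo:snap", "my-index", 0, null)` yields the "prevented on all nodes" wording, while any positive count yields the "started and then failed" wording along with the existing remediation advice.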

Metadata

Assignees

No one assigned

    Labels

    :Distributed/Allocation (All issues relating to the decision making around placing a shard (both master logic & on the nodes))
    >bug
    Supportability (Improve our (devs, SREs, support eng, users) ability to troubleshoot/self-service product better.)
    Team:Distributed (Meta label for distributed team.)
