Do not cancel ongoing recovery for noop copy on broken node by dnhatn · Pull Request #48265 · elastic/elasticsearch

dnhatn · 2019-10-19T03:34:54Z

Today the replica allocator can repeatedly cancel an ongoing recovery for a copy on a broken node if that copy can perform a noop recovery. This loop can repeat endlessly (see testDoNotInfinitelyWaitForMapping). We should detect and avoid canceling in this situation.

Closes #47974

elasticmachine · 2019-10-19T03:34:55Z

Pinging @elastic/es-distributed (:Distributed/Allocation)

server/src/test/java/org/elasticsearch/gateway/ReplicaShardAllocatorIT.java

DaveCTurner

My concern with using a simple boolean is that it will suppress no-op recoveries to other nodes that might succeed. I think we should track which node failed the no-op allocation.

What would happen if instead we kept track of all the nodes on which this shard failed during initialization (whether no-op or otherwise) and ignored them all in the ReplicaShardAllocator? I am thinking particularly of #18417 - with a list of past failures we could also prefer to avoid those nodes in the BalancedShardsAllocator.

dnhatn · 2019-10-21T02:40:09Z

@DaveCTurner Thanks for looking. We would make better decisions with a list of failed nodes. We need to make sure that the list is bounded. I used a boolean to avoid adding more load to the cluster state.

DaveCTurner · 2019-10-21T07:39:29Z

I think a ~~list~~ set of nodes (or node IDs) would not be too much to add to the cluster state and would naturally be bounded: we would only add to it on an ALLOCATION_FAILED which only happens a few times thanks to the MaxRetryAllocationDecider; the list should of course be cleared in resetFailedAllocationCounter() (as a member of UnassignedInfo it also naturally goes away when a shard moves to STARTED)
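A minimal sketch of why such a set stays bounded, assuming a fixed retry limit. `FailedNodeTracker`, `onAllocationFailed`, `canRetry`, and `reset` are hypothetical names for illustration only; they are not the actual `UnassignedInfo` or `MaxRetryAllocationDecider` API.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the per-shard bookkeeping proposed above.
// It is NOT the real UnassignedInfo; it only illustrates the bounding argument.
final class FailedNodeTracker {
    private final int maxRetries;                  // assumed retry limit
    private int failedAllocations = 0;             // failures recorded so far
    private final Set<String> failedNodeIds = new HashSet<>();

    FailedNodeTracker(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    /** Records a failure on the given node; returns false once retries are exhausted. */
    boolean onAllocationFailed(String nodeId) {
        if (failedAllocations >= maxRetries) {
            return false;                          // no further attempts, so the set cannot grow
        }
        failedAllocations++;
        failedNodeIds.add(nodeId);
        return true;
    }

    boolean canRetry() {
        return failedAllocations < maxRetries;
    }

    Set<String> failedNodeIds() {
        return Collections.unmodifiableSet(failedNodeIds);
    }

    /** Mirrors the idea of clearing the set when the failure counter is reset. */
    void reset() {
        failedAllocations = 0;
        failedNodeIds.clear();
    }
}
```

Because an entry is added only when a failure is counted, the set can never hold more entries than the retry limit allows.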

dnhatn · 2019-10-21T13:01:45Z

@DaveCTurner Good point. I will apply your feedback. Thank you.

server/src/main/java/org/elasticsearch/gateway/ReplicaShardAllocator.java

dnhatn · 2019-10-21T17:59:45Z

@DaveCTurner It's ready again. Can you please take another look? Thank you.

DaveCTurner

Sorry for the delayed review @dnhatn, I thought I submitted this a few days ago but apparently not.

server/src/main/java/org/elasticsearch/cluster/routing/UnassignedInfo.java

server/src/main/java/org/elasticsearch/gateway/ReplicaShardAllocator.java

henningandersen

Thanks for working on this @dnhatn, I left a few comments to consider.

server/src/main/java/org/elasticsearch/cluster/routing/allocation/AllocationService.java

server/src/main/java/org/elasticsearch/gateway/ReplicaShardAllocator.java

server/src/test/java/org/elasticsearch/gateway/ReplicaShardAllocatorIT.java

server/src/main/java/org/elasticsearch/cluster/routing/UnassignedInfo.java

DaveCTurner

I left a couple of responses to threads in an earlier review and duplicated them here.

server/src/main/java/org/elasticsearch/cluster/routing/allocation/AllocationService.java

dnhatn · 2019-10-30T17:55:08Z

@DaveCTurner I've addressed your feedback. Would you mind taking another look? Thank you.

AthenaEryma · 2019-10-30T22:41:10Z

@dnhatn The test failure in ci/2 here is a known issue (#48531), feel free to ignore it (it's unmuted to collect logs for troubleshooting purposes).

dnhatn · 2019-10-31T00:11:37Z

Thanks @gwbrown.

@elasticmachine run elasticsearch-ci/2

DaveCTurner

Thanks @dnhatn this LGTM

dnhatn · 2019-11-01T13:19:54Z

@DaveCTurner @henningandersen @original-brownbear Thanks for reviewing :).

Relates #48265

This change fixes a poisonous situation where an ongoing recovery was canceled because a better copy was found on a node to which the cluster had previously tried, and failed, to allocate the shard. The solution is to keep track of the set of nodes on which an allocation has failed so that we can avoid canceling the current recovery in favor of a copy on one of those nodes. Closes #47974
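The decision described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `ReplicaShardAllocator`: the class `NoopCancelSketch`, the `Candidate` type, and the method `shouldCancelOngoingRecovery` are all hypothetical names.

```java
import java.util.Set;

// Hypothetical sketch of the cancel decision; not the real ReplicaShardAllocator logic.
final class NoopCancelSketch {

    /** A copy of the shard found on some other node (illustrative type). */
    static final class Candidate {
        final String nodeId;
        final boolean noopRecovery;   // true if this copy could recover without copying files or operations

        Candidate(String nodeId, boolean noopRecovery) {
            this.nodeId = nodeId;
            this.noopRecovery = noopRecovery;
        }
    }

    /**
     * Returns true only if canceling the ongoing recovery is expected to help:
     * the current recovery is not already a noop, the candidate offers a noop
     * recovery, and the candidate's node has not previously failed an allocation.
     */
    static boolean shouldCancelOngoingRecovery(boolean currentIsNoop,
                                               Candidate best,
                                               Set<String> failedNodeIds) {
        if (currentIsNoop) {
            return false;             // nothing better to switch to
        }
        if (!best.noopRecovery) {
            return false;             // candidate is no improvement
        }
        // Skipping nodes that already failed is what breaks the cancel/retry loop.
        return !failedNodeIds.contains(best.nodeId);
    }
}
```

Without the final check, a broken node holding a noop copy would keep looking like the best target, and each allocation round would cancel the recovery that is actually making progress.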

Relates #48265

Do not cancel ongoing recovery for noop copy on failed node

4026c6f

dnhatn added the >enhancement, :Distributed/Allocation, v8.0.0, v7.5.0, and v7.6.0 labels Oct 19, 2019

dnhatn requested review from DaveCTurner and ywelsch October 19, 2019 03:34

original-brownbear reviewed Oct 19, 2019

server/src/test/java/org/elasticsearch/gateway/ReplicaShardAllocatorIT.java

DaveCTurner reviewed Oct 20, 2019

make test fail more often

600c3de

keep track list of failed nodes

3976586

dnhatn commented Oct 21, 2019

server/src/main/java/org/elasticsearch/gateway/ReplicaShardAllocator.java (outdated)

dnhatn requested a review from DaveCTurner October 21, 2019 17:59

Merge branch 'master' into ignore-failed-node

5cb6476

dnhatn requested a review from henningandersen October 23, 2019 13:07

DaveCTurner reviewed Oct 24, 2019

henningandersen reviewed Oct 28, 2019

dnhatn added 7 commits October 29, 2019 12:07

Merge branch 'master' into ignore-failed-node

5781915

track only nodes that failed to perform noop allocations

0dd5cc8

summary

05e7691

carry over the failed set when cancel recovery

89c6753

ignorePreviousFailedNodes -> noMatchFailedNoopAllocationNodes

a77b19d

maintain the failed node list when reset failed counter

e27bafe

wait for new node

51dd4e6

dnhatn requested review from DaveCTurner and henningandersen October 29, 2019 21:31

DaveCTurner reviewed Oct 30, 2019

server/src/main/java/org/elasticsearch/cluster/routing/allocation/AllocationService.java (outdated)

dnhatn added 3 commits October 30, 2019 11:40

Merge branch 'master' into ignore-failed-node

a05a2d6

discard failed nodes when retry failed allocation

1ca4ed6

Track failed nodes except while started

42e6b54

dnhatn requested a review from DaveCTurner October 30, 2019 17:55

DaveCTurner approved these changes Oct 31, 2019

dnhatn merged commit 36ee74f into elastic:master Nov 1, 2019

dnhatn deleted the ignore-failed-node branch November 1, 2019 13:23

dnhatn added the backport pending label Nov 1, 2019

dnhatn added a commit that referenced this pull request Nov 1, 2019

Fix testCancelRecoveryIfFoundCopyWithNoopRetentionLease

20d4ad8

Relates #48265

dnhatn added a commit that referenced this pull request Nov 9, 2019

Adjust bwc for #48265

071e236

Relates #48265

dnhatn removed the backport pending label Nov 9, 2019

This was referenced Feb 3, 2020

[meta] 7.6 release elastic/elasticsearch-net#4340

Closed

[meta] 7.6 release elastic/elasticsearch-net#4341

Closed

DaveCTurner mentioned this pull request Mar 5, 2020

Exponential backoff of failed allocation #24530

Closed

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
