Shard failure requests for non-existent shards by jasontedor · Pull Request #16089 · elastic/elasticsearch

jasontedor · 2016-01-19T13:37:30Z

This commit adds handling on the master side for shard failure requests
for shards that do not exist at the time that they are processed on the
master node (whether it be from errant requests, duplicate requests, or
both the primary and replica notifying the master of a shard
failure). This change is made because such shard failure requests should
always be considered successful (the failed shard is not there anymore),
but could be marked as failed if batched with a shard failure request
that does in fact fail.

Relates #14252
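
The behaviour described above can be sketched as a small, self-contained example. This is only an illustration under stated assumptions, not the code in this pull request: `ShardFailureBatchSketch`, `FailTask`, the string key for a shard copy, and the `allocationSucceeds` flag are all hypothetical stand-ins for the real cluster state, routing table, and allocation service. It shows a batch being partitioned so that tasks for shards that no longer exist always succeed, whatever happens to the rest of the batch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class ShardFailureBatchSketch {

    /** Hypothetical stand-in for a shard failure request. */
    public static final class FailTask {
        final String shardId;
        final String allocationId;

        public FailTask(String shardId, String allocationId) {
            this.shardId = shardId;
            this.allocationId = allocationId;
        }

        /** Identifies one shard copy; the real code uses richer types. */
        String key() {
            return shardId + "/" + allocationId;
        }
    }

    /**
     * Processes a batch of shard failure tasks against the shard copies that
     * currently exist. The boolean flag simulates whether the (hypothetical)
     * allocation step succeeds for the tasks that need it.
     */
    public static Map<FailTask, Boolean> execute(
            List<FailTask> batch, Set<String> currentCopies, boolean allocationSucceeds) {
        // Split the batch: true = shard copy still exists, false = already gone.
        Map<Boolean, List<FailTask>> partition = batch.stream()
                .collect(Collectors.partitioningBy(task -> currentCopies.contains(task.key())));

        Map<FailTask, Boolean> outcomes = new LinkedHashMap<>();
        // Tasks for missing shards are trivially successful.
        for (FailTask missing : partition.get(false)) {
            outcomes.put(missing, true);
        }
        // Tasks for existing shards succeed or fail together with the allocation step.
        for (FailTask existing : partition.get(true)) {
            outcomes.put(existing, allocationSucceeds);
            if (allocationSucceeds) {
                currentCopies.remove(existing.key());
            }
        }
        return outcomes;
    }
}
```

With a batch containing one task for an existing shard copy and one for a missing copy, a failed allocation step marks only the first task as failed; the second is still reported as successful.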


jasontedor · 2016-01-20T16:31:20Z

@bleskes I've updated this pull request.

bleskes · 2016-01-21T13:57:36Z

core/src/main/java/org/elasticsearch/cluster/action/shard/ShardStateAction.java

can we call these - missingShardTasks or alreadyRemovedTasks?

sorry - got confused - maybe resolvedShards? (nonTrivial is so vague)

@bleskes I pushed 418a63f.

bleskes · 2016-01-21T14:01:35Z

Thanks @jasontedor. Left some comments/questions...


jasontedor · 2016-01-23T20:00:16Z

core/src/main/java/org/elasticsearch/cluster/ClusterStateTaskExecutor.java

This was a bug, but fortunately the method was unused until now, so it had no impact.

jasontedor · 2016-01-23T20:00:47Z

@bleskes I pushed two commits.

This commit adds tests of the execution logic when failing a batch of shards. Four tests are added:
- test that an empty task list produces no change in the cluster state
- test that duplicate shard failure requests are processed successfully
- test that shard failure requests for non-existent shards are processed successfully
- test that failing tasks do not prevent trivially successful tasks from succeeding

bleskes · 2016-01-26T13:51:04Z

core/src/main/java/org/elasticsearch/cluster/action/shard/ShardStateAction.java

can we move this up next to the stream partitioning so it will be clearer why we do this?

I pushed 8d8bd8b.


bleskes · 2016-01-26T16:08:24Z

...st/java/org/elasticsearch/cluster/action/shard/ShardFailedClusterStateTaskExecutorTests.java

can we also add shards with the same index and id as the existing ones but with different allocation ids?

That's a good suggestion.

@bleskes I pushed aa6a5c9.

bleskes · 2016-01-26T16:15:19Z

LGTM. Left some suggestions for the tests, none of it is blocking pushing this.


jasontedor · 2016-01-26T21:41:38Z

Thanks for another great review @bleskes.

jasontedor added >enhancement review v5.0.0-alpha1 labels Jan 19, 2016

jasontedor assigned bleskes Jan 19, 2016

jasontedor mentioned this pull request Jan 20, 2016

Wait on shard failures #14252

Closed

9 tasks

bleskes reviewed Jan 21, 2016

jasontedor added 2 commits January 21, 2016 09:47

Clarify tasks needing allocation service

418a63f

This commit renames a variable in ShardFailedClusterStateTaskExecutor#execute to make clear that it contains the tasks that need the allocation service to be processed.

Simplify shard failure task-splitting logic

fc014e6

This commit simplifies the task-splitting logic when processing a batch of shard failure tasks.

jasontedor reviewed Jan 23, 2016

bleskes reviewed Jan 26, 2016

Clarify intent in shard failure executor

8d8bd8b

This commit rearranges code to clarify the intent of the task partitioning in the shard failure cluster state task executor.

bleskes reviewed Jan 26, 2016

jasontedor mentioned this pull request Jan 26, 2016

Add convenience method to RoutingNodes for shard by allocation ID #16240

Closed

jasontedor added 2 commits January 26, 2016 16:18

Also test non-existent allocation IDs

aa6a5c9

This commit adds some shards with the same shard ID as existing shards, but with a non-existent allocation ID. This tests that these shards are correctly handled by the shard failure cluster state task executor.
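
To illustrate why the allocation ID matters here, the following is a minimal sketch, assuming a hypothetical `ShardCopy` type with an index name, a shard number, and an allocation ID. It is not the lookup used in Elasticsearch; it only shows that a request naming the same shard ID but an unknown allocation ID should be treated as targeting a shard copy that does not exist.

```java
import java.util.List;

public class AllocationIdMatchSketch {

    /** Hypothetical stand-in for one copy of a shard in the routing table. */
    public static final class ShardCopy {
        final String index;
        final int shardNumber;
        final String allocationId;

        public ShardCopy(String index, int shardNumber, String allocationId) {
            this.index = index;
            this.shardNumber = shardNumber;
            this.allocationId = allocationId;
        }
    }

    /**
     * Returns true only if some copy matches on index, shard number, AND
     * allocation ID. Matching on the shard ID alone would wrongly treat a
     * stale request as referring to a live copy.
     */
    public static boolean exists(List<ShardCopy> routingTable, ShardCopy requested) {
        return routingTable.stream().anyMatch(copy ->
                copy.index.equals(requested.index)
                        && copy.shardNumber == requested.shardNumber
                        && copy.allocationId.equals(requested.allocationId));
    }
}
```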

Successful failed shard requests remove shards

2979d68

This commit adds assertions that when a shard failure request is successful, the shard that was requested to be failed is in fact no longer in the cluster state.
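
The kind of post-condition described here could look roughly like the sketch below. The class name, the string keys for shard copies, and the outcome map are hypothetical and chosen for illustration; the actual test works against the real cluster state.

```java
import java.util.Map;
import java.util.Set;

public class FailedShardPostconditionSketch {

    /**
     * Throws an AssertionError if any request reported as successful still
     * has its shard copy present in the resulting state.
     *
     * @param outcomeByCopy hypothetical map from shard copy key to task outcome
     * @param copiesAfter   shard copy keys present after the batch was applied
     */
    public static void assertSuccessfulFailuresRemoved(
            Map<String, Boolean> outcomeByCopy, Set<String> copiesAfter) {
        for (Map.Entry<String, Boolean> entry : outcomeByCopy.entrySet()) {
            boolean succeeded = entry.getValue();
            if (succeeded && copiesAfter.contains(entry.getKey())) {
                throw new AssertionError(
                        "shard copy " + entry.getKey() + " was reported failed but is still present");
            }
        }
    }
}
```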

jasontedor closed this in 392814e Jan 26, 2016

jasontedor deleted the validate-shard-failure-requests branch January 26, 2016 21:38

jasontedor removed the review label Jan 26, 2016

clintongormley added the :Distributed/Distributed label and removed the :Cluster label Feb 13, 2018
