Use general cluster state batching mechanism for shard failures#15016
Use general cluster state batching mechanism for shard failures#15016jasontedor merged 4 commits intoelastic:masterfrom jasontedor:shard-failure-batch
Conversation
|
@bleskes I'll rebase this pull request on master when #14899 is reintegrated there. The salient commit for this review is thus ccc89c3666780d0bf7b2425f9e8c76dbe6316187 pending #14899 (all the changes for that commit are in ShardStateAction.java and some minor modifications in AllocationService.java). |
This commit modifies the handling of shard failure cluster state updates to use the general cluster state batching mechanism. An advantage of this approach is we now get correct per-listener notification on failures.
There was a problem hiding this comment.
why did we loose the batch application of shard failures?
There was a problem hiding this comment.
My thinking was to get task failures only for the shards for which we were unsuccessful in marking as failed.
There was a problem hiding this comment.
we share some logic when we do it in a batch, most notably the reroute. I wonder if we should improve the reporting from the applyFailedShards so we know what happened (we'll need it)
There was a problem hiding this comment.
I wonder if we should improve the reporting from the applyFailedShards so we know what happened (we'll need it)
That's what I'm thinking now, but it should be in a separate issue on the #14252 work I think?
There was a problem hiding this comment.
I pushed 413688b to apply the failures in a single batch. For now, we will not get task-specific failures but that will come in a follow up.
|
looking good. Left some comments. |
There was a problem hiding this comment.
we don't need this processed with this change..
|
LGTM. Left trivial comments. |
Use general cluster state batching mechanism for shard failures
|
One note - the comment from the shard started pr about error reporting holds for this one as well. Is the plan to do a followup? As with the shard started, we can just let the exception bubble up |
@bleskes Yeah. :) |
I opened #15428. |
This commit modifies the handling of shard failure cluster state updates
to use the general cluster state batching mechanism. An advantage of
this approach is we now get correct per-listener notification on
failures.
Relates #14899, relates #14725