Replica allocation consider no-op #42518
Closed
henningandersen wants to merge 9 commits into elastic:master
This is a first step away from sync-ids. When determining where to allocate a replica shard, we now use sequence numbers to check whether the replica and primary are identical.

If an index is no longer indexed into, issuing a regular flush is now enough to ensure that a no-op recovery is done.

This has the nice side effect that closed indices and frozen indices prefer existing shard copies with identical data over copies chosen by file-overlap comparison, increasing the chance that we end up doing a no-op recovery (closed indices support only no-op and file-based recovery).

Relates elastic#41400 and elastic#33888
Supersedes elastic#41784
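The preference described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `CandidateCopy`, `ReplicaCandidateChooser`, `identicalBySeqNo`, and `matchingBytes` are hypothetical names, and the sketch assumes the sequence-number comparison with the primary has already been made for each existing copy.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical summary of one existing shard copy on a data node. */
final class CandidateCopy {
    final String nodeId;
    final boolean identicalBySeqNo; // assumed result of the sequence-number comparison with the primary
    final long matchingBytes;       // assumed size of the files this copy shares with the primary

    CandidateCopy(String nodeId, boolean identicalBySeqNo, long matchingBytes) {
        this.nodeId = nodeId;
        this.identicalBySeqNo = identicalBySeqNo;
        this.matchingBytes = matchingBytes;
    }
}

final class ReplicaCandidateChooser {
    /** Picks the copy to reuse: identical copies first, then the largest file overlap. */
    static Optional<CandidateCopy> choose(List<CandidateCopy> candidates) {
        CandidateCopy best = null;
        for (CandidateCopy candidate : candidates) {
            if (best == null || isBetter(candidate, best)) {
                best = candidate;
            }
        }
        return Optional.ofNullable(best);
    }

    private static boolean isBetter(CandidateCopy a, CandidateCopy b) {
        if (a.identicalBySeqNo != b.identicalBySeqNo) {
            // A copy that would recover as a no-op beats any amount of file overlap.
            return a.identicalBySeqNo;
        }
        // Otherwise fall back to the file-overlap comparison.
        return a.matchingBytes > b.matchingBytes;
    }
}
```

In this sketch a copy with little file overlap but identical sequence numbers still wins, which is the behaviour the description aims for with closed and frozen indices.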
Collaborator
Pinging @elastic/es-distributed
Hopefully this makes the test succeed in CI too.
Now lock while cleaning up files to protect snapshotRecoveryMetadata from seeing half-copied data. snapshotRecoveryMetadata now handles peer recovery and existing store recovery specifically, returning an empty snapshot for other recovery types (local shards, restore snapshot).
dnhatn reviewed Jun 4, 2019
Member dnhatn left a comment
This looks great. Thanks @henningandersen. Would you mind splitting this PR into multiple smaller pieces?
/**
 * We test that a closed index makes no-op replica allocation only.
 */
public void testClosedIndexReplicaAllocation() throws Exception {
Member
I think this test passed with the current behaviour. Can we make a small PR for this test only?
 * Whenever we see a new data node, we clear the information we have on primary to ensure it is at least as recent as the start
 * of the new node. This reduces risk of making a decision on stale information from primary.
 */
private void ensureAsyncFetchStorePrimaryRecency(RoutingAllocation allocation) {
Member
Can you make a separate PR for this enhancement?
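As a rough sketch of the idea behind `ensureAsyncFetchStorePrimaryRecency`, the class below forgets cached primary information whenever an allocation round sees a data node it did not know before. This is an illustration under stated assumptions, not the method from the PR: `PrimaryStoreInfoTracker`, `onAllocationRound`, and the use of a plain `String` for the cached info are all hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical tracker that drops cached primary store info when a new data node appears. */
final class PrimaryStoreInfoTracker {
    private final Set<String> knownDataNodes = new HashSet<>();
    private String cachedPrimaryInfo; // null means the info must be fetched again

    void cache(String primaryInfo) {
        this.cachedPrimaryInfo = primaryInfo;
    }

    String cached() {
        return cachedPrimaryInfo;
    }

    /**
     * Call once per allocation round with the ids of the current data nodes.
     * Returns true if a previously unknown node was seen and the cache was cleared.
     */
    boolean onAllocationRound(Set<String> currentDataNodes) {
        boolean sawNewNode = !knownDataNodes.containsAll(currentDataNodes);
        // Remember exactly the current nodes, so a node that leaves and
        // later rejoins is treated as new again.
        knownDataNodes.clear();
        knownDataNodes.addAll(currentDataNodes);
        if (sawNewNode) {
            // The cached info may predate the new node, so force a fresh fetch.
            cachedPrimaryInfo = null;
        }
        return sawNewNode;
    }
}
```

Clearing on every new node costs an extra fetch from the primary, but it keeps the allocation decision from comparing a fresh copy on the new node against older primary data.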
return primaryStore.hasSeqNoInfo()
    && primaryStore.maxSeqNo() == candidateStore.maxSeqNo()
    && primaryStore.provideRecoverySeqNo() <= candidateStore.requireRecoverySeqNo()
    && candidateStore.requireRecoverySeqNo() == primaryStore.maxSeqNo() + 1;
Member
Not sure if we need the last condition?
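To make the question concrete, here is a self-contained sketch that mirrors the condition above on a hypothetical value class. `StoreSeqNoInfo` and `NoOpRecoveryCheck` are illustrative names, and the reading of the fields is an assumption: `requireRecoverySeqNo` is taken to be the first sequence number the copy still needs, and `provideRecoverySeqNo` the lowest sequence number the primary can replay from.

```java
/** Hypothetical per-copy summary; field names mirror the accessors in the snippet above. */
final class StoreSeqNoInfo {
    final boolean hasSeqNoInfo;
    final long maxSeqNo;
    final long provideRecoverySeqNo;
    final long requireRecoverySeqNo;

    StoreSeqNoInfo(boolean hasSeqNoInfo, long maxSeqNo, long provideRecoverySeqNo, long requireRecoverySeqNo) {
        this.hasSeqNoInfo = hasSeqNoInfo;
        this.maxSeqNo = maxSeqNo;
        this.provideRecoverySeqNo = provideRecoverySeqNo;
        this.requireRecoverySeqNo = requireRecoverySeqNo;
    }
}

final class NoOpRecoveryCheck {
    /** Same four conditions as the snippet under review. */
    static boolean isNoOpRecovery(StoreSeqNoInfo primary, StoreSeqNoInfo candidate) {
        return withoutLastCondition(primary, candidate)
            && candidate.requireRecoverySeqNo == primary.maxSeqNo + 1;
    }

    /** The first three conditions only, to show what the last one adds. */
    static boolean withoutLastCondition(StoreSeqNoInfo primary, StoreSeqNoInfo candidate) {
        return primary.hasSeqNoInfo
            && primary.maxSeqNo == candidate.maxSeqNo
            && primary.provideRecoverySeqNo <= candidate.requireRecoverySeqNo;
    }
}
```

Under that reading, a candidate with the same `maxSeqNo` as the primary but a lower `requireRecoverySeqNo` passes the first three conditions yet would still need operations replayed. The last condition is what excludes it in this sketch; whether such a state can occur in practice is the open question in the comment above.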
 * Finalize index recovery. Manipulate store files, clean up old files, generate new empty translog and do other
 * housekeeping for retention leases.
 */
public void finalizeIndexRecovery(CheckedRunnable<IOException> manipulateStore, long globalCheckpoint,
Member
Can you also make a separate PR for this enhancement?
Contributor Author
Thanks for reviewing @dnhatn, I have marked this WIP and will split it into multiple PRs (and then close this one).
This was referenced Sep 24, 2019
dnhatn added a commit that referenced this pull request Sep 28, 2019
Today, we don't clear the shard info of the primary shard when a new node joins, so we risk making replica allocation decisions based on stale information from the primary. The serious problem is that, because of the old info we have from the primary, we can cancel a current recovery that is more advanced than the copy on the new node. With this change, we ensure the shard info from the primary is not older than any node when allocating replicas.
Relates #46959
This work was done by Henning in #42518.
Co-authored-by: Henning Andersen <henning.andersen@elastic.co>
dnhatn added a commit that referenced this pull request Oct 2, 2019
Today, we don't clear the shard info of the primary shard when a new node joins, so we risk making replica allocation decisions based on stale information from the primary. The serious problem is that, because of the old info we have from the primary, we can cancel a current recovery that is more advanced than the copy on the new node. With this change, we ensure the shard info from the primary is not older than any node when allocating replicas.
Relates #46959
This work was done by Henning in #42518.
Co-authored-by: Henning Andersen <henning.andersen@elastic.co>