Use peer recovery retention leases for indices without soft-deletes #50351
Merged
dnhatn merged 11 commits into elastic:master on Dec 20, 2019
This reverts commit 781a4b825a4d429cb07aeb8b51975e6b8573a8a9.
Collaborator
Pinging @elastic/es-distributed (:Distributed/Recovery)
ywelsch
reviewed
Dec 19, 2019
Contributor
ywelsch
left a comment
This is looking good. I've left one question about strengthening the assertions in the tests.
    }
    assertNotNull(retentionLeases);
    for (Map<String, ?> retentionLease : retentionLeases) {
        if (((String) retentionLease.get("id")).startsWith("peer_recovery/")) {
Contributor
This does not require that a peer recovery retention lease always exists. Should we require finding such a lease, and for the right node?
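One way to tighten the check, as a minimal sketch: collect the node IDs from every lease whose id starts with `peer_recovery/`, then fail unless each expected node is covered. The class and method names are hypothetical and not part of this PR, and the sketch assumes the lease id is `peer_recovery/` followed by the node ID, which the snippet under review does not confirm.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not code from the PR.
final class LeaseAssertionSketch {
    // Prefix taken from the test snippet under review.
    static final String PREFIX = "peer_recovery/";

    // Returns the node IDs that have a lease with the peer recovery prefix.
    // Assumption: the text after the prefix is the node ID.
    static Set<String> nodesWithPeerRecoveryLease(List<? extends Map<String, ?>> retentionLeases) {
        Set<String> nodeIds = new HashSet<>();
        for (Map<String, ?> lease : retentionLeases) {
            Object id = lease.get("id");
            if (id instanceof String && ((String) id).startsWith(PREFIX)) {
                nodeIds.add(((String) id).substring(PREFIX.length()));
            }
        }
        return nodeIds;
    }

    // Fails if any expected node has no peer recovery retention lease.
    static void assertLeaseForEveryNode(List<? extends Map<String, ?>> retentionLeases,
                                        Set<String> expectedNodeIds) {
        Set<String> missing = new HashSet<>(expectedNodeIds);
        missing.removeAll(nodesWithPeerRecoveryLease(retentionLeases));
        if (!missing.isEmpty()) {
            throw new AssertionError("no peer recovery retention lease for nodes " + missing);
        }
    }
}
```

Unlike the loop in the diff, this fails when no matching lease is found at all, instead of silently skipping the checks inside the `if`.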
ywelsch
approved these changes
Dec 19, 2019
    }
    flush(index, true);
    ensurePeerRecoveryRetentionLeasesRenewedAndSynced(index);
    ensurePeerRecoveryRetentionLeasesRenewedAndSynced(index, false);
Contributor
Should we set alwaysExists to true if minimumNodeVersion() is on or after 7.6 (after backport) (here as well as in other places)?
Member
Author
Thanks Yannick!
dnhatn
added a commit
that referenced
this pull request
Dec 24, 2019
…50351)
Today, the replica allocator uses peer recovery retention leases (PRRLs) to select the best-matched copies when allocating replicas of indices with soft-deletes. We can employ this mechanism for indices without soft-deletes because the retaining sequence number of a PRRL is the persisted global checkpoint (plus one) of that copy. If the primary and replica have the same retaining sequence number, then we should be able to perform a noop recovery. The reason is that we must be retaining the translog up to the local checkpoint of the safe commit, which is at most the global checkpoint of either copy. The only limitation is that we might not cancel ongoing file-based recoveries with PRRLs for noop recoveries. We can't make the translog retention policy comply with PRRLs. We also have this problem with soft-deletes if a PRRL is about to expire.
Relates #45136
Relates #46959
dnhatn
added a commit
that referenced
this pull request
Dec 24, 2019
dnhatn
added a commit
that referenced
this pull request
Dec 26, 2019
dnhatn
added a commit
that referenced
this pull request
Dec 26, 2019
SivagurunathanV
pushed a commit
to SivagurunathanV/elasticsearch
that referenced
this pull request
Jan 23, 2020
SivagurunathanV
pushed a commit
to SivagurunathanV/elasticsearch
that referenced
this pull request
Jan 23, 2020
SivagurunathanV
pushed a commit
to SivagurunathanV/elasticsearch
that referenced
this pull request
Jan 23, 2020
testCancelRecoveryDuringPhase1 uses a mock of IndexShard, which can't create retention leases. We need to stub the createRetentionLease method.
Relates elastic#50351
Closes elastic#50424
SivagurunathanV
pushed a commit
to SivagurunathanV/elasticsearch
that referenced
this pull request
Jan 23, 2020
…astic#50486) We forgot to establish peer recovery retention leases for relocating primaries without soft-deletes. Relates elastic#50351
Today, the replica allocator uses peer recovery retention leases (PRRLs) to select the best-matched copies when allocating replicas of indices with soft-deletes. We can employ this mechanism for indices without soft-deletes because the retaining sequence number of a PRRL is the persisted global checkpoint (plus one) of that copy. If the primary and replica have the same retaining sequence number, then we should be able to perform a noop recovery. The reason is that we must be retaining the translog up to the local checkpoint of the safe commit, which is at most the global checkpoint of either copy. The only limitation is that we might not cancel ongoing file-based recoveries with PRRLs for noop recoveries. We can't make the translog retention policy comply with PRRLs. We also have this problem with soft-deletes if a PRRL is about to expire.
A nice side effect of this is that we could turn off translog retention once all shards have started. However, I prefer to keep translog retention decoupled from PRRLs.
Relates #45136
Relates #46959
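The noop-recovery condition described above can be sketched roughly as follows. This is an illustration of the reasoning only, not the allocator's actual code: `PrrlNoopSketch`, `retainingSeqNo`, `leaseId`, and `canNoopRecover` are hypothetical names, leases are modelled as a plain map from lease id to retaining sequence number, and the lease id format (`peer_recovery/` plus node ID) is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the reasoning in the description; not Elasticsearch code.
final class PrrlNoopSketch {
    // The retaining sequence number of a PRRL is the copy's
    // persisted global checkpoint plus one.
    static long retainingSeqNo(long persistedGlobalCheckpoint) {
        return persistedGlobalCheckpoint + 1;
    }

    // Assumed lease id format: "peer_recovery/" followed by the node ID.
    static String leaseId(String nodeId) {
        return "peer_recovery/" + nodeId;
    }

    // A replica copy is a candidate for noop recovery when both copies have a
    // lease and the two leases retain from the same sequence number.
    static boolean canNoopRecover(Map<String, Long> retainingSeqNoByLeaseId,
                                  String primaryNodeId, String replicaNodeId) {
        Long primary = retainingSeqNoByLeaseId.get(leaseId(primaryNodeId));
        Long replica = retainingSeqNoByLeaseId.get(leaseId(replicaNodeId));
        return primary != null && primary.equals(replica);
    }

    public static void main(String[] args) {
        Map<String, Long> leases = new HashMap<>();
        leases.put(leaseId("primaryNode"), retainingSeqNo(41));
        leases.put(leaseId("replicaNode"), retainingSeqNo(41));
        System.out.println(canNoopRecover(leases, "primaryNode", "replicaNode"));
    }
}
```

A copy with no lease, or with a lease that retains from a different sequence number, is simply not treated as a noop candidate in this sketch; how the real allocator ranks such copies is outside what the description states.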