Prevent invalid renewals of PRRLs#43898
Closed
DaveCTurner wants to merge 2 commits intoelastic:peer-recovery-retention-leasesfrom
Closed
Prevent invalid renewals of PRRLs#43898DaveCTurner wants to merge 2 commits intoelastic:peer-recovery-retention-leasesfrom
DaveCTurner wants to merge 2 commits intoelastic:peer-recovery-retention-leasesfrom
Conversation
Today when a PRRL is created during peer recovery it retains history from the sequence number provided by the recovering peer. However this sequence number may be greater than the primary's knowledge of that peer's persisted global checkpoint. Subsequent renewals of this lease will attempt to set the retained sequence number back to the primary's knowledge of that peer's persisted global checkpoint tripping an assertion that retention leases must only advance. This commit accounts for this. Caught by [a failure of `RecoveryWhileUnderLoadIT.testRecoverWhileRelocating`](https://scans.gradle.com/s/wxccfrtfgjj3g/console-log?task=:server:integTest#L14) Relates elastic#41536
Collaborator
|
Pinging @elastic/es-distributed |
10 tasks
Contributor
|
@DaveCTurner and I discussed this one after raising concerns that the change here could possibly hide future bugs. Instead, we want to know about cases where the assertion that retention leases must only advance trips, and then address those cases. For example, we noticed that we should probably remove leases for shards that are undergoing file-based recovery (as these recoveries are destructive in nature), which could be the cause for the linked test failure. |
Member
Author
|
Thanks both for your inputs. As Yannick says, we want more nuance here. I opened #43928 to address this particular case, so will close this. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Today when a PRRL is created during peer recovery it retains history from the
sequence number provided by the recovering peer. However this sequence number
may be greater than the primary's knowledge of that peer's persisted global
checkpoint. Subsequent renewals of this lease will attempt to set the retained
sequence number back to the primary's knowledge of that peer's persisted global
checkpoint tripping an assertion that retention leases must only advance. This
commit accounts for this.
Caught by a failure of
RecoveryWhileUnderLoadIT.testRecoverWhileRelocatingRelates #41536