storage: more aggressively replica GC learner replicas #41300
Merged
craig[bot] merged 1 commit into cockroachdb:master on Oct 4, 2019
Member
tbg
approved these changes
Oct 4, 2019
Member
tbg
left a comment
and thanks for looking into this!
Reviewed 4 of 4 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)
pkg/storage/replica_gc_queue.go, line 39 at r1 (raw file):
// A Replica is suspected to have been removed if either it is in the
// candidate Raft state (which is a typical sign of having been removed
// from the group) or it is a leaner which has been removed but never heard
learner
This PR fixes a test flake in TestSystemZoneConfig:
```
client_replica_test.go:1753: condition failed to evaluate within 45s: mismatch between
r1:/{Min-System/NodeLiveness} [(n1,s1):1, (n6,s6):2, (n4,s4):3, (n2,s2):7, (n7,s7):5, next=8, gen=14]
r1:/{Min-System/NodeLiveness} [(n1,s1):1, (n6,s6):2, (n4,s4):3, (n2,s2):4, (n7,s7):5, (n3,s3):6LEARNER, next=7, gen=9]
```
The above flake happens because we set the expectation in the map to a
descriptor which contains a learner which has since been removed.
We shouldn't use a range descriptor which contains learners as the expectation.
To avoid that, we return an error inside the SucceedsSoon retry if we come
across a descriptor which contains learners. This behavior unveiled another
issue: we are way too conservative with replica GC for learners. Most of the
time when learners are removed they hear about their own removal, but if they
don't, we won't consider the Replica for removal for 10 days! This commit
changes the replica GC queue behavior to treat learners like candidates.
Fixes cockroachdb#40980.
Release Justification: bug fixes and low-risk updates to new functionality.
Release note: None
7376180 to 2e9ce1c
ajwerner
commented
Oct 4, 2019
Contributor
Author
ajwerner
left a comment
TFTR!
bors r=tbg
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @tbg)
pkg/storage/replica_gc_queue.go, line 39 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
learner
Done.
Contributor
Build failed
Contributor
Author
Flaked on TestComposeGSS. bors r+
Contributor
Build failed
Contributor
Author
bors r+
craig bot pushed a commit that referenced this pull request on Oct 4, 2019
41300: storage: more aggressively replica GC learner replicas r=ajwerner a=ajwerner
This PR fixes a test flake in TestSystemZoneConfig:
```
client_replica_test.go:1753: condition failed to evaluate within 45s: mismatch between
r1:/{Min-System/NodeLiveness} [(n1,s1):1, (n6,s6):2, (n4,s4):3, (n2,s2):7, (n7,s7):5, next=8, gen=14]
r1:/{Min-System/NodeLiveness} [(n1,s1):1, (n6,s6):2, (n4,s4):3, (n2,s2):4, (n7,s7):5, (n3,s3):6LEARNER, next=7, gen=9]
```
The above flake happens because we set the expectation in the map to a
descriptor which contains a learner which has since been removed.
We shouldn't use a range descriptor which contains learners as the expectation.
To avoid that, we return an error inside the SucceedsSoon retry if we come
across a descriptor which contains learners. This behavior unveiled another
issue: we are way too conservative with replica GC for learners. Most of the
time when learners are removed they hear about their own removal, but if they
don't, we won't consider the Replica for removal for 10 days! This commit
changes the replica GC queue behavior to treat learners like candidates.
Fixes #40980.
Release Justification: bug fixes and low-risk updates to new functionality.
Release note: None
41308: storage: remove error from Replica.applyTimestampCache() r=ajwerner a=ajwerner
Stumbled upon a function with an error in its return signature
that never returns an error. Better to remove it and the stale
comment that goes with it. The removal of the code paths which
could have returned an error occurred in #33396.
Release justification: Low risk, does not change logic. Could also
hold off.
Release note: None
Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Contributor
Build succeeded