allocator: select a good enough store for decom/recovery#86267
allocator: select a good enough store for decom/recovery#86267craig[bot] merged 2 commits intocockroachdb:masterfrom
Conversation
2ccbb7f to
3c30e03
Compare
kvoli
left a comment
There was a problem hiding this comment.
Curious if you have run any benchmarks for the change - I think we should validate in in roachprod but otherwise LGTM.
Reviewed 3 of 3 files at r1, 6 of 7 files at r2, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15, @AlexTalks, @kvoli, and @lidorcarmel)
pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer_test.go line 230 at r2 (raw file):
allocRand := makeAllocatorRand(rand.NewSource(0)) for ii, tc := range testCases { t.Logf("ii=%d", ii)
What thist.LogF is for?
3c30e03 to
16d4269
Compare
lidorcarmel
left a comment
There was a problem hiding this comment.
yep, see the issue #86265
thanks!
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15, @AlexTalks, and @kvoli)
pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer_test.go line 230 at r2 (raw file):
Previously, kvoli (Austen) wrote…
What this
t.LogFis for?
oops, leftovers.
removed.
|
Nice results! |
There was a problem hiding this comment.
Had some comments about the setting name but this looks overall really good to me!
Reviewed 3 of 3 files at r1.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15, @AlexTalks, @kvoli, and @lidorcarmel)
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 98 at r3 (raw file):
var enableRecoverToGoodEnoughStores = settings.RegisterBoolSetting( settings.SystemOnly, "kv.allocator.recover_to_good_enough_stores.enabled",
nit: does this setting (and name) make sense? perhaps we should have a setting that is something like:
kv.allocator.recovery_store_selector where the options are BEST, GOOD, ANY or something? I worry about the "good enough" terminology a bit, but if this makes sense to all I won't block on this of course.
pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 897 at r3 (raw file):
// selectWorst randomly chooses one of the worst candidate stores from a sorted // (by score reversed) candidate list using the provided random generator. func (cl candidateList) selectWorst(randGen allocatorRand) *candidate {
FYI - just curious, what is this used for?
aayushshah15
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @AlexTalks, @kvoli, and @lidorcarmel)
pkg/kv/kvserver/allocator/allocatorimpl/allocator_test.go line 714 at r3 (raw file):
nil, nil, Dead, // Dead and Decommissioning should behave the same here
nit: can we randomly assign Dead or Decommissioning here to assert that they do indeed behave the same (just to prevent future regressions here).
In preparation for adding a new selection function for a good enough candidate, rename the existing "good" to "best". Release note: None
Until now, when decommissioning a node, or when recovering from a dead node, the allocator tries to pick one of the best possible stores as the target for the recovery. Because of that, we sometimes see multiple stores recover replicas to the same store, for example, when decommissioning a node and at the same time adding a new node. This PR changes the way we select a destination store by choosing a random store out of all the stores that are "good enough" for the replica. The risk diversity is still enforced, but we may recover a replica to a store that is considered "over full", for example. Note that during upreplication the allocator will still try to use one of the "best" stores as targets. Fixes: cockroachdb#86265 Release note: None Release justification: a relatively small change, and it can be reverted by setting kv.allocator.recovery_store_selector=best.
16d4269 to
3ce0fd5
Compare
lidorcarmel
left a comment
There was a problem hiding this comment.
np at all, thanks both! PTAL.
Reviewable status:
complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @AlexTalks and @kvoli)
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 98 at r3 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
nit: does this setting (and name) make sense? perhaps we should have a setting that is something like:
kv.allocator.recovery_store_selectorwhere the options areBEST, GOOD, ANYor something? I worry about the "good enough" terminology a bit, but if this makes sense to all I won't block on this of course.
Done.
I'm not worried about naming here because this setting should not be used (unless we really broke something), and I'm not sure we will have other strategies for selecting a destination store so I thought a bool is good enough.. but either way works (or, both names are good enough 😄 )
pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 897 at r3 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
FYI - just curious, what is this used for?
when removing a replica we want to remove the worst.
pkg/kv/kvserver/allocator/allocatorimpl/allocator_test.go line 714 at r3 (raw file):
Previously, aayushshah15 (Aayush Shah) wrote…
nit: can we randomly assign
DeadorDecommissioninghere to assert that they do indeed behave the same (just to prevent future regressions here).
Done (alternate the 2 instead of random).
aayushshah15
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @AlexTalks and @kvoli)
AlexTalks
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 2 of 0 LGTMs obtained (and 1 stale) (waiting on @AlexTalks and @kvoli)
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 98 at r3 (raw file):
Previously, lidorcarmel (Lidor Carmel) wrote…
Done.
I'm not worried about naming here because this setting should not be used (unless we really broke something), and I'm not sure we will have other strategies for selecting a destination store so I thought a bool is good enough.. but either way works (or, both names are good enough 😄 )
LGTM
pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 897 at r3 (raw file):
Previously, lidorcarmel (Lidor Carmel) wrote…
when removing a replica we want to remove the worst.
OK, got it!
lidorcarmel
left a comment
There was a problem hiding this comment.
Thanks all for the review!
bors r+
Reviewable status:
complete! 2 of 0 LGTMs obtained (and 1 stale) (waiting on @AlexTalks and @kvoli)
|
Build failed (retrying...): |
|
Build failed (retrying...): |
|
Build succeeded: |
Until now, when decommissioning a node, or when recovering from a dead
node, the allocator tries to pick one of the best possible stores as
the target for the recovery.
Because of that, we sometimes see multiple stores recover replicas
to the same store, for example, when decommissioning a node and
at the same time adding a new node.
This PR changes the way we select a destination store by choosing
a random store out of all the stores that are "good enough" for
the replica. The risk diversity is still enforced, but we may
recover a replica to a store that is considered "over full", for
example.
Note that during upreplication the allocator will still try to use
one of the "best" stores as targets.
Fixes: #86265
Release note: None
Release justification: a relatively small change, and it can be
reverted by setting kv.allocator.recovery_store_selector=best.