kvserver: get rid of bespoke relocation logic used by the mergeQueue#62631
Conversation
|
I'm going to add a randomized test (that just issues a bunch of random calls to |
Release note: None
3ea0d8d to
d0be2f0
Compare
nvb
left a comment
There was a problem hiding this comment.
Very satisfying!
Reviewed 1 of 1 files at r1, 4 of 4 files at r2, 4 of 4 files at r3.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @andreimatei)
pkg/kv/kvserver/allocator.go, line 193 at r2 (raw file):
// targetReplicaType.{Add,Remove}ChangeType methods wherever possible. func (t targetReplicaType) AddChangeType() roachpb.ReplicaChangeType { switch typ := t; typ {
Is there a reason for the value assignment in these switch statements? Would switch t { accomplish the same thing with less code and less room for confusion?
pkg/kv/kvserver/allocator.go, line 205 at r2 (raw file):
// RemoveChangeType returns the roachpb.ReplicaChangeType corresponding to the // given targetReplicaType. func (t targetReplicaType) RemoveChangeType() roachpb.ReplicaChangeType {
Can we use this in replicateQueue.removeDead?
pkg/kv/kvserver/client_merge_test.go, line 21 at r2 (raw file):
"reflect" "regexp" "sort"
I'm surprised to see an import change with no other code changes. Does this commit compile?
pkg/kv/kvserver/replica_command.go, line 2469 at r2 (raw file):
} // GetTargetsToCollocateRHSForMerge decides the configuration of RHS replicas
💯
pkg/kv/kvserver/replica_command.go, line 2875 at r2 (raw file):
StoreID: targetStore.StoreID, } if args.targetType == voterTarget {
Was there a reason to separate this block out from the next? Do you find it easier to read?
If we merged them, we could avoid scanning through the replicas twice, and could instead of a single iteration through existingNonVoters.
pkg/kv/kvserver/replica_command.go, line 3001 at r2 (raw file):
// (Note that !shouldRemove implies that we're trying to remove the last // replica left in the descriptor which is illegal). ops = append(ops, roachpb.MakeReplicationChanges(args.targetType.RemoveChangeType(), removalTarget)...)
Why the append here?
pkg/kv/kvserver/replicate_queue.go, line 1130 at r2 (raw file):
return false, err } if performingSwap {
nit: consider pulling this below rq.metrics.RebalanceReplicaCount.Inc(1) so that the non-conditional metric comes first and the conditional metrics come after.
pkg/kv/kvserver/replicate_queue.go, line 1221 at r2 (raw file):
} log.VEventf(ctx, 1, "can't swap replica due to lease; falling back to add") return chgs, performingSwap, err
nit: we generally prefer returning a constant instead of a named return variable when the value is known. It makes things easier to read. So this would be return chgs, false, err.
d0be2f0 to
9828460
Compare
aayushshah15
left a comment
There was a problem hiding this comment.
I cleaned some things up and added a bunch more testing -- all in the second commit. Should be RFAL.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @andreimatei, and @nvanbenschoten)
pkg/kv/kvserver/allocator.go, line 193 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Is there a reason for the value assignment in these switch statements? Would
switch t {accomplish the same thing with less code and less room for confusion?
No reason. Fixed now.
pkg/kv/kvserver/allocator.go, line 205 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Can we use this in
replicateQueue.removeDead?
Done there and in removeDecommissioning.
pkg/kv/kvserver/client_merge_test.go, line 21 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I'm surprised to see an import change with no other code changes. Does this commit compile?
Messed up while re-organizing my changes. Fixed now.
pkg/kv/kvserver/replica_command.go, line 2875 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Was there a reason to separate this block out from the next? Do you find it easier to read?
If we merged them, we could avoid scanning through the replicas twice, and could instead of a single iteration through
existingNonVoters.
That wasn't intentional, fixed.
pkg/kv/kvserver/replica_command.go, line 3001 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Why the
appendhere?
Fixed.
pkg/kv/kvserver/replicate_queue.go, line 1130 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: consider pulling this below
rq.metrics.RebalanceReplicaCount.Inc(1)so that the non-conditional metric comes first and the conditional metrics come after.
Done.
pkg/kv/kvserver/replicate_queue.go, line 1221 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: we generally prefer returning a constant instead of a named return variable when the value is known. It makes things easier to read. So this would be
return chgs, false, err.
Done.
3b59a24 to
b4aeeda
Compare
|
The new randomized test I added is flaking because it also asserts that |
94f2dba to
74b6ec9
Compare
nvb
left a comment
There was a problem hiding this comment.
Reviewed 6 of 7 files at r4, 7 of 7 files at r5.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15 and @andreimatei)
pkg/kv/kvserver/replica_command.go, line 2875 at r2 (raw file):
Previously, aayushshah15 (Aayush Shah) wrote…
That wasn't intentional, fixed.
We're still searching for the non-voter twice, once with GetReplicaDescriptor and once with an iteration over existingNonVoters. Should we avoid that?
pkg/kv/kvserver/replica_command.go, line 2957 at r5 (raw file):
} } else if shouldAdd { ops = roachpb.MakeReplicationChanges(args.targetType.AddChangeType(), additionTarget)
It's probably better to avoid setting ops if we're just going to immediately overwrite it on the canPromoteNonVoter/canDemoteVoter paths. This is currently a bit misleading and looks error-prone.
74b6ec9 to
db73198
Compare
aayushshah15
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei and @nvanbenschoten)
pkg/kv/kvserver/replica_command.go, line 2875 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
We're still searching for the non-voter twice, once with
GetReplicaDescriptorand once with an iteration overexistingNonVoters. Should we avoid that?
Ugh, sorry for being careless!
Fixed for real now.
pkg/kv/kvserver/replica_command.go, line 2957 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
It's probably better to avoid setting
opsif we're just going to immediately overwrite it on thecanPromoteNonVoter/canDemoteVoterpaths. This is currently a bit misleading and looks error-prone.
Done (this is what you meant, right?).
nvb
left a comment
There was a problem hiding this comment.
Reviewed 5 of 5 files at r7.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15 and @andreimatei)
pkg/kv/kvserver/replica_command.go, line 2957 at r5 (raw file):
Previously, aayushshah15 (Aayush Shah) wrote…
Done (this is what you meant, right?).
Exactly.
pkg/kv/kvserver/replica_command.go, line 2847 at r7 (raw file):
} } if promotionTargetIdx >= 0 {
Do we need promotionTargetIdx? Can we just move this condition into the loop?
voters to non-voters This commit updates `AdminRelocateRange` to use explicit atomic swaps of voting replicas with non-voting replicas, that cockroachdb#58627 initially added support for. The patch does so by generalizing behavior that's already exercised by the `replicateQueue` when it decides to rebalance replicas. See cockroachdb#61239. This allows us, in the next commit, to remove bespoke relocation logic that's used by the `mergeQueue` to align replica sets for the sake of a range merge. Release note: None
This commit removes the relocation logic used by the `mergeQueue` thus far to align replica sets (added in cockroachdb#56197). This logic previously existed in order to allow us to align the replica sets of a pair of ranges (which is required for the range merge to proceed), while avoiding redundant data movement. Before cockroachdb#58627 and the previous commit in this PR, `AdminRelocateRange` couldn't be directly used by the mergeQueue under various configurations of the LHS and RHS ranges. Furthermore, even when it could be used, it would involve redundant data movement. This all required us to compute relocation targets for the RHS of a merge separately, above the call to `AdminRelocateRange`, for the range merge to proceed. All these limitations have been resolved by the previous commit which teaches `AdminRelocateRange` to promote non-voters and demote voters when needed, and the aforementioned bespoke relocation logic is no longer needed. Release note: None
db73198 to
d7a7a49
Compare
aayushshah15
left a comment
There was a problem hiding this comment.
Thanks for being patient with me for these last few comments, and thanks for the review.
bors r+
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei and @nvanbenschoten)
pkg/kv/kvserver/replica_command.go, line 2847 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Do we need
promotionTargetIdx? Can we just move this condition into the loop?
Moved it into the loop.
nvb
left a comment
There was a problem hiding this comment.
Reviewed 5 of 5 files at r9.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei)
|
Build succeeded: |
This PR contains two main commits:
kvserver: update
AdminRelocateRangeto leverage explicit swaps ofvoters to non-voters
This commit updates
AdminRelocateRangeto use explicit atomic swaps ofvoting replicas with non-voting replicas, that #58627 initially added
support for. The patch does so by generalizing behavior that's already
exercised by the
replicateQueuewhen it decides to rebalance replicas.See #61239.
This allows us, in the next commit, to remove bespoke relocation logic
that's used by the
mergeQueueto align replica sets for the sake of arange merge.
Release note: None
kvserver: get rid of bespoke relocation logic used by the mergeQueue
This commit removes the relocation logic used by the
mergeQueuethusfar to align replica sets (added in #56197). This logic previously
existed in order to allow us to align the replica sets of a pair of
ranges (which is required for the range merge to proceed), while
avoiding redundant data movement.
Before #58627 and the previous commit in this PR,
AdminRelocateRangecouldn't be directly used by the mergeQueue under various configurations
of the LHS and RHS ranges. Furthermore, even when it could be used, it
would involve redundant data movement. This all required us to compute
relocation targets for the RHS of a merge separately, above the call to
AdminRelocateRange, for the range merge to proceed.All these limitations have been resolved by the previous commit which
teaches
AdminRelocateRangeto promote non-voters and demote voterswhen needed, and the aforementioned bespoke relocation logic is no
longer needed.
Resolves #62370
Release note: None