kvserver: prevent the StoreRebalancer from downreplicating a range#64170
Conversation
|
I'm trying to figure out if I can write a sort of randomized test for the store rebalancer, but so far I've had a hard time thinking of anything that wouldn't be super invasive. If I cannot cleanly add a randomized test, I will add a more targeted unit test. The PR should be ready for a look aside from this. |
nvb
left a comment
There was a problem hiding this comment.
Reviewed 2 of 2 files at r1, 1 of 1 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15)
pkg/kv/kvserver/replica_command.go, line 3115 at r1 (raw file):
for i := range targets { for j := i + 1; j < len(targets); j++ { if targets[i].StoreID == targets[j].StoreID && targets[i].NodeID == targets[j].NodeID {
Isn't this just targets[i] == targets[j]?
pkg/kv/kvserver/store_rebalancer.go, line 756 at r2 (raw file):
targetType targetReplicaType, ) []roachpb.ReplicaDescriptor { var finalTargetsForType *[]roachpb.ReplicaDescriptor
Give this a comment that mentions why we need the indirection. It wasn't immediately clear to me that the idea was for the call to allocateTargetFromList in the loop down below to observe the append call below.
1aaccf7 to
abab419
Compare
aayushshah15
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/replica_command.go, line 3115 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Isn't this just
targets[i] == targets[j]?
Fixed, didn't realize I was just comparing ReplicationTargets and not ReplicaDescriptors
pkg/kv/kvserver/store_rebalancer.go, line 756 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Give this a comment that mentions why we need the indirection. It wasn't immediately clear to me that the idea was for the call to
allocateTargetFromListin the loop down below to observe theappendcall below.
Done, lmk if you think it is lacking.
nvb
left a comment
There was a problem hiding this comment.
Reviewed 2 of 2 files at r3.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/store_rebalancer.go, line 756 at r2 (raw file):
we want the result of subsequent calls
Isn't this "we want subsequent calls"?
Also, let's explicitly mention the append when we say "the result of previous calls", because allocateTargetFromList isn't what mutates the slice, it's the call to append below it.
abab419 to
b8a0897
Compare
This commit adds checks inside of `AdminRelocateRange` to fail if the lists of relocation targets passed in by the caller contain duplicates. This is supposed to act as a safeguard against catastrophic outcomes like a range getting spuriously downreplicated due to malformed input. Release note: None
Prior to this commit, we had a bug inside one of the methods used by the `StoreRebalancer` where we had two variables referring to a slice that was subsequently being appended to. The code was making the implicit assumption that both of these slices would point to the same underlying array, which is not true since any of the additions mentioned above could result in the underlying array for one of the slices being resized. This commit fixes this bug. Resolves cockroachdb#64064 Release note (bug fix): A bug in previous 21.1 betas allowed the store rebalancer to spuriously downreplicate a range during normal operation. This bug is now fixed.
b8a0897 to
b00f923
Compare
aayushshah15
left a comment
There was a problem hiding this comment.
TFTR
bors r+
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
pkg/kv/kvserver/store_rebalancer.go, line 756 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
we want the result of subsequent callsIsn't this "we want subsequent calls"?
Also, let's explicitly mention the append when we say "the result of previous calls", because
allocateTargetFromListisn't what mutates the slice, it's the call toappendbelow it.
argh thanks for catching that. fixed and done.
|
Build succeeded: |
This PR contains 2 commits:
kvserver: add safeguard against spurious calls to AdminRelocateRange
This commit adds checks inside of
AdminRelocateRangeto fail if thelists of relocation targets passed in by the caller contain duplicates.
This is supposed to act as a safeguard against catastrophic outcomes
like a range getting spuriously downreplicated due to malformed input.
Release note: None
kvserver: prevent the StoreRebalancer from downreplicating a range
Prior to this commit, we had a bug inside one of the methods used by the
StoreRebalancerwhere we had two variables referring to a slice thatwas subsequently being appended to.
The code was making the implicit assumption that both of these slices
would point to the same underlying array, which is not true since any of
the additions mentioned above could result in the underlying array for
one of the slices being resized.
This commit fixes this bug.
Resolves #64064
Release note (bug fix): A bug in previous 21.1 betas allowed the store
rebalancer to spuriously downreplicate a range during normal operation.
This bug is now fixed.