Skip to content

kvserver: prevent the StoreRebalancer from downreplicating a range#64170

Merged
craig[bot] merged 2 commits intocockroachdb:masterfrom
aayushshah15:storeRebalancerBugFix64064
Apr 27, 2021
Merged

kvserver: prevent the StoreRebalancer from downreplicating a range#64170
craig[bot] merged 2 commits intocockroachdb:masterfrom
aayushshah15:storeRebalancerBugFix64064

Conversation

@aayushshah15
Copy link
Copy Markdown
Contributor

This PR contains 2 commits:

kvserver: add safeguard against spurious calls to AdminRelocateRange

This commit adds checks inside of AdminRelocateRange to fail if the
lists of relocation targets passed in by the caller contain duplicates.
This is supposed to act as a safeguard against catastrophic outcomes
like a range getting spuriously downreplicated due to malformed input.

Release note: None

kvserver: prevent the StoreRebalancer from downreplicating a range

Prior to this commit, we had a bug inside one of the methods used by the
StoreRebalancer where we had two variables referring to a slice that
was subsequently being appended to.

The code was making the implicit assumption that both of these slices
would point to the same underlying array, which is not true since any of
the additions mentioned above could result in the underlying array for
one of the slices being resized.

This commit fixes this bug.

Resolves #64064

Release note (bug fix): A bug in previous 21.1 betas allowed the store
rebalancer to spuriously downreplicate a range during normal operation.
This bug is now fixed.

@aayushshah15 aayushshah15 requested a review from nvb April 24, 2021 00:29
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@aayushshah15
Copy link
Copy Markdown
Contributor Author

I'm trying to figure out if I can write a sort of randomized test for the store rebalancer, but so far I've had a hard time thinking of anything that wouldn't be super invasive. If I cannot cleanly add a randomized test, I will add a more targeted unit test.

The PR should be ready for a look aside from this.

Copy link
Copy Markdown
Contributor

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r1, 1 of 1 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15)


pkg/kv/kvserver/replica_command.go, line 3115 at r1 (raw file):

	for i := range targets {
		for j := i + 1; j < len(targets); j++ {
			if targets[i].StoreID == targets[j].StoreID && targets[i].NodeID == targets[j].NodeID {

Isn't this just targets[i] == targets[j]?


pkg/kv/kvserver/store_rebalancer.go, line 756 at r2 (raw file):

	targetType targetReplicaType,
) []roachpb.ReplicaDescriptor {
	var finalTargetsForType *[]roachpb.ReplicaDescriptor

Give this a comment that mentions why we need the indirection. It wasn't immediately clear to me that the idea was for the call to allocateTargetFromList in the loop down below to observe the append call below.

@aayushshah15 aayushshah15 force-pushed the storeRebalancerBugFix64064 branch from 1aaccf7 to abab419 Compare April 26, 2021 22:03
Copy link
Copy Markdown
Contributor Author

@aayushshah15 aayushshah15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/replica_command.go, line 3115 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Isn't this just targets[i] == targets[j]?

Fixed, didn't realize I was just comparing ReplicationTargets and not ReplicaDescriptors


pkg/kv/kvserver/store_rebalancer.go, line 756 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Give this a comment that mentions why we need the indirection. It wasn't immediately clear to me that the idea was for the call to allocateTargetFromList in the loop down below to observe the append call below.

Done, lmk if you think it is lacking.

Copy link
Copy Markdown
Contributor

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/store_rebalancer.go, line 756 at r2 (raw file):

we want the result of subsequent calls

Isn't this "we want subsequent calls"?

Also, let's explicitly mention the append when we say "the result of previous calls", because allocateTargetFromList isn't what mutates the slice, it's the call to append below it.

@aayushshah15 aayushshah15 force-pushed the storeRebalancerBugFix64064 branch from abab419 to b8a0897 Compare April 27, 2021 00:59
This commit adds checks inside of `AdminRelocateRange` to fail if the
lists of relocation targets passed in by the caller contain duplicates.
This is supposed to act as a safeguard against catastrophic outcomes
like a range getting spuriously downreplicated due to malformed input.

Release note: None
Prior to this commit, we had a bug inside one of the methods used by the
`StoreRebalancer` where we had two variables referring to a slice that
was subsequently being appended to.

The code was making the implicit assumption that both of these slices
would point to the same underlying array, which is not true since any of
the additions mentioned above could result in the underlying array for
one of the slices being resized.

This commit fixes this bug.

Resolves cockroachdb#64064

Release note (bug fix): A bug in previous 21.1 betas allowed the store
rebalancer to spuriously downreplicate a range during normal operation.
This bug is now fixed.
@aayushshah15 aayushshah15 force-pushed the storeRebalancerBugFix64064 branch from b8a0897 to b00f923 Compare April 27, 2021 22:54
Copy link
Copy Markdown
Contributor Author

@aayushshah15 aayushshah15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)


pkg/kv/kvserver/store_rebalancer.go, line 756 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
we want the result of subsequent calls

Isn't this "we want subsequent calls"?

Also, let's explicitly mention the append when we say "the result of previous calls", because allocateTargetFromList isn't what mutates the slice, it's the call to append below it.

argh thanks for catching that. fixed and done.

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Apr 27, 2021

Build succeeded:

@craig craig bot merged commit 3335e39 into cockroachdb:master Apr 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

kvserver: unxpected replication change from 3 to 2 voters

3 participants