storage: Load-based rebalancing should be more willing to make certain incremental improvements #31135

@a-robinson

Description

This is already called out in a TODO in the code:

// TODO(a-robinson): Support more incremental improvements -- move what we
// can if it makes things better even if it isn't great. For example,
// moving one of the other existing replicas that's on a store with less
// qps than the max threshold but above the mean would help in certain
// locality configurations.

But yesterday I saw a cluster get into a real situation where a greater willingness to move a replica, even though doing so would make another store overfull, would have been the ideal thing to do / what a human operator would have done. Currently the code is cautious about moving replicas with so many QPS that moving them would over-fill whichever store they get moved to. That's a good decision when there's only one such replica on a store, but when there are two it's less good:

I181009 02:36:43.678116 996 storage/store_rebalancer.go:686  [n27,s28,store-rebalancer] r216919's 2456.46 qps would push s13 over the mean (168.26) with 2531.33 qps afterwards
I181009 02:36:43.678126 996 storage/store_rebalancer.go:597  [n27,s28,store-rebalancer] couldn't find enough rebalance targets for r216919 (2/3)
I181009 02:36:43.678723 996 storage/store_rebalancer.go:680  [n27,s28,store-rebalancer] r216912's 2338.62 qps would push s49 over the max threshold (268.26) with 2353.85 qps afterwards
I181009 02:36:43.678730 996 storage/store_rebalancer.go:597  [n27,s28,store-rebalancer] couldn't find enough rebalance targets for r216912 (2/3)
I181009 02:36:43.678738 996 storage/store_rebalancer.go:307  [n27,s28,store-rebalancer] ran out of replicas worth transferring and qps (4953.34) is still above desired threshold (268.26); will check again soon

In these logs, s28 has way more qps than the mean, mostly accounted for by two hot replicas (IIRC they were hot due to a bunch of retried errors, but that's a separate concern). However, moving either of them fails the check in shouldNotMoveTo that attempts to avoid ping-ponging a lone hot lease around between stores on an otherwise lightly loaded cluster:

newCandidateQPS := storeDesc.Capacity.QueriesPerSecond + replWithStats.qps
if storeDesc.Capacity.QueriesPerSecond < minQPS {
	if newCandidateQPS > maxQPS {
		log.VEventf(ctx, 3,
			"r%d's %.2f qps would push s%d over the max threshold (%.2f) with %.2f qps afterwards",
			replWithStats.repl.RangeID, replWithStats.qps, candidateStore, maxQPS, newCandidateQPS)
		return true
	}
} else if newCandidateQPS > meanQPS {
	log.VEventf(ctx, 3,
		"r%d's %.2f qps would push s%d over the mean (%.2f) with %.2f qps afterwards",
		replWithStats.repl.RangeID, replWithStats.qps, candidateStore, meanQPS, newCandidateQPS)
	return true
}
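Plugging the numbers from the logs above into this check shows why both hot replicas get rejected. The following is a standalone sketch, not the actual shouldNotMoveTo signature; the minQPS value of 68.26 is an assumption inferred from the symmetric ±100 qps band implied by the mean (168.26) and max threshold (268.26) in the logs, and the destination stores' current QPS is back-computed from the "afterwards" values:

```go
package main

import "fmt"

// wouldOverfill mirrors the spirit of the check above: reject a move when the
// destination would end up over the mean, or over the max threshold if the
// destination is currently underfull (below minQPS).
func wouldOverfill(destQPS, replQPS, minQPS, meanQPS, maxQPS float64) bool {
	newCandidateQPS := destQPS + replQPS
	if destQPS < minQPS {
		return newCandidateQPS > maxQPS
	}
	return newCandidateQPS > meanQPS
}

func main() {
	// r216919 (2456.46 qps) -> s13, currently at 2531.33-2456.46 = 74.87 qps:
	// 74.87 >= minQPS, and 2531.33 > mean 168.26, so the move is rejected.
	fmt.Println(wouldOverfill(74.87, 2456.46, 68.26, 168.26, 268.26))
	// r216912 (2338.62 qps) -> s49, currently at 2353.85-2338.62 = 15.23 qps:
	// 15.23 < minQPS, and 2353.85 > max threshold 268.26, so also rejected.
	fmt.Println(wouldOverfill(15.23, 2338.62, 68.26, 168.26, 268.26))
}
```

Either branch fires, so s28 is left holding both hot replicas.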

In such cases it would be reasonable to make the move if, for example, the source store would still have more QPS on it than the destination store afterwards.
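One way that relaxation could be phrased, as a hypothetical helper rather than existing code: allow the move whenever the source would still be hotter than the destination afterwards, so the overall imbalance strictly decreases even if the destination ends up overfull. (The destination QPS values below are back-computed from the "afterwards" numbers in the logs.)

```go
package main

import "fmt"

// moveStillImproves is a hypothetical relaxation of the shouldNotMoveTo check:
// even if the destination ends up overfull, the move is a net improvement as
// long as the source still carries more qps than the destination afterwards.
func moveStillImproves(sourceQPS, destQPS, replQPS float64) bool {
	return sourceQPS-replQPS > destQPS+replQPS
}

func main() {
	// s28 is at 4953.34 qps. Moving r216919 (2456.46 qps) to s13 (74.87 qps)
	// would leave the source at 2496.88 and the destination at 2531.33, so
	// even the relaxed check says no.
	fmt.Println(moveStillImproves(4953.34, 74.87, 2456.46))
	// Moving r216912 (2338.62 qps) to s49 (15.23 qps) instead would leave the
	// source at 2614.72 and the destination at 2353.85 -- a strict improvement,
	// so the relaxed check allows it where the current code refuses.
	fmt.Println(moveStillImproves(4953.34, 15.23, 2338.62))
}
```

Under this rule, s28 would get to shed one of its two hot replicas instead of staying stuck at ~4953 qps.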

It's worth noting that it's better for 2.1 to be overly cautious than for it to be overly aggressive, so I'm not planning on pushing for any such changes to be backported.

gz#8871

Labels

A-kv-distribution (Relating to rebalancing and leasing.)
C-performance (Perf of queries or internals. Solution not expected to change functional behavior.)
