-
Notifications
You must be signed in to change notification settings - Fork 4.1k
kv: BatchRequest count used as QPS metric for rebalancing, not Request count #50620
Description
In a recent customer report, we found that load-based rebalancing was failing to properly balance leaseholders, resulting in imbalanced load. After some deep investigation, @yuzefovich and @tbg found that the problem could be traced back to lookup joins. Specifically, one of the nodes was being tasked with running all of the lookup joins in the workload. The question is, why wasn't this picked up by either the hotranges report or load-based balancing?
After an in-person discussion, we believe this is because both of these sources rely on the leaseholderStats on each replica. These statistics only track the number of BatchRequests evaluated on a leaseholder, and not the number of individual requests:
cockroach/pkg/kv/kvserver/replica_send.go
Lines 54 to 56 in 96db1b3
| if r.leaseholderStats != nil && ba.Header.GatewayNodeID != 0 { | |
| r.leaseholderStats.record(ba.Header.GatewayNodeID) | |
| } |
The hypothesis is that if this line was changed to r.leaseholderStats.recordCount(len(ba.Requests), ba.Header.GatewayNodeID), load-based lease rebalancing would have avoided this issue. This is because (in v19.2) lookup joins issue batches of 100 scans. So these batches would only be counted once towards a range's qps for load balancing purposes, but would place 100x the load on the leaseholder evaluating them.
We should test that hypothesis.
We should also determine whether we actually want to make a change here. It's problematic that we don't consider the size of batch requests in these heuristics. However, changing that now could lead to surprising effects in other areas. For instance, it would also weigh follow-the-workload rebalancing in favor of gateways that issue multi-request batches. Do we want that?
Jira issue: CRDB-4109