Skip to content

mmaprototype: plumb node CPU in MakeStoreLoadMsg#164760

Merged
trunk-io[bot] merged 1 commit intocockroachdb:masterfrom
wenyihu6:fixnodelevel
Mar 5, 2026
Merged

mmaprototype: plumb node CPU in MakeStoreLoadMsg#164760
trunk-io[bot] merged 1 commit intocockroachdb:masterfrom
wenyihu6:fixnodelevel

Conversation

@wenyihu6
Copy link
Copy Markdown
Contributor

@wenyihu6 wenyihu6 commented Mar 3, 2026

The capped multiplier model (computeCPUCapacityWithCap) subtracts
background CPU when deriving per-store capacity. This intentionally
breaks the invariant maintained by the naive model:

sum(store loads) / sum(store capacities) = NodeCPURateUsage / NodeCPURateCapacity

When background CPU exceeds the cap, node-level overload detection can
no longer recover physical utilization by summing store-level values.

This change replaces the derived NodeLoad fields (ReportedCPU,
CapacityCPU — incremental sums of per-store values) which relied
on this invariance with explicit physical node-level CPU values
(NodeCPULoad, NodeCPUCapacity) carried through StoreLoadMsg
from NodeCapacity fields.

Note that each store generates its descriptor independently, so
NodeCPULoad values may differ slightly across stores on the same
node due to sampling time differences. This is acceptable since
they should be roughly the same.

Epic: CRDB-55052
Release note: none

@wenyihu6 wenyihu6 requested review from a team as code owners March 3, 2026 20:05
@wenyihu6 wenyihu6 requested review from sumeerbhola and tbg March 3, 2026 20:05
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@trunk-io
Copy link
Copy Markdown
Contributor

trunk-io bot commented Mar 3, 2026

😎 Merged successfully - details.

Copy link
Copy Markdown
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: I'm a little sad about the extra noise in the datadriven files but great otherwise. Why is there no regression test? I figured since the old code was wrong, we should've been able to show an issue that then would've resolved with the fix? Did I miss it?

@tbg reviewed 25 files and all commit messages, and made 3 comments.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on sumeerbhola and wenyihu6).


pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go line 1395 at r1 (raw file):

	// so we incrementally adjust for the baseline shift and strip only this
	// store's pending delta.
	storePendingDelta := ss.adjusted.load[CPURate] - ss.reportedLoad[CPURate]

ok just re-expressing to make sure I'm getting it

the previous adjustedCPU is basically ns.NodeCPULoad + all_store_pending_deltas and the new one should be storeMsg.NodeCPULoad + all_but_our_store_pending_deltas. And we have

newAdjustedCPU :=
= oldAdjustedCPU +  storeMsg.NodeCPULoad - oldNodeCPULoad - storePendingDelta
= (oldAdjustedCPU - oldNodeCPULoad) - storePendingDelta + storeMsg.NodeCPULoad
= allPendingDelta - storePendingDelta + storeMsg.NodeCPULoad
= storeMsg.NodeCPULoad + all_but_our_store_pending_delta

and we add our pending delta back in below. Checks out.


pkg/kv/kvserver/allocator/mmaprototype/testdata/cluster_state/rebalance_stores_cpu_replica_refusing_target.txt line 28 at r1 (raw file):

store-load-msg
  store-id=1 node-id=1 load=[1000,0,0] capacity=[1000,1000,1000] secondary-load=0 load-time=0s node-cpu-load=1000 node-cpu-capacity=1000

These additions seem noisy and redundant except when we're actively introducing background load. Can we automatically compute them by default so that they need to be specified only when targeted directly?

The capped multiplier model (`computeCPUCapacityWithCap`) subtracts
background CPU when deriving per-store capacity. This intentionally
breaks the invariant maintained by the naive model:

  sum(store loads) / sum(store capacities) = NodeCPURateUsage / NodeCPURateCapacity

When background CPU exceeds the cap, node-level overload detection can
no longer recover physical utilization by summing store-level values.

This change replaces the derived `NodeLoad` fields (`ReportedCPU`,
`CapacityCPU` — incremental sums of per-store values) which relied
on this invariance with explicit physical node-level CPU values
(`NodeCPULoad`, `NodeCPUCapacity`) carried through `StoreLoadMsg`
from `NodeCapacity` fields.

Note that each store generates its descriptor independently, so
`NodeCPULoad` values may differ slightly across stores on the same
node due to sampling time differences. This is acceptable since
they should be roughly the same.
Copy link
Copy Markdown
Contributor Author

@wenyihu6 wenyihu6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wenyihu6 made 2 comments.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on sumeerbhola and tbg).


pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go line 1395 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

ok just re-expressing to make sure I'm getting it

the previous adjustedCPU is basically ns.NodeCPULoad + all_store_pending_deltas and the new one should be storeMsg.NodeCPULoad + all_but_our_store_pending_deltas. And we have

newAdjustedCPU :=
= oldAdjustedCPU +  storeMsg.NodeCPULoad - oldNodeCPULoad - storePendingDelta
= (oldAdjustedCPU - oldNodeCPULoad) - storePendingDelta + storeMsg.NodeCPULoad
= allPendingDelta - storePendingDelta + storeMsg.NodeCPULoad
= storeMsg.NodeCPULoad + all_but_our_store_pending_delta

and we add our pending delta back in below. Checks out.

Yes.


pkg/kv/kvserver/allocator/mmaprototype/testdata/cluster_state/rebalance_stores_cpu_replica_refusing_target.txt line 28 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

These additions seem noisy and redundant except when we're actively introducing background load. Can we automatically compute them by default so that they need to be specified only when targeted directly?

We didn't have regression tests for this invariance. Put up #164882 as a regression test.

@wenyihu6
Copy link
Copy Markdown
Contributor Author

wenyihu6 commented Mar 5, 2026

/trunk merge

@trunk-io trunk-io bot merged commit 06a4624 into cockroachdb:master Mar 5, 2026
37 of 39 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants