allocator/mmaprototype: implement MMA repair pipeline#164658

Closed
tbg wants to merge 69 commits into cockroachdb:master from tbg:mmra
@tbg tbg commented Mar 2, 2026

Important

Start here: walkthrough.md — a linear, code-level walkthrough of the repair pipeline. Uses repairAddVoter as a representative example and traces the full data flow from mode gating through change application. Includes inline source snippets (verifiable via showboat verify).

Architecture & design brief: design-brief.md — how the repair pipeline works and how it fits into MMA.

Productionization plan: mma-repair-brief.md — proposed merge strategy (6 PRs) and what's in/out of scope.

Detailed PR split: how-to-productionize.md — per-PR breakdown with files, LOC estimates, and review focus areas.

To discuss:

  • how to implement count-based (lease and replica) rebalancing (which is done by the lease/replicate queues). In particular, count-based rebalancing in the SMA does not consider nodes "overfull" based on global means (which can be non-representative).
  • quorum check should come at top of computeRepairAction
  • metric gauges exposing count of replicas by repair enum status
  • whether and how to add IO overload shedding
  • prioritization mechanism between repair actions: the current "greedy by enum order" approach may not be ideal
  • avoid busy-looping between MMA iterations, e.g. by enforcing some spacing (possibly not needed, but the replicate and lease queues do this)
  • AdminScatter
  • TransferLease
  • metamorphic testing in unit tests

Summary

This PR implements the complete repair pipeline for the Multi-Metric
Allocator (MMA) prototype. Today, range repair (upreplication,
decommissioning, dead-node replacement, constraint enforcement) is
handled by the replicate queue. This PR teaches MMA to perform all of
these repairs itself, behind a new cluster setting mode
(LBRebalancingMultiMetricRepairAndRebalance). When that mode is active,
the replicate and lease queues become no-ops and MMA takes full
ownership of both rebalancing and repair.

Design & planning

The branch starts with design documents that map out the repair action
space and the constraint satisfaction logic, including how the legacy
allocator's scorer hierarchy maps onto MMA concepts.

Repair actions (12 total)

All twelve repair actions are implemented in
cluster_state_repair.go (~1,600 lines), organized into three groups:

  • Count-based: AddVoter (with non-voter promotion), RemoveVoter,
    AddNonVoter, RemoveNonVoter, RemoveLearner,
    FinalizeAtomicReplicationChange.
  • Dead/decommissioning replacement: ReplaceDeadVoter,
    ReplaceDeadNonVoter, ReplaceDecommissioningVoter,
    ReplaceDecommissioningNonVoter.
  • Constraint swaps: SwapVoterForConstraints,
    SwapNonVoterForConstraints.

Each action is tested via the MMA DSL (repair, repair-needed
commands) with datadriven test files covering count-based, constraint,
and interaction scenarios.

Diversity picker

pickStoreByDiversity is generalized to accept a diversityScorer
function parameter, enabling reuse across add, remove, replace, and swap
paths. Ties between equally good candidates are broken at random, which
avoids systematically favoring low-numbered stores.

Integration

A new LBRebalancingMultiMetricRepairAndRebalance mode is added to the
LoadBasedRebalancingMode cluster setting. When active:

  • The replicate queue's shouldQueue/process return immediately.
  • The lease queue's shouldQueue/process return immediately.
  • IncludeRepair is passed to ComputeChanges, which calls repair()
    before computing rebalancing changes.
  • Repair metrics are tracked separately from rebalance metrics.
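
The gating above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: only the mode name `LBRebalancingMultiMetricRepairAndRebalance` and the `IncludeRepair` option come from the description. The placeholder mode constant and the helpers `mmaOwnsRepair`, `queueShouldQueue`, and `changeOptionsFor` are hypothetical.

```go
package main

import "fmt"

// LBRebalancingMode stands in for the LoadBasedRebalancingMode cluster
// setting. Only the second constant's name comes from the PR description;
// the first is a hypothetical placeholder for every other mode.
type LBRebalancingMode int

const (
	lbRebalancingSomeOtherMode LBRebalancingMode = iota
	LBRebalancingMultiMetricRepairAndRebalance
)

// mmaOwnsRepair reports whether MMA has taken over repair and rebalancing.
func mmaOwnsRepair(mode LBRebalancingMode) bool {
	return mode == LBRebalancingMultiMetricRepairAndRebalance
}

// queueShouldQueue models a replicate or lease queue's shouldQueue. It
// returns false immediately when MMA owns repair; otherwise it defers to the
// queue's usual decision, passed in here as needsWork.
func queueShouldQueue(mode LBRebalancingMode, needsWork bool) bool {
	if mmaOwnsRepair(mode) {
		return false
	}
	return needsWork
}

// ChangeOptions mirrors the option named in the PR; other fields are omitted.
type ChangeOptions struct {
	IncludeRepair bool
}

// changeOptionsFor sets IncludeRepair only in repair-and-rebalance mode.
func changeOptionsFor(mode LBRebalancingMode) ChangeOptions {
	return ChangeOptions{IncludeRepair: mmaOwnsRepair(mode)}
}

func main() {
	modes := []LBRebalancingMode{
		lbRebalancingSomeOtherMode,
		LBRebalancingMultiMetricRepairAndRebalance,
	}
	for _, mode := range modes {
		fmt.Printf("mode=%d shouldQueue=%t includeRepair=%t\n",
			mode, queueShouldQueue(mode, true), changeOptionsFor(mode).IncludeRepair)
	}
}
```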

Production fixes

  • Leave-joint handling: IsLeaveJoint() on ExternalRangeChange
    routes leave-joint changes through maybeLeaveAtomicChangeReplicas
    instead of changeReplicasImpl, which would fail.
  • ForceReplicationScanAndProcess: now delegates to
    mmaStoreRebalancer.rebalanceUntilStable() when MMA
    repair-and-rebalance mode is active, fixing WaitForFullReplication
    and enabling ReplicationAuto in MMA tests.
  • TestMMAUpreplication: integration test verifying end-to-end
    upreplication from 1 to 3 voters under MMA.

ASIM tests

The allocator simulator (ASIM) is extended to support MMA repair:
queue disabling under the new mode, same-store voter type transitions in
ReplicaChange.Apply, and golden-output repair test scenarios covering
upreplication, dead-node replacement, decommissioning, and constraint
enforcement.

Commits

Individual commits speak for themselves. They are grouped into phases
visible in the commit log: design docs, infrastructure, count-based
repairs, replacement repairs, constraint swaps, refactoring, integration,
ASIM tests, and production fixes.

Epic: CRDB-39508

trunk-io bot commented Mar 2, 2026

Merging to master in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

blathers-crl bot commented Mar 2, 2026

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@tbg tbg changed the title from "allocator/mmaprototype: implement repair actions (AddVoter, RemoveVoter)" to "allocator/mmaprototype: implement all repair actions" Mar 3, 2026
tbg and others added 25 commits March 3, 2026 11:59
Add a reference document mapping the responsibilities of the replicate queue,
lease queue, store rebalancer, and MMA. This serves as the foundation for
planning the absorption of the replicate and lease queues into MMA.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Document the lexicographic priority hierarchy used by the legacy
allocator's candidate scoring (candidate.compare()), the different
scorer variants, and how MMA's approach differs.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
MMA checks constraints but won't actively fix violations — it skips
non-conformant ranges. Update the table to reflect this.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Document the approach for teaching MMA to detect and repair constraint
violations: eager constraint analysis at StoreLeaseholderMsg processing
time with caching, and a repair set processed with higher priority than
load rebalancing.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Replace the TODOs in mma-constraint-repair.md with resolved design
decisions:

- Repair action ordering: plain enum replacing legacy numerical
  priorities, with simplified Remove actions and explicit constraint
  swap actions.
- Repair operations: reuse existing MMA add/remove/replace primitives.
- Pending change interaction: skip ranges with in-flight changes.
- Store health/disposition interaction: targets require
  ReplicaDispositionOK; removal prefers Dead > Unknown > Unhealthy >
  Shedding > Refusing > healthy.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Three improvements to the cluster state test DSL to reduce boilerplate in
upcoming repair tests:

1. `set-store quiet=true`: suppress the verbose node/store listing output.
2. Auto-assign replica IDs when `replica-id=` is omitted from replica lines
   in `store-leaseholder-msg`. A per-range counter starts at 1 and advances
   with each replica; explicit values update the counter to stay above them.
3. Relax the minimum field count for replica lines from 3 to 2 (store-id +
   type is sufficient when replica-id and leaseholder are omitted).
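
One way to read the auto-assignment rule in item 2 is sketched below. The type `replicaIDAssigner` and its methods are hypothetical names for illustration; the parser in the PR may be structured differently, and this sketch assumes the counter advances only on auto-assignment and is bumped past any explicit value.

```go
package main

import "fmt"

// replicaIDAssigner hands out replica IDs for a single range. The type and
// method names are hypothetical; only the counter behavior follows the
// commit message.
type replicaIDAssigner struct {
	next int // next ID to auto-assign
}

// newReplicaIDAssigner returns an assigner whose counter starts at 1.
func newReplicaIDAssigner() *replicaIDAssigner {
	return &replicaIDAssigner{next: 1}
}

// assign returns the replica ID for one replica line. If hasExplicit is
// true, explicit is the value of replica-id= and the counter is moved above
// it when necessary. Otherwise the counter value is used and advanced.
func (a *replicaIDAssigner) assign(explicit int, hasExplicit bool) int {
	if !hasExplicit {
		id := a.next
		a.next++
		return id
	}
	if explicit >= a.next {
		a.next = explicit + 1
	}
	return explicit
}

func main() {
	a := newReplicaIDAssigner()
	fmt.Println(a.assign(0, false)) // auto-assigned
	fmt.Println(a.assign(7, true))  // explicit replica-id=7
	fmt.Println(a.assign(0, false)) // auto-assigned, above the explicit value
}
```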

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Add a `repair` method on `rebalanceEnv` that currently logs "not yet
implemented" and returns nil. This will be filled in as the constraint
repair logic is built out.

Add a corresponding `repair` command to the TestClusterState datadriven
DSL, following the same pattern as `rebalance-stores`: it creates a
rebalanceEnv, calls repair, and outputs the trace plus pending changes.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Add 10 datadriven test files exercising count-based repair scenarios.
Each test sets up stores and ranges with specific replica configurations,
then invokes `repair` and expects the stub "not yet implemented" output.
These tests will be rewritten with `-rewrite` as the repair logic is
implemented.

Tests cover: finalizing atomic replication changes, removing learners,
adding/removing voters and non-voters, and replacing dead or
decommissioning replicas.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Add 4 datadriven test files for constraint-based and interaction repair
scenarios:

- repair_swap_voter: voter misplaced relative to zone constraints
- repair_swap_nonvoter: non-voter misplaced relative to zone constraints
- repair_pending_skip: range with existing pending change is skipped
- repair_range_unavailable: range without quorum (2 of 3 voters dead)

Like the count-based tests, these currently expect the stub output and
will be rewritten as repair logic is implemented.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Add the RepairAction enum and its core computation method. RepairAction
represents the highest-priority repair needed for a range, ordered so
that lower enum values have higher priority. The zero value is
intentionally invalid (iota + 1) to catch uninitialized fields.

computeRepairAction inspects a range's replicas, store statuses, and
constraint satisfaction to determine the single highest-priority action:
joint config finalization, learner removal, voter/non-voter count
adjustments (add, remove, replace dead/decommissioning), and constraint
swaps. Ranges that have lost quorum or have pending changes return
NoRepairNeeded (can't repair or already being repaired).

Also adds updateRepairAction and removeFromRepairRanges helpers that
maintain the repairRanges index (wired in the next commit).

Epic: none
Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Add a repairAction field to rangeState and a repairRanges index to
clusterState that maps RepairAction → set of range IDs. This allows
repair() to iterate ranges needing repair by priority without scanning
all ranges.

Wire updateRepairAction calls at all trigger points where a range's
repair status may change:
- processRangeMsg (replicas or config changed)
- updateStoreStatuses (store health changed)
- addPendingRangeChange (pending change suppresses repair)
- undoPendingChange (pending change removed)
- pendingChangeEnacted (pending change completed)
- range GC (range removed from tracking)

The pendingChangeEnacted signature gains a context.Context parameter
to support the updateRepairAction call, updated at all 3 call sites.

Epic: none
Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
…king tests

Add the `repair-needed` DSL command to TestClusterState, which dumps the
repairRanges index in priority order, showing which ranges need which
repair action.

Update all 14 existing repair test files to assert repair tracking after
each state mutation. Add 6 new datadriven tests exercising the tracking
lifecycle:
- repair_tracking_status_change: store health transitions
- repair_tracking_pending_lifecycle: pending change add/reject/enact
- repair_tracking_config_change_with_pending: config change during pending
- repair_tracking_multi_range: multiple ranges with different actions
- repair_tracking_constraint_change: constraint satisfaction changes
- repair_tracking_action_priority: action priority transitions

Epic: none
Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Move `candidatesToConvertFromNonVoterToVoter()` and
`constraintsForAddingVoter()` from `constraint_unused_test.go` to
`constraint.go`. These methods are needed for the upcoming AddVoter
repair action implementation.

Also add `originMMARepair` to `ChangeOrigin` for tracking repair-originated
changes separately from rebalance-originated ones.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Implement the `repair()` dispatch loop in `rebalanceEnv` and the first
concrete repair action: `AddVoter`. When a range has fewer voters than
its span config requires, repair selects a target store based on
constraint satisfaction and diversity scoring, then creates a pending
change to add a voter there.

The repair loop iterates `repairRanges` in priority order (matching
`RepairAction` enum ordering) and only repairs ranges where the local
store is the leaseholder. Unimplemented actions log a message identifying
the specific action.

The `repairAddVoter` flow:
1. Analyze constraints for the range
2. Check for non-voter promotion candidates (TODO: implement promotion)
3. Find constraint-satisfying candidate stores
4. Filter by disposition, existing replicas, and node-level diversity
5. Pick the target with the best voter diversity score
6. Create and register the pending change
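
Steps 4 and 5 of this flow can be sketched as below. This is a simplified illustration under stated assumptions: the `candidate` struct and `pickAddVoterTarget` are hypothetical, the diversity score is taken as precomputed, and "node-level diversity" is interpreted as skipping stores on nodes that already hold a replica. Ties keep the first candidate seen.

```go
package main

import "fmt"

type StoreID int
type NodeID int

// candidate is a hypothetical, flattened view of a store for this sketch.
type candidate struct {
	Store         StoreID
	Node          NodeID
	DispositionOK bool    // stands in for ReplicaDispositionOK
	Diversity     float64 // precomputed voter diversity score; higher is better
}

// pickAddVoterTarget filters candidates (step 4) and returns the one with
// the best diversity score (step 5). existing maps each store that already
// holds a replica to its node. The bool result is false if nothing qualifies.
func pickAddVoterTarget(cands []candidate, existing map[StoreID]NodeID) (StoreID, bool) {
	usedNodes := map[NodeID]bool{}
	for _, n := range existing {
		usedNodes[n] = true
	}
	var best candidate
	found := false
	for _, c := range cands {
		if !c.DispositionOK {
			continue // store is not accepting replicas
		}
		if _, ok := existing[c.Store]; ok {
			continue // store already holds a replica of this range
		}
		if usedNodes[c.Node] {
			continue // another store on this node already holds a replica
		}
		if !found || c.Diversity > best.Diversity {
			best, found = c, true
		}
	}
	return best.Store, found
}

func main() {
	existing := map[StoreID]NodeID{1: 1}
	cands := []candidate{
		{Store: 2, Node: 1, DispositionOK: true, Diversity: 1.0},
		{Store: 3, Node: 2, DispositionOK: false, Diversity: 1.0},
		{Store: 4, Node: 3, DispositionOK: true, Diversity: 0.5},
		{Store: 5, Node: 4, DispositionOK: true, Diversity: 0.9},
	}
	target, ok := pickAddVoterTarget(cands, existing)
	fmt.Println(target, ok)
}
```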

The `repair_add_voter.txt` test is extended to verify the full lifecycle:
repair creates a pending change, `repair-needed` confirms suppression,
and after enactment via `store-leaseholder-msg` the range is healthy.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Start with a single voter (config says 3) so repair must add voters in
two successive rounds. This exercises the full cycle twice: repair picks
the best-diversity candidate, creates a pending change, the pending
change suppresses further repair, enactment re-enables repair for the
next round, and the second addition completes the range.

Round 1 picks s2 over s3 (equal diversity, lower StoreID wins). Round 2
picks s3 as the only remaining candidate.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
…Voter repair

When an AddVoter repair is needed and there are existing non-voters that
could satisfy the voter constraint, promote one instead of adding a new
replica on a fresh store. The best promotion candidate is chosen by voter
diversity score (highest wins, ties broken by lower StoreID).

Extract `pickBestStoreByVoterDiversity` helper to avoid duplicating the
diversity-scoring loop between the add-new-voter and promote paths.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
pickBestStoreByVoterDiversity

When multiple candidate stores have equal voter diversity scores,
use reservoir sampling (via the existing rebalanceEnv RNG) to choose
uniformly at random instead of deterministically preferring the
lowest StoreID. This avoids systematically biasing placement toward
low-numbered stores in symmetric clusters.
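
A minimal sketch of single-item reservoir sampling among tied candidates follows. The function `pickUniformAmongBest`, the `score` callback, and the helper `newRNG` are hypothetical names; the PR uses the existing rebalanceEnv RNG, for which `math/rand` stands in here.

```go
package main

import (
	"fmt"
	"math/rand"
)

type StoreID int

// newRNG returns a seeded generator; it stands in for the rebalanceEnv RNG.
func newRNG(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

// pickUniformAmongBest returns a store with the maximum score. When several
// stores share that maximum, one is chosen uniformly at random in a single
// pass using reservoir sampling with a reservoir of size one.
func pickUniformAmongBest(
	rng *rand.Rand, stores []StoreID, score func(StoreID) float64,
) (StoreID, bool) {
	var best StoreID
	var bestScore float64
	ties := 0 // number of stores seen so far with the current best score
	for _, s := range stores {
		sc := score(s)
		switch {
		case ties == 0 || sc > bestScore:
			// New strict maximum: reset the reservoir.
			best, bestScore, ties = s, sc, 1
		case sc == bestScore:
			// The k-th tied store replaces the pick with probability 1/k.
			ties++
			if rng.Intn(ties) == 0 {
				best = s
			}
		}
	}
	return best, ties > 0
}

func main() {
	rng := newRNG(42)
	equal := func(StoreID) float64 { return 1.0 }
	counts := map[StoreID]int{}
	for i := 0; i < 3000; i++ {
		s, _ := pickUniformAmongBest(rng, []StoreID{2, 3, 4}, equal)
		counts[s]++
	}
	// With equal scores, each store should be picked about a third of the time.
	fmt.Println(counts)
}
```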

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Rename `pickBestStoreByVoterDiversity` to `pickStoreByDiversity` with a
`diversityScorer` function parameter. This allows the same picker to be
used with both `getScoreChangeForNewReplica` (for additions) and
`getScoreChangeForReplicaRemoval` (for removals). Update existing call
sites in `repairAddVoter` and `promoteNonVoterToVoter`.

Pure refactor, no behavior change.
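
The shape of this refactor can be illustrated as follows. The scorer signature, the toy `region` locality model, and the two scorer constructors are assumptions made for the sketch; they stand in for `getScoreChangeForNewReplica` and `getScoreChangeForReplicaRemoval`, whose real signatures are not shown here. Ties keep the first candidate to keep the example deterministic.

```go
package main

import "fmt"

type StoreID int

// diversityScorer returns the change in diversity score if the given store
// were added or removed. The signature is an assumption for this sketch.
type diversityScorer func(StoreID) float64

// pickStoreByDiversity returns the candidate with the highest score under
// the supplied scorer. Ties keep the earliest candidate in this sketch.
func pickStoreByDiversity(cands []StoreID, scorer diversityScorer) (StoreID, bool) {
	var best StoreID
	var bestScore float64
	found := false
	for _, s := range cands {
		if sc := scorer(s); !found || sc > bestScore {
			best, bestScore, found = s, sc, true
		}
	}
	return best, found
}

// region is a toy locality model: two stores count as diverse if and only
// if their regions differ.
var region = map[StoreID]string{1: "us-east", 2: "us-east", 3: "us-west", 4: "eu"}

func pairDiversity(a, b StoreID) float64 {
	if region[a] != region[b] {
		return 1
	}
	return 0
}

// scoreForAdd scores the diversity gained by adding s next to existing.
func scoreForAdd(existing []StoreID) diversityScorer {
	return func(s StoreID) float64 {
		var sum float64
		for _, e := range existing {
			sum += pairDiversity(s, e)
		}
		return sum
	}
}

// scoreForRemoval scores the (negative) diversity lost by removing s, so
// the highest score marks the most redundant replica.
func scoreForRemoval(existing []StoreID) diversityScorer {
	return func(s StoreID) float64 {
		var sum float64
		for _, e := range existing {
			if e != s {
				sum -= pairDiversity(s, e)
			}
		}
		return sum
	}
}

func main() {
	add, _ := pickStoreByDiversity([]StoreID{2, 4}, scoreForAdd([]StoreID{1, 3}))
	voters := []StoreID{1, 2, 3}
	remove, _ := pickStoreByDiversity(voters, scoreForRemoval(voters))
	fmt.Println("add:", add, "remove:", remove)
}
```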

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Add `repairRemoveVoter()` which handles over-replicated ranges by
removing a voter. Candidate selection uses a health-based priority
ordering (dead > unknown > unhealthy > shedding > refusing > healthy),
taking the worst-health bucket first. Within that bucket, diversity-based
tiebreaking picks the most redundant voter (least diversity loss on
removal). The leaseholder is never considered for removal.
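
The two-level selection can be sketched as below. The `health` type, the `voter` struct, and `pickVoterToRemove` are hypothetical; the removal score is taken as precomputed, with higher values meaning less diversity loss.

```go
package main

import "fmt"

type StoreID int

// health is ordered so that larger values are removed first, following the
// ordering in the commit message. The type itself is hypothetical.
type health int

const (
	healthy health = iota
	refusing
	shedding
	unhealthy
	unknown
	dead
)

// voter is a flattened view of one voter for this sketch.
type voter struct {
	Store        StoreID
	Health       health
	RemovalScore float64 // diversity change on removal; higher means less loss
}

// pickVoterToRemove finds the worst-health bucket among non-leaseholder
// voters, then picks the voter in that bucket with the best removal score.
func pickVoterToRemove(voters []voter, leaseholder StoreID) (StoreID, bool) {
	worst := health(-1)
	for _, v := range voters {
		if v.Store != leaseholder && v.Health > worst {
			worst = v.Health
		}
	}
	if worst < 0 {
		return 0, false // no removable voter
	}
	var best voter
	found := false
	for _, v := range voters {
		if v.Store == leaseholder || v.Health != worst {
			continue
		}
		if !found || v.RemovalScore > best.RemovalScore {
			best, found = v, true
		}
	}
	return best.Store, found
}

func main() {
	voters := []voter{
		{Store: 1, Health: healthy, RemovalScore: -1}, // leaseholder
		{Store: 2, Health: healthy, RemovalScore: -1},
		{Store: 3, Health: unhealthy, RemovalScore: -2},
		{Store: 4, Health: unhealthy, RemovalScore: -1},
	}
	s, ok := pickVoterToRemove(voters, 1)
	fmt.Println(s, ok)
}
```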

Wire `RemoveVoter` into the `repair()` dispatch loop. Update the
`repair_remove_voter.txt` test with the full lifecycle (repair, pending
suppression, confirm, healthy). Add a new `repair_remove_voter_healthy.txt`
test that verifies diversity-based selection when all stores are healthy.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
…irs from healthy

Previously, `computeRepairAction` returned `NoRepairNeeded` for ranges with
pending changes in flight, conflating "range is healthy" with "range needs
repair but a change is already in flight." Add a new `RepairPending` enum
value so these states are distinguishable, making it possible to observe
how many ranges have outstanding repair actions.

`RepairPending` ranges are excluded from the `repairRanges` index (same as
`NoRepairNeeded`) so they are not acted on during repair, but they are
surfaced in the `repair-needed` test command output via a separate scan.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
…on code

Move `candidatesToConvertFromVoterToNonVoter` and
`constraintsForAddingNonVoter` from `constraint_unused_test.go` to
`constraint.go`. These methods are needed by the upcoming `AddNonVoter`
repair action: the first finds voters that could be demoted to non-voter,
and the second returns the constraint disjunction for placing a new
non-voter.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
… tiers

Add a `replicasLocalityTiers` parameter to `pickStoreByDiversity` so that
non-voter operations can pass `replicaLocalityTiers` (all replicas)
instead of the previously hardcoded `voterLocalityTiers` (voters only).
This is needed because non-voter diversity should be scored against all
replicas, not just voters.

The three existing call sites (repairAddVoter, promoteNonVoterToVoter,
repairRemoveVoter) are updated to explicitly pass `voterLocalityTiers`,
preserving their existing behavior.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Add `repairRemoveNonVoter`, which removes an over-replicated non-voter.
Candidate selection follows the same priority as `repairRemoveVoter`
(dead > unknown > unhealthy > shedding > refusing > healthy), but does
not need to exclude the leaseholder since non-voters cannot hold leases.
Within the worst-health bucket, the non-voter whose removal hurts
diversity the least is chosen using `replicaLocalityTiers`.

The test exercises the full lifecycle: detect over-replication, remove
one non-voter, confirm via leaseholder message, verify healthy.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Add `repairAddNonVoter`, which adds a non-voter to an under-replicated
range. Like `repairAddVoter`, it first checks for a type-change
shortcut: if there are extra voters that could be demoted to non-voter
(via `candidatesToConvertFromVoterToNonVoter`), it uses
`demoteVoterToNonVoter` to change the type in place. Otherwise, it finds
a new store using the constraint disjunction from
`constraintsForAddingNonVoter`, filters candidates, and picks by replica
diversity.

The `demoteVoterToNonVoter` helper mirrors `promoteNonVoterToVoter` but
excludes the leaseholder and creates a VOTER_FULL -> NON_VOTER type
change.

The test exercises the full two-round lifecycle: add first non-voter,
confirm, add second, confirm, verify healthy.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
tbg added a commit to tbg/cockroach that referenced this pull request Mar 11, 2026
Move 7 constraint analysis methods from `constraint_unused_test.go` to
`constraint.go`:

- candidatesToConvertFromNonVoterToVoter
- constraintsForAddingVoter
- candidatesToConvertFromVoterToNonVoter
- constraintsForAddingNonVoter
- candidatesForRoleSwapForConstraints
- candidatesVoterConstraintsUnsatisfied
- candidatesNonVoterConstraintsUnsatisfied

Pure mechanical move with improved doc comments from the prototype. These
methods are prerequisites for the per-action repair functions in later PRs
(AddVoter, RemoveVoter, constraint swaps).

Informs cockroachdb#164658.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
tbg added a commit to tbg/cockroach that referenced this pull request Mar 11, 2026
Add the RepairAction enum and computeRepairAction() decision tree. These
establish the action space and priority ordering for MMA repair.

RepairAction has 15 values (12 actionable + 3 terminal states), ordered by
priority via iota. computeRepairAction() maps range state to the
highest-priority repair action needed, using a straightforward if/else
cascade examining joint configs, quorum, replica counts, and constraint
satisfaction.

No callers yet — the wiring to clusterState comes in the next commit.

Informs cockroachdb#164658.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
tbg added a commit to tbg/cockroach that referenced this pull request Mar 11, 2026
Wire the repair action computation into clusterState so that each range's
repair action is eagerly tracked and indexed.

Structural changes:
- Add `repairAction RepairAction` field to `rangeState`
- Add `repairRanges map[RepairAction]map[RangeID]struct{}` to `clusterState`
- Add `updateRepairAction()` and `removeFromRepairRanges()` to maintain the
  index

Trigger points (where updateRepairAction is called):
1. End of processRangeMsg (replicas/config may have changed)
2. pendingChangeEnacted when all pending changes complete
3. End of undoPendingChange
4. End of addPendingRangeChange (sets RepairPending)
5. updateStoreStatuses when health/disposition changes (recomputes for
   all ranges on the affected store)

Range GC calls removeFromRepairRanges before deleting the range.
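
The index maintenance can be sketched as follows. The field and method names `repairAction`, `repairRanges`, `updateRepairAction`, and `removeFromRepairRanges` come from the commit message; everything else is an assumption. In particular, the enum here is an illustrative subset whose numeric values differ from the PR's, the action is passed in rather than computed, and `byPriority` and `newClusterState` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"sort"
)

type RangeID int
type RepairAction int

// Illustrative subset of the enum; lower values have higher priority.
const (
	AddVoter RepairAction = iota + 1
	RemoveVoter
	RepairPending
	NoRepairNeeded
)

type rangeState struct{ repairAction RepairAction }

type clusterState struct {
	ranges       map[RangeID]*rangeState
	repairRanges map[RepairAction]map[RangeID]struct{}
}

func newClusterState() *clusterState {
	return &clusterState{
		ranges:       map[RangeID]*rangeState{},
		repairRanges: map[RepairAction]map[RangeID]struct{}{},
	}
}

// removeFromRepairRanges drops the range from the set of its current action.
func (cs *clusterState) removeFromRepairRanges(id RangeID) {
	rs := cs.ranges[id]
	if rs == nil {
		return
	}
	if set := cs.repairRanges[rs.repairAction]; set != nil {
		delete(set, id)
		if len(set) == 0 {
			delete(cs.repairRanges, rs.repairAction)
		}
	}
}

// updateRepairAction records the new action and re-indexes the range.
// RepairPending and NoRepairNeeded are kept out of the index.
func (cs *clusterState) updateRepairAction(id RangeID, action RepairAction) {
	cs.removeFromRepairRanges(id)
	rs := cs.ranges[id]
	if rs == nil {
		rs = &rangeState{}
		cs.ranges[id] = rs
	}
	rs.repairAction = action
	if action == RepairPending || action == NoRepairNeeded {
		return
	}
	if cs.repairRanges[action] == nil {
		cs.repairRanges[action] = map[RangeID]struct{}{}
	}
	cs.repairRanges[action][id] = struct{}{}
}

// byPriority lists indexed ranges, highest-priority action first, with
// range IDs sorted within each action for stable output.
func (cs *clusterState) byPriority() []RangeID {
	actions := make([]int, 0, len(cs.repairRanges))
	for a := range cs.repairRanges {
		actions = append(actions, int(a))
	}
	sort.Ints(actions)
	var out []RangeID
	for _, a := range actions {
		ids := make([]int, 0, len(cs.repairRanges[RepairAction(a)]))
		for id := range cs.repairRanges[RepairAction(a)] {
			ids = append(ids, int(id))
		}
		sort.Ints(ids)
		for _, id := range ids {
			out = append(out, RangeID(id))
		}
	}
	return out
}

func main() {
	cs := newClusterState()
	cs.updateRepairAction(1, RemoveVoter)
	cs.updateRepairAction(2, AddVoter)
	cs.updateRepairAction(3, RemoveVoter)
	fmt.Println(cs.byPriority())
	cs.updateRepairAction(2, RepairPending) // a pending change suppresses repair
	fmt.Println(cs.byPriority())
}
```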

Test infrastructure:
- `repair-needed` DSL command: iterates repairRanges by priority, prints
  action-to-ranges mapping; scans separately for RepairPending
- `repair` DSL command: stub (pending changes only, no execution yet)
- Parser: nextReplicaID auto-assignment, quiet=true on set-store, relaxed
  field count for replica lines, repair recomputation on update-store-status

6 new testdata files exercise the repair tracking across priority ordering,
config changes, constraint changes, multi-range scenarios, pending change
lifecycle, and store status transitions.

Informs cockroachdb#164658.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
tbg added a commit to tbg/cockroach that referenced this pull request Mar 11, 2026
Add the RepairAction enum and computeRepairAction() decision tree. These
establish the action space and priority ordering for MMA repair.

RepairAction has 15 values (12 actionable + 3 terminal states), ordered by
priority via iota. computeRepairAction() maps range state to the
highest-priority repair action needed, using a straightforward if/else
cascade examining joint configs, quorum, replica counts, and constraint
satisfaction.

No callers yet — the wiring to clusterState comes in the next commit.

Comparison with legacy Allocator.ComputeAction (allocatorimpl/allocator.go):

The legacy allocator has two separate orderings that sometimes disagree:

1. The Priority() ordering (used to rank ranges in the replicate queue):
   FinalizeAtomicReplicationChange  12002
   RemoveLearner                    12001
   ReplaceDeadVoter                 12000
   AddVoter                         10000
   ReplaceDecommissioningVoter       5000
   RemoveDeadVoter                   1000
   RemoveDecommissioningVoter         900
   RemoveVoter                        800
   ReplaceDeadNonVoter                700
   AddNonVoter                        600
   ReplaceDecommissioningNonVoter     500
   RemoveDeadNonVoter                 400
   RemoveDecommissioningNonVoter      300
   RemoveNonVoter                     200

2. The computeAction() if/else cascade (used to pick which action to take
   for a single range):
   AddVoter                         ← checked before quorum!
   [quorum check → RangeUnavailable]
   ReplaceDeadVoter
   ReplaceDecommissioningVoter
   RemoveDeadVoter                  ← separate from ReplaceDeadVoter
   RemoveDecommissioningVoter       ← separate from ReplaceDecomVoter
   RemoveVoter
   AddNonVoter
   ReplaceDeadNonVoter
   ReplaceDecommissioningNonVoter
   RemoveDeadNonVoter               ← separate from ReplaceDeadNonVoter
   RemoveDecommissioningNonVoter    ← separate from ReplaceDecomNonVoter
   RemoveNonVoter

MMA's RepairAction unifies both orderings into a single iota sequence:
   FinalizeAtomicReplicationChange   (1)
   RemoveLearner                     (2)
   AddVoter                          (3)
   ReplaceDeadVoter                  (4)
   ReplaceDecommissioningVoter       (5)
   RemoveVoter                       (6)
   AddNonVoter                       (7)
   ReplaceDeadNonVoter               (8)
   ReplaceDecommissioningNonVoter    (9)
   RemoveNonVoter                   (10)
   SwapVoterForConstraints          (11)  ← new, legacy has no equivalent
   SwapNonVoterForConstraints       (12)  ← new, legacy has no equivalent
   RepairSkipped                    (13)
   RepairPending                    (14)
   NoRepairNeeded                   (15)
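
For reference, the sequence above corresponds to a Go declaration along these lines. The constant names and their order are taken from the list; the helper methods `valid` and `actionable` and the function `higherPriority` are hypothetical additions for illustration.

```go
package main

import "fmt"

// RepairAction is ordered by priority: lower values are repaired first.
// The zero value is deliberately left undefined to catch uninitialized fields.
type RepairAction int

const (
	FinalizeAtomicReplicationChange RepairAction = iota + 1
	RemoveLearner
	AddVoter
	ReplaceDeadVoter
	ReplaceDecommissioningVoter
	RemoveVoter
	AddNonVoter
	ReplaceDeadNonVoter
	ReplaceDecommissioningNonVoter
	RemoveNonVoter
	SwapVoterForConstraints
	SwapNonVoterForConstraints
	RepairSkipped
	RepairPending
	NoRepairNeeded
)

// valid reports whether a is one of the 15 defined values.
func (a RepairAction) valid() bool {
	return a >= FinalizeAtomicReplicationChange && a <= NoRepairNeeded
}

// actionable reports whether a is one of the 12 actions that change replicas,
// as opposed to the three terminal states.
func (a RepairAction) actionable() bool {
	return a.valid() && a <= SwapNonVoterForConstraints
}

// higherPriority reports whether a should be handled before b.
func higherPriority(a, b RepairAction) bool {
	return a < b
}

func main() {
	var zero RepairAction
	fmt.Println(zero.valid(), AddVoter.actionable(), RepairPending.actionable())
	fmt.Println(higherPriority(RemoveLearner, AddVoter))
}
```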

Key differences from legacy:

- Quorum check gates all actions: In the legacy code, AddVoter is checked
  before the quorum gate, meaning it can be attempted even without quorum
  (with a TODO noting this). MMA checks quorum first (step 4) and skips
  repair entirely if quorum is lost, since all replication changes require
  raft consensus.

- No separate Remove{Dead,Decommissioning}{Voter,NonVoter}: The legacy
  code distinguishes "replace dead voter" (count matches, add-then-remove)
  from "remove dead voter" (over-replicated, just remove). MMA collapses
  these — RemoveVoter handles all over-replication cases, with candidate
  selection preferring dead > decommissioning > healthy replicas.

- Constraint swaps are new: Legacy doesn't have repair actions for
  constraint violations — those are handled as rebalancing. MMA treats
  them as repair because a range with correct counts but wrong placement
  is not fully conformant.
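
The first two differences can be illustrated with a simplified, voter-only cascade. This is a sketch under several assumptions, not the PR's `computeRepairAction`: the `rangeInfo` struct is hypothetical, the position of the pending-change check is assumed, mapping a quorum-less range to `RepairSkipped` is assumed, and quorum is approximated as a strict majority of voters being alive.

```go
package main

import "fmt"

// rangeInfo is a hypothetical summary of a range's voter state.
type rangeInfo struct {
	HasPendingChange      bool
	InJointConfig         bool
	Learners              int
	Voters                int // current voters, including dead ones
	DeadVoters            int
	DecommissioningVoters int
	WantVoters            int // voter count required by the span config
}

// computeVoterRepairAction returns the name of the highest-priority action.
// The quorum gate sits before every count-based action, so AddVoter is never
// attempted on a range that cannot reach consensus.
func computeVoterRepairAction(r rangeInfo) string {
	switch {
	case r.HasPendingChange:
		return "RepairPending"
	case r.InJointConfig:
		return "FinalizeAtomicReplicationChange"
	case r.Learners > 0:
		return "RemoveLearner"
	case 2*(r.Voters-r.DeadVoters) <= r.Voters:
		// No strict majority of live voters: nothing can be repaired.
		return "RepairSkipped"
	case r.Voters < r.WantVoters:
		return "AddVoter"
	case r.Voters == r.WantVoters && r.DeadVoters > 0:
		return "ReplaceDeadVoter"
	case r.Voters == r.WantVoters && r.DecommissioningVoters > 0:
		return "ReplaceDecommissioningVoter"
	case r.Voters > r.WantVoters:
		// Covers dead, decommissioning, and healthy extras alike; which
		// voter goes is decided later by candidate selection.
		return "RemoveVoter"
	default:
		return "NoRepairNeeded"
	}
}

func main() {
	fmt.Println(computeVoterRepairAction(rangeInfo{Voters: 3, DeadVoters: 2, WantVoters: 3}))
	fmt.Println(computeVoterRepairAction(rangeInfo{Voters: 4, DeadVoters: 1, WantVoters: 3}))
}
```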

Informs cockroachdb#164658.
tbg added a commit to tbg/cockroach that referenced this pull request Mar 11, 2026
Wire the repair action computation into clusterState so that each range's
repair action is eagerly tracked and indexed.

Structural changes:
- Add `repairAction RepairAction` field to `rangeState`
- Add `repairRanges map[RepairAction]map[RangeID]struct{}` to `clusterState`
- Add `updateRepairAction()` and `removeFromRepairRanges()` to maintain the
  index

Trigger points (where updateRepairAction is called):
1. End of processRangeMsg (replicas/config may have changed)
2. pendingChangeEnacted when all pending changes complete
3. End of undoPendingChange
4. End of addPendingRangeChange (sets RepairPending)
5. updateStoreStatuses when health/disposition changes (recomputes for
   all ranges on the affected store)

Range GC calls removeFromRepairRanges before deleting the range.

Test infrastructure:
- `repair-needed` DSL command: iterates repairRanges by priority, prints
  action-to-ranges mapping; scans separately for RepairPending
- `repair` DSL command: stub (pending changes only, no execution yet)
- Parser: nextReplicaID auto-assignment, quiet=true on set-store, relaxed
  field count for replica lines, repair recomputation on update-store-status

6 new testdata files exercise the repair tracking across priority ordering,
config changes, constraint changes, multi-range scenarios, pending change
lifecycle, and store status transitions.

Informs cockroachdb#164658.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
tbg added a commit to tbg/cockroach that referenced this pull request Mar 11, 2026
Add the RepairAction enum and computeRepairAction() decision tree. These
establish the action space and priority ordering for MMA repair.

RepairAction has 15 values (12 actionable + 3 terminal states), ordered by
priority via iota. computeRepairAction() maps range state to the
highest-priority repair action needed, using a straightforward if/else
cascade examining joint configs, quorum, replica counts, and constraint
satisfaction.

No callers yet — the wiring to clusterState comes in the next commit.

Comparison with legacy Allocator.ComputeAction (allocatorimpl/allocator.go):

The legacy allocator has two separate orderings that sometimes disagree:

1. The Priority() ordering (used to rank ranges in the replicate queue):
   FinalizeAtomicReplicationChange  12002
   RemoveLearner                    12001
   ReplaceDeadVoter                 12000
   AddVoter                         10000
   ReplaceDecommissioningVoter       5000
   RemoveDeadVoter                   1000
   RemoveDecommissioningVoter         900
   RemoveVoter                        800
   ReplaceDeadNonVoter                700
   AddNonVoter                        600
   ReplaceDecommissioningNonVoter     500
   RemoveDeadNonVoter                 400
   RemoveDecommissioningNonVoter      300
   RemoveNonVoter                     200

2. The computeAction() if/else cascade (used to pick which action to take
   for a single range):
   AddVoter                         ← checked before quorum!
   [quorum check → RangeUnavailable]
   ReplaceDeadVoter
   ReplaceDecommissioningVoter
   RemoveDeadVoter                  ← separate from ReplaceDeadVoter
   RemoveDecommissioningVoter       ← separate from ReplaceDecomVoter
   RemoveVoter
   AddNonVoter
   ReplaceDeadNonVoter
   ReplaceDecommissioningNonVoter
   RemoveDeadNonVoter               ← separate from ReplaceDeadNonVoter
   RemoveDecommissioningNonVoter    ← separate from ReplaceDecomNonVoter
   RemoveNonVoter

MMA's RepairAction unifies both orderings into a single iota sequence:
   FinalizeAtomicReplicationChange   (1)
   RemoveLearner                     (2)
   AddVoter                          (3)
   ReplaceDeadVoter                  (4)
   ReplaceDecommissioningVoter       (5)
   RemoveVoter                       (6)
   AddNonVoter                       (7)
   ReplaceDeadNonVoter               (8)
   ReplaceDecommissioningNonVoter    (9)
   RemoveNonVoter                   (10)
   SwapVoterForConstraints          (11)  ← new, legacy has no equivalent
   SwapNonVoterForConstraints       (12)  ← new, legacy has no equivalent
   RepairSkipped                    (13)
   RepairPending                    (14)
   NoRepairNeeded                   (15)

Key differences from legacy:

- Quorum check gates all actions: In the legacy code, AddVoter is checked
  before the quorum gate, meaning it can be attempted even without quorum
  (with a TODO noting this). MMA checks quorum first (step 4) and skips
  repair entirely if quorum is lost, since all replication changes require
  raft consensus.

- No separate Remove{Dead,Decommissioning}{Voter,NonVoter}: The legacy
  code distinguishes "replace dead voter" (count matches, add-then-remove)
  from "remove dead voter" (over-replicated, just remove). MMA collapses
  these — RemoveVoter handles all over-replication cases, with candidate
  selection preferring dead > decommissioning > healthy replicas.

- Constraint swaps are new: Legacy doesn't have repair actions for
  constraint violations — those are handled as rebalancing. MMA treats
  them as repair because a range with correct counts but wrong placement
  is not fully conformant.

Informs cockroachdb#164658.
tbg added a commit to tbg/cockroach that referenced this pull request Mar 11, 2026
Wire the repair action computation into clusterState so that each range's
repair action is eagerly tracked and indexed.

Structural changes:
- Add `repairAction RepairAction` field to `rangeState`
- Add `repairRanges map[RepairAction]map[RangeID]struct{}` to `clusterState`
- Add `updateRepairAction()` and `removeFromRepairRanges()` to maintain the
  index

Trigger points (where updateRepairAction is called):
1. End of processRangeMsg (replicas/config may have changed)
2. pendingChangeEnacted when all pending changes complete
3. End of undoPendingChange
4. End of addPendingRangeChange (sets RepairPending)
5. updateStoreStatuses when health/disposition changes (recomputes for
   all ranges on the affected store)

Range GC calls removeFromRepairRanges before deleting the range.
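
A minimal sketch of how such an index might be maintained, assuming a reduced three-value `RepairAction` and an `updateRepairAction` that receives the new action as an argument (the commit message implies the real method computes it). The field and method names follow the commit message; the bodies, and the choice not to index ranges that need no repair, are assumptions.

```go
package main

type RangeID int64

// RepairAction here is a reduced, hypothetical subset of the real enum.
type RepairAction int

const (
	AddVoter RepairAction = iota + 1
	RemoveVoter
	NoRepairNeeded
)

type rangeState struct {
	repairAction RepairAction
}

type clusterState struct {
	ranges       map[RangeID]*rangeState
	repairRanges map[RepairAction]map[RangeID]struct{}
}

// removeFromRepairRanges drops the range from the bucket of its current
// action and deletes the bucket once it is empty.
func (cs *clusterState) removeFromRepairRanges(id RangeID) {
	rs, ok := cs.ranges[id]
	if !ok {
		return
	}
	set := cs.repairRanges[rs.repairAction]
	delete(set, id)
	if len(set) == 0 {
		delete(cs.repairRanges, rs.repairAction)
	}
}

// updateRepairAction moves the range into the bucket for its new action.
// Assumption: ranges with NoRepairNeeded are not indexed.
func (cs *clusterState) updateRepairAction(id RangeID, action RepairAction) {
	rs, ok := cs.ranges[id]
	if !ok {
		return
	}
	cs.removeFromRepairRanges(id)
	rs.repairAction = action
	if action == NoRepairNeeded {
		return
	}
	if cs.repairRanges[action] == nil {
		cs.repairRanges[action] = map[RangeID]struct{}{}
	}
	cs.repairRanges[action][id] = struct{}{}
}
```

Range GC in this sketch would call `removeFromRepairRanges` and then delete the entry from `ranges`, matching the order described above.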

Test infrastructure:
- `repair-needed` DSL command: iterates repairRanges by priority, prints
  action-to-ranges mapping; scans separately for RepairPending
- `repair` DSL command: stub (pending changes only, no execution yet)
- Parser: nextReplicaID auto-assignment, quiet=true on set-store, relaxed
  field count for replica lines, repair recomputation on update-store-status

6 new testdata files exercise the repair tracking across priority ordering,
config changes, constraint changes, multi-range scenarios, pending change
lifecycle, and store status transitions.

Informs cockroachdb#164658.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
tbg added a commit to tbg/cockroach that referenced this pull request Mar 11, 2026
Add the repair() method on rebalanceEnv — the main entry point for MMA
repair. It iterates repairRanges in priority order, filters to ranges
where the local store is the leaseholder, and dispatches to per-action
repair functions. No repair actions are implemented yet (the switch
default logs "not yet implemented"); AddVoter comes in the next commit.

Wire repair into ComputeChanges via the IncludeRepair field on
ChangeOptions. When set, repair() runs before rebalanceStores(), and its
pending changes prevent the rebalancer from touching the same ranges.
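
The iteration described above might look roughly like the following sketch. `repairOrder`, `rangeInfo`, and the two-value `RepairAction` are hypothetical; sorting range IDs within an action is an assumption made here for determinism and may not match how the PR orders ranges.

```go
package main

import "sort"

type RangeID int64

// RepairAction is a reduced, hypothetical subset; lower value = higher priority.
type RepairAction int

const (
	AddVoter RepairAction = iota + 1
	RemoveVoter
)

// rangeInfo is a hypothetical view of a range: only its leaseholder store.
type rangeInfo struct {
	leaseholderStore int
}

// repairOrder lists the ranges the local store should repair, by ascending
// RepairAction and then by RangeID. Ranges whose leaseholder is on another
// store are skipped.
func repairOrder(
	index map[RepairAction]map[RangeID]struct{},
	ranges map[RangeID]rangeInfo,
	localStore int,
) []RangeID {
	actions := make([]RepairAction, 0, len(index))
	for a := range index {
		actions = append(actions, a)
	}
	sort.Slice(actions, func(i, j int) bool { return actions[i] < actions[j] })

	var out []RangeID
	for _, a := range actions {
		ids := make([]RangeID, 0, len(index[a]))
		for id := range index[a] {
			if ranges[id].leaseholderStore == localStore {
				ids = append(ids, id)
			}
		}
		sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
		out = append(out, ids...)
	}
	return out
}
```

A caller in the spirit of ComputeChanges would run this pass first when IncludeRepair is set and only then hand the remaining ranges to the rebalancer.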

Add originMMARepair to the ChangeOrigin enum so that repair-originated
changes can be tracked through AdjustPendingChangeDisposition. For now
repair changes share the rebalance metric counters; dedicated repair
metrics come in a follow-up PR.

Add the "repair" DSL command to the test harness. It creates a
rebalanceEnv with a deterministic random seed and calls repair().

Informs cockroachdb#164658.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
tbg added a commit to tbg/cockroach that referenced this pull request Mar 11, 2026
Move 7 constraint analysis methods from `constraint_unused_test.go` to
`constraint.go`:

- candidatesToConvertFromNonVoterToVoter
- constraintsForAddingVoter
- candidatesToConvertFromVoterToNonVoter
- constraintsForAddingNonVoter
- candidatesForRoleSwapForConstraints
- candidatesVoterConstraintsUnsatisfied
- candidatesNonVoterConstraintsUnsatisfied

Pure mechanical move with improved doc comments from the prototype. These
methods are prerequisites for the per-action repair functions in later PRs
(AddVoter, RemoveVoter, constraint swaps).

Informs cockroachdb#164658.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
tbg added a commit to tbg/cockroach that referenced this pull request Mar 11, 2026
Add the RepairAction enum and computeRepairAction() decision tree. These
establish the action space and priority ordering for MMA repair.

RepairAction has 15 values (12 actionable + 3 terminal states), ordered by
priority via iota. computeRepairAction() maps range state to the
highest-priority repair action needed, using a straightforward if/else
cascade examining joint configs, quorum, replica counts, and constraint
satisfaction.

No callers yet — the wiring to clusterState comes in the next commit.

Comparison with legacy Allocator.ComputeAction (allocatorimpl/allocator.go):

The legacy allocator has two separate orderings that sometimes disagree:

1. The Priority() ordering (used to rank ranges in the replicate queue):
   FinalizeAtomicReplicationChange  12002
   RemoveLearner                    12001
   ReplaceDeadVoter                 12000
   AddVoter                         10000
   ReplaceDecommissioningVoter       5000
   RemoveDeadVoter                   1000
   RemoveDecommissioningVoter         900
   RemoveVoter                        800
   ReplaceDeadNonVoter                700
   AddNonVoter                        600
   ReplaceDecommissioningNonVoter     500
   RemoveDeadNonVoter                 400
   RemoveDecommissioningNonVoter      300
   RemoveNonVoter                     200

2. The computeAction() if/else cascade (used to pick which action to take
   for a single range):
   AddVoter                         ← checked before quorum!
   [quorum check → RangeUnavailable]
   ReplaceDeadVoter
   ReplaceDecommissioningVoter
   RemoveDeadVoter                  ← separate from ReplaceDeadVoter
   RemoveDecommissioningVoter       ← separate from ReplaceDecomVoter
   RemoveVoter
   AddNonVoter
   ReplaceDeadNonVoter
   ReplaceDecommissioningNonVoter
   RemoveDeadNonVoter               ← separate from ReplaceDeadNonVoter
   RemoveDecommissioningNonVoter    ← separate from ReplaceDecomNonVoter
   RemoveNonVoter
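
To make the disagreement concrete, the sketch below copies the two lists above into Go and reports every pair that the cascade checks in the opposite order from Priority(). The helper name `inversions` is hypothetical; FinalizeAtomicReplicationChange and RemoveLearner are left out because they do not appear in the cascade listing.

```go
package main

// legacyPriority copies the Priority() values listed above.
var legacyPriority = map[string]int{
	"ReplaceDeadVoter":               12000,
	"AddVoter":                       10000,
	"ReplaceDecommissioningVoter":    5000,
	"RemoveDeadVoter":                1000,
	"RemoveDecommissioningVoter":     900,
	"RemoveVoter":                    800,
	"ReplaceDeadNonVoter":            700,
	"AddNonVoter":                    600,
	"ReplaceDecommissioningNonVoter": 500,
	"RemoveDeadNonVoter":             400,
	"RemoveDecommissioningNonVoter":  300,
	"RemoveNonVoter":                 200,
}

// legacyCascade copies the order of checks in the cascade listed above.
var legacyCascade = []string{
	"AddVoter", "ReplaceDeadVoter", "ReplaceDecommissioningVoter",
	"RemoveDeadVoter", "RemoveDecommissioningVoter", "RemoveVoter",
	"AddNonVoter", "ReplaceDeadNonVoter", "ReplaceDecommissioningNonVoter",
	"RemoveDeadNonVoter", "RemoveDecommissioningNonVoter", "RemoveNonVoter",
}

// inversions returns each pair (earlier, later) where the cascade checks
// earlier first even though later has the higher priority value.
func inversions(cascade []string, prio map[string]int) [][2]string {
	var out [][2]string
	for i := 0; i < len(cascade); i++ {
		for j := i + 1; j < len(cascade); j++ {
			if prio[cascade[i]] < prio[cascade[j]] {
				out = append(out, [2]string{cascade[i], cascade[j]})
			}
		}
	}
	return out
}
```

On these inputs the only inverted pairs are (AddVoter, ReplaceDeadVoter) and (AddNonVoter, ReplaceDeadNonVoter).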

MMA's RepairAction unifies both orderings into a single iota sequence:
   FinalizeAtomicReplicationChange   (1)
   RemoveLearner                     (2)
   AddVoter                          (3)
   ReplaceDeadVoter                  (4)
   ReplaceDecommissioningVoter       (5)
   RemoveVoter                       (6)
   AddNonVoter                       (7)
   ReplaceDeadNonVoter               (8)
   ReplaceDecommissioningNonVoter    (9)
   RemoveNonVoter                   (10)
   SwapVoterForConstraints          (11)  ← new, legacy has no equivalent
   SwapNonVoterForConstraints       (12)  ← new, legacy has no equivalent
   RepairSkipped                    (13)
   RepairPending                    (14)
   NoRepairNeeded                   (15)
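
The sequence above maps directly onto a Go iota block. The sketch below is a reconstruction from the listing, not the PR's source: skipping the zero value so numbering starts at 1, and the helpers `isActionable` and `higherPriority`, are assumptions.

```go
package main

type RepairAction int

const (
	_ RepairAction = iota // assumption: zero value unused so numbering starts at 1
	FinalizeAtomicReplicationChange
	RemoveLearner
	AddVoter
	ReplaceDeadVoter
	ReplaceDecommissioningVoter
	RemoveVoter
	AddNonVoter
	ReplaceDeadNonVoter
	ReplaceDecommissioningNonVoter
	RemoveNonVoter
	SwapVoterForConstraints
	SwapNonVoterForConstraints
	RepairSkipped
	RepairPending
	NoRepairNeeded
)

// isActionable reports whether a is one of the 12 actions that produce a
// change, as opposed to the 3 terminal states at the end of the sequence.
func (a RepairAction) isActionable() bool {
	return a >= FinalizeAtomicReplicationChange && a <= SwapNonVoterForConstraints
}

// higherPriority reports whether a should be handled before b. Because the
// enum is declared in priority order, this is a plain integer comparison.
func higherPriority(a, b RepairAction) bool {
	return a < b
}
```

Keeping a single declaration order for both "which action for this range" and "which range first" is what removes the disagreement between the two legacy orderings.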

Key differences from legacy:

- Quorum check gates all actions: In the legacy code, AddVoter is checked
  before the quorum gate, so it can be attempted even without quorum (a
  TODO in the code notes this). MMA checks quorum at step 4 of its
  cascade, ahead of AddVoter, and skips repair entirely if quorum is lost,
  since all replication changes require raft consensus.

- No separate Remove{Dead,Decommissioning}{Voter,NonVoter}: The legacy
  code distinguishes "replace dead voter" (count matches, add-then-remove)
  from "remove dead voter" (over-replicated, just remove). MMA collapses
  these — RemoveVoter handles all over-replication cases, with candidate
  selection preferring dead > decommissioning > healthy replicas.

- Constraint swaps are new: Legacy doesn't have repair actions for
  constraint violations — those are handled as rebalancing. MMA treats
  them as repair because a range with correct counts but wrong placement
  is not fully conformant.
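
The quorum gate from the first bullet can be sketched as follows, assuming a simplified range described only by live, total, and desired voter counts. `hasQuorum` and `computeAction` are hypothetical helpers, the enum is a reduced subset, and returning RepairSkipped when quorum is lost is an assumption about which terminal state the PR uses.

```go
package main

// RepairAction is a reduced, hypothetical subset of the real enum.
type RepairAction int

const (
	AddVoter RepairAction = iota + 1
	RemoveVoter
	RepairSkipped
	NoRepairNeeded
)

// hasQuorum reports whether a strict majority of the voters is live.
func hasQuorum(liveVoters, totalVoters int) bool {
	return totalVoters > 0 && liveVoters >= totalVoters/2+1
}

// computeAction checks quorum before any count-based decision, so an
// under-replicated range without quorum is skipped rather than given an
// AddVoter that could not commit.
func computeAction(liveVoters, totalVoters, wantVoters int) RepairAction {
	if !hasQuorum(liveVoters, totalVoters) {
		return RepairSkipped
	}
	switch {
	case totalVoters < wantVoters:
		return AddVoter
	case totalVoters > wantVoters:
		return RemoveVoter
	default:
		return NoRepairNeeded
	}
}
```

For a range with two voters of which one is live and a target of three, this sketch returns RepairSkipped, whereas a cascade that checks the voter count first would have chosen AddVoter.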

Informs cockroachdb#164658.
Mark PRs 1 and 2 as completed with links to cockroachdb#165413 and cockroachdb#165423.
Update PR 3 description to include ASIM wiring and reflect prototype
discoveries. Fix PR 4/5 helper lists to account for what already shipped.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
tbg added a commit to tbg/cockroach that referenced this pull request Mar 12, 2026
Add the new LBRebalancingMultiMetricRepairAndRebalance enum constant to
LBRebalancingMode. In this mode, MMA handles both rebalancing and repair;
the replicate and lease queues are completely disabled.

The mode is deliberately NOT added to the LoadBasedRebalancingMode settings
registration map, so it cannot be set via SET CLUSTER SETTING — only via
Override() in tests. (EnumSetting.Override explicitly bypasses validation.)
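
The effect of leaving a value out of the registration map can be illustrated with a toy enum setting. The `enumSetting` type below is hypothetical and is not the CockroachDB settings API; the mode names and numeric values are made up for the example.

```go
package main

import "fmt"

// Hypothetical mode values for illustration only.
const (
	modeOff                int64 = 0
	modeMMAOnly            int64 = 1
	modeRepairAndRebalance int64 = 2 // deliberately not registered below
)

// enumSetting is a toy setting with a map of user-visible values.
type enumSetting struct {
	names map[int64]string
	value int64
}

// Set is the user-facing path: it accepts only registered names.
func (s *enumSetting) Set(name string) error {
	for v, n := range s.names {
		if n == name {
			s.value = v
			return nil
		}
	}
	return fmt.Errorf("invalid value %q", name)
}

// Override is the test-only path: it stores the value without validation.
func (s *enumSetting) Override(v int64) {
	s.value = v
}

// newModeSetting registers every mode except modeRepairAndRebalance.
func newModeSetting() *enumSetting {
	return &enumSetting{names: map[int64]string{
		modeOff:     "off",
		modeMMAOnly: "mma only",
	}}
}
```

In this toy model a user cannot reach the unregistered mode through `Set`, while a test can still select it through `Override`, which mirrors the gating described above.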

Also expand LoadBasedRebalancingModeIsMMA to include the new mode and add
the LoadBasedRebalancingModeIsMMARepairAndRebalance helper function.

No behavior change: nothing checks for this value yet.

Informs cockroachdb#164658.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
tbg added a commit to tbg/cockroach that referenced this pull request Mar 12, 2026
When LBRebalancingMultiMetricRepairAndRebalance mode is active, short-circuit
both shouldQueue and process on the replicate and lease queues. This prevents
both enqueuing and processing, making MMA solely responsible for all replica
placement decisions.
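
A sketch of the double gate, using a hypothetical `replicateQueue` with a settable mode. Gating `process` as well as `shouldQueue` presumably covers replicas that were enqueued before the mode changed; that rationale is an inference, not something the commit message states.

```go
package main

type mode int

const (
	modeLegacy mode = iota
	modeMMARepairAndRebalance
)

// replicateQueue is a hypothetical stand-in for a store queue.
type replicateQueue struct {
	current   mode
	processed int
}

// shouldQueue refuses to enqueue anything while MMA owns repair.
func (q *replicateQueue) shouldQueue(needsRepair bool) bool {
	if q.current == modeMMARepairAndRebalance {
		return false
	}
	return needsRepair
}

// process is a no-op while MMA owns repair, even for items that were
// enqueued earlier. It reports whether the item was actually processed.
func (q *replicateQueue) process() bool {
	if q.current == modeMMARepairAndRebalance {
		return false
	}
	q.processed++
	return true
}
```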

Also update CountBasedRebalancingDisabled to return true in the new mode,
since count-based rebalancing should also be disabled when MMA handles repair.

All changes are gated by the new mode — no behavior change under existing modes.

Informs cockroachdb#164658.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
tbg added a commit to tbg/cockroach that referenced this pull request Mar 12, 2026
Add RepairReplicaChange{Success,Failure} and RepairLeaseChange{Success,Failure}
counters, replacing the temporary routing of repair metrics through the
rebalance counters from the previous PR.

Split the combined `originMMARebalance, originMMARepair` case in
AdjustPendingChangeDisposition into separate cases, each incrementing its own
metric counters.
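
The split might look like the sketch below. The origin names and the repair counter names come from the commit messages; the `counters` struct, the rebalance counter names, `originExternal`, and the `record` method are assumptions standing in for AdjustPendingChangeDisposition.

```go
package main

type ChangeOrigin int

const (
	originExternal ChangeOrigin = iota // hypothetical placeholder for other origins
	originMMARebalance
	originMMARepair
)

// counters is a hypothetical metrics struct.
type counters struct {
	RebalanceReplicaChangeSuccess int
	RebalanceReplicaChangeFailure int
	RepairReplicaChangeSuccess    int
	RepairReplicaChangeFailure    int
}

// record increments the counter that matches the change's origin. Rebalance
// and repair are separate cases, so neither inflates the other's metrics.
func (c *counters) record(origin ChangeOrigin, success bool) {
	switch origin {
	case originMMARebalance:
		if success {
			c.RebalanceReplicaChangeSuccess++
		} else {
			c.RebalanceReplicaChangeFailure++
		}
	case originMMARepair:
		if success {
			c.RepairReplicaChangeSuccess++
		} else {
			c.RepairReplicaChangeFailure++
		}
	}
}
```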

Informs cockroachdb#164658.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
tbg added a commit to tbg/cockroach that referenced this pull request Mar 12, 2026
Wire the MMA repair-and-rebalance mode into the allocator simulator:

- Add early return guards in replicate queue and lease queue Tick() methods
  when the respective queue is disabled via SimulationSettings.
- Expand SetClusterSetting to disable both queues when the new mode is set.
- Wire IncludeRepair in the ASIM MMA store rebalancer's ComputeChanges call,
  gated on the repair-and-rebalance mode.
- Add "mma-repair" to knownConfigurations for datadriven tests.
- Add repair_add_voter and repair_promote_nonvoter testdata files that verify
  MMA repair upreplicates under-replicated ranges and promotes non-voters
  to voters, respectively.

Informs cockroachdb#164658.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
tbg added a commit to tbg/cockroach that referenced this pull request Mar 12, 2026
Pure refactor: extract the tight rebalance loop from run() into a
rebalanceUntilStable method for reuse by ForceReplicationScanAndProcess
in the next commit.
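
As an illustration of the extracted loop's shape, the sketch below repeats a step until it reports no further changes. The `step` callback and the `maxIters` cap are assumptions added for the example; the commit message does not say whether the real method bounds its iterations.

```go
package main

// rebalanceUntilStable calls step until it returns false (nothing changed)
// or maxIters productive steps have run. It returns the number of steps that
// made a change.
func rebalanceUntilStable(step func() bool, maxIters int) int {
	n := 0
	for n < maxIters && step() {
		n++
	}
	return n
}
```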

No behavior change.

Informs cockroachdb#164658.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
tbg added a commit to tbg/cockroach that referenced this pull request Mar 12, 2026
Complete the production mma_store_rebalancer wiring for repair:

1. Add IsLeaveJoint() predicate on ExternalRangeChange that detects
   joint-config finalization changes. These cannot be expressed as
   kvpb.ReplicationChanges because the production code uses
   maybeLeaveAtomicChangeReplicas directly.

2. Wire IncludeRepair in rebalance(), gated on the
   LBRebalancingMultiMetricRepairAndRebalance mode check.

3. Add IsLeaveJoint routing in applyChange() between IsPureTransferLease
   and IsChangeReplicas, delegating to maybeLeaveAtomicChangeReplicas.

4. Expand the replicaToApplyChanges interface with
   maybeLeaveAtomicChangeReplicas.
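
The routing order from item 3 can be sketched as a switch over the three predicates. The predicate names follow the commit message; the `change` struct, its boolean fields, and the `route` function with its string results are hypothetical.

```go
package main

// change is a hypothetical stand-in for ExternalRangeChange.
type change struct {
	pureTransferLease bool
	leaveJoint        bool
	changeReplicas    bool
}

func (c change) IsPureTransferLease() bool { return c.pureTransferLease }
func (c change) IsLeaveJoint() bool        { return c.leaveJoint }
func (c change) IsChangeReplicas() bool    { return c.changeReplicas }

// route mirrors the described order: lease transfer first, then joint-config
// finalization, then ordinary replica changes.
func route(c change) string {
	switch {
	case c.IsPureTransferLease():
		return "transfer-lease"
	case c.IsLeaveJoint():
		return "leave-joint" // would delegate to maybeLeaveAtomicChangeReplicas
	case c.IsChangeReplicas():
		return "change-replicas"
	default:
		return "unknown"
	}
}
```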

Informs cockroachdb#164658.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
tbg added a commit to tbg/cockroach that referenced this pull request Mar 12, 2026
…stMMAUpreplication

When MMA repair-and-rebalance mode is active,
ForceReplicationScanAndProcess delegates to rebalanceUntilStable()
instead of the replicate queue (whose shouldQueue/process are no-ops
in that mode). This enables WaitForFullReplication and deterministic
test driving.

Add TestMMAUpreplication, which starts a 3-node cluster with
ReplicationAuto, creates a scratch range (1 replica), enables MMA
repair-and-rebalance mode, and verifies it upreplicates to 3 voters
entirely through MMA repair.

Informs cockroachdb#164658.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
@tbg tbg closed this Apr 1, 2026