Skip to content

allocator/mmaprototype: add repair action foundation#165413

Closed
tbg wants to merge 3 commits intocockroachdb:masterfrom
tbg:production-mma-repair
Closed

allocator/mmaprototype: add repair action foundation#165413
tbg wants to merge 3 commits intocockroachdb:masterfrom
tbg:production-mma-repair

Conversation

@tbg
Copy link
Copy Markdown
Member

@tbg tbg commented Mar 11, 2026

First PR in the MMA repair productionization stack (see #164658 for the
prototype and how-to-productionize.md for the full plan).

This PR adds the foundation for MMA repair:

  • RepairAction enum: 12 actionable repair actions plus 3 terminal states,
    priority-ordered. The ordering determines both which problem to fix first for
    a single range and which ranges get repaired first across the cluster.

  • computeRepairAction() decision tree: maps range state (replica counts,
    store health, constraint satisfaction) to the highest-priority repair action
    needed. This is a straightforward if/else cascade — no numerical scoring.

  • Repair index on clusterState: eagerly-computed repairAction per range,
    with a repairRanges map indexing ranges by action for efficient iteration.
    Recomputed at every trigger point (range messages, pending changes, store
    status changes).

  • Constraint helper methods: moved 7 methods from test-only code to
    production code. These are prerequisites for the per-action repair functions
    in later PRs.

  • DSL test infrastructure: repair-needed command to inspect the repair
    index, plus parser enhancements for compact repair test scenarios.

No repair actions are executed yet — that comes in PR 2 (AddVoter +
orchestration). This PR establishes the action space and priority ordering
for early review feedback.

Informs #164658.

@trunk-io
Copy link
Copy Markdown
Contributor

trunk-io bot commented Mar 11, 2026

Merging to master in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Mar 11, 2026

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tbg reviewed 13 files and all commit messages, and made 9 comments.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained.


-- commits line 27 at r2:
Can you find where the corresponding sequence for the legacy allocator lives (see Allocator.ComputeAction) and, in the commit message, draw a comparison between both of them and their priority ordering with the goal of explaining to the reviewer what changes here, if anything, when transitioning from legacy to mma. Iterate on it with me before pushing the result.


pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go line 1315 at r3 (raw file):

	// repairRanges indexes ranges by their RepairAction. Ranges with
	// NoRepairNeeded or RepairPending are NOT stored. This allows repair()
	// to iterate ranges needing repair without scanning all ranges.

mention that this is always consistent with the rangeState.repairAction fields of all the tracked ranges.

Is there a convenient place to assert this (maybe we have other integrity checks already)?


pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go line 1966 at r3 (raw file):

	delete(cs.pendingChanges, change.changeID)
	// Recompute repair action if all pending changes for this range have
	// been enacted.

Add that this is correct (we don't have to re-compute if partially enacted) because as long as there are ANY pending changes, the repair action is RepairPending anyway.
(.. and check that I'm not saying something wrong here..)

Or should we just call this unconditionally? Performance is not a concern in this path. I'm asking because undoPendingChange unconditionally calls updateRepairAction so it would be more consistent to do this here, too.


pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go line 2344 at r3 (raw file):

		// If the store's status changed, recompute repair actions for all
		// ranges on this store. The cached rs.constraints is still valid here
		// (store status doesn't affect constraint satisfaction).

Log at verbosity one about the status change and the need to do a full recomputation.


pkg/kv/kvserver/allocator/mmaprototype/cluster_state_repair.go line 1 at r2 (raw file):

// Copyright 2025 The Cockroach Authors.

2026


pkg/kv/kvserver/allocator/mmaprototype/cluster_state_repair.go line 85 at r2 (raw file):

)

var repairActionNames = [...]string{

Can you make this use go:generate stringer like done in other places? There might be some BAZEL stuff to get this right but you can copy from other types that have this already.


pkg/kv/kvserver/allocator/mmaprototype/cluster_state_repair.go line 197 at r2 (raw file):

	aliveVoters := numVoters - deadVoters
	quorum := numVoters/2 + 1
	if aliveVoters < quorum {

this should be swapped with step 4. If we don't have quorum, we also shouldn't try to finalize replication changes or to remove learners because that too requires quorum.


pkg/kv/kvserver/allocator/mmaprototype/cluster_state_repair.go line 269 at r3 (raw file):

	// Remove from old bucket.
	if oldAction != NoRepairNeeded && oldAction != RepairPending && oldAction != 0 {
		if m, ok := cs.repairRanges[oldAction]; ok {

It would be an invariant violation to hit !ok, right? Add an assertion (only when buildutil.CrdbTestBuild).


pkg/kv/kvserver/allocator/mmaprototype/cluster_state_test.go line 731 at r3 (raw file):

					return fmt.Sprintf("%s%v\n", safeTrace(t, &sb), out)

				case "repair":

I don't think this is used anywhere, so back this out for now. It can come back when it's actually wired up to do something useful and exercised.

@tbg tbg force-pushed the production-mma-repair branch from 15f2aa2 to e58ee9f Compare March 11, 2026 12:17
Copy link
Copy Markdown
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tbg reviewed 12 files and all commit messages, and resolved 5 discussions.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained.

tbg and others added 3 commits March 11, 2026 16:37
Move 7 constraint analysis methods from `constraint_unused_test.go` to
`constraint.go`:

- candidatesToConvertFromNonVoterToVoter
- constraintsForAddingVoter
- candidatesToConvertFromVoterToNonVoter
- constraintsForAddingNonVoter
- candidatesForRoleSwapForConstraints
- candidatesVoterConstraintsUnsatisfied
- candidatesNonVoterConstraintsUnsatisfied

Pure mechanical move with improved doc comments from the prototype. These
methods are prerequisites for the per-action repair functions in later PRs
(AddVoter, RemoveVoter, constraint swaps).

Informs cockroachdb#164658.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
Add the RepairAction enum and computeRepairAction() decision tree. These
establish the action space and priority ordering for MMA repair.

RepairAction has 15 values (12 actionable + 3 terminal states), ordered by
priority via iota. computeRepairAction() maps range state to the
highest-priority repair action needed, using a straightforward if/else
cascade examining joint configs, quorum, replica counts, and constraint
satisfaction.

No callers yet — the wiring to clusterState comes in the next commit.

Comparison with legacy Allocator.ComputeAction (allocatorimpl/allocator.go):

The legacy allocator has two separate orderings that sometimes disagree:

1. The Priority() ordering (used to rank ranges in the replicate queue):
   FinalizeAtomicReplicationChange  12002
   RemoveLearner                    12001
   ReplaceDeadVoter                 12000
   AddVoter                         10000
   ReplaceDecommissioningVoter       5000
   RemoveDeadVoter                   1000
   RemoveDecommissioningVoter         900
   RemoveVoter                        800
   ReplaceDeadNonVoter                700
   AddNonVoter                        600
   ReplaceDecommissioningNonVoter     500
   RemoveDeadNonVoter                 400
   RemoveDecommissioningNonVoter      300
   RemoveNonVoter                     200

2. The computeAction() if/else cascade (used to pick which action to take
   for a single range):
   AddVoter                         ← checked before quorum!
   [quorum check → RangeUnavailable]
   ReplaceDeadVoter
   ReplaceDecommissioningVoter
   RemoveDeadVoter                  ← separate from ReplaceDeadVoter
   RemoveDecommissioningVoter       ← separate from ReplaceDecomVoter
   RemoveVoter
   AddNonVoter
   ReplaceDeadNonVoter
   ReplaceDecommissioningNonVoter
   RemoveDeadNonVoter               ← separate from ReplaceDeadNonVoter
   RemoveDecommissioningNonVoter    ← separate from ReplaceDecomNonVoter
   RemoveNonVoter

MMA's RepairAction unifies both orderings into a single iota sequence:
   FinalizeAtomicReplicationChange   (1)
   RemoveLearner                     (2)
   AddVoter                          (3)
   ReplaceDeadVoter                  (4)
   ReplaceDecommissioningVoter       (5)
   RemoveVoter                       (6)
   AddNonVoter                       (7)
   ReplaceDeadNonVoter               (8)
   ReplaceDecommissioningNonVoter    (9)
   RemoveNonVoter                   (10)
   SwapVoterForConstraints          (11)  ← new, legacy has no equivalent
   SwapNonVoterForConstraints       (12)  ← new, legacy has no equivalent
   RepairSkipped                    (13)
   RepairPending                    (14)
   NoRepairNeeded                   (15)

Key differences from legacy:

- Quorum check gates all actions: In the legacy code, AddVoter is checked
  before the quorum gate, meaning it can be attempted even without quorum
  (with a TODO noting this). MMA checks quorum first (step 4) and skips
  repair entirely if quorum is lost, since all replication changes require
  raft consensus.

- No separate Remove{Dead,Decommissioning}{Voter,NonVoter}: The legacy
  code distinguishes "replace dead voter" (count matches, add-then-remove)
  from "remove dead voter" (over-replicated, just remove). MMA collapses
  these — RemoveVoter handles all over-replication cases, with candidate
  selection preferring dead > decommissioning > healthy replicas.

- Constraint swaps are new: Legacy doesn't have repair actions for
  constraint violations — those are handled as rebalancing. MMA treats
  them as repair because a range with correct counts but wrong placement
  is not fully conformant.

Informs cockroachdb#164658.
Wire the repair action computation into clusterState so that each range's
repair action is eagerly tracked and indexed.

Structural changes:
- Add `repairAction RepairAction` field to `rangeState`
- Add `repairRanges map[RepairAction]map[RangeID]struct{}` to `clusterState`
- Add `updateRepairAction()` and `removeFromRepairRanges()` to maintain the
  index

Trigger points (where updateRepairAction is called):
1. End of processRangeMsg (replicas/config may have changed)
2. pendingChangeEnacted when all pending changes complete
3. End of undoPendingChange
4. End of addPendingRangeChange (sets RepairPending)
5. updateStoreStatuses when health/disposition changes (recomputes for
   all ranges on the affected store)

Range GC calls removeFromRepairRanges before deleting the range.

Test infrastructure:
- `repair-needed` DSL command: iterates repairRanges by priority, prints
  action-to-ranges mapping; scans separately for RepairPending
- `repair` DSL command: stub (pending changes only, no execution yet)
- Parser: nextReplicaID auto-assignment, quiet=true on set-store, relaxed
  field count for replica lines, repair recomputation on update-store-status

6 new testdata files exercise the repair tracking across priority ordering,
config changes, constraint changes, multi-range scenarios, pending change
lifecycle, and store status transitions.

Informs cockroachdb#164658.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
@tbg tbg force-pushed the production-mma-repair branch from 6dea171 to 9cf44a5 Compare March 11, 2026 15:37
@cockroach-teamcity cockroach-teamcity added the X-perf-gain Microbenchmarks CI: Added if a performance gain is detected label Mar 11, 2026
Copy link
Copy Markdown
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tbg reviewed 13 files and all commit messages, and resolved 4 discussions.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on tbg).

@tbg tbg removed the X-perf-gain Microbenchmarks CI: Added if a performance gain is detected label Mar 12, 2026
tbg added a commit to tbg/cockroach that referenced this pull request Mar 12, 2026
Mark PRs 1 and 2 as completed with links to cockroachdb#165413 and cockroachdb#165423.
Update PR 3 description to include ASIM wiring and reflect prototype
discoveries. Fix PR 4/5 helper lists to account for what already shipped.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
@tbg tbg closed this Apr 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants