fix(build/orchestrator): close PREMATURE_COMPLETION race + harden --mark-phase-committed dirty-tree recovery #48
Merged
…romOriginIssues rewinds a committed phase

Root cause of the 2026-05-18 mitosis PREMATURE_COMPLETION fault class.

When origin verification fails after a feature reaches phases_done, restartFeatureFromOriginIssues (cli.ts:3385) rewinds the last committed phase back to `tests_green` and re-points currentPhaseIndex at it so the review/QA loop re-runs. The status rewind is intentional and load-bearing: the re-run is how origin-plan gaps get patched. But the plan markdown checkboxes (flipped by markCommitted via flipPhaseCheckboxes at cli.ts:5835) were NOT un-flipped. If the re-run then fails (any of the 22 `phaseState.status = "failed"` sites in cli.ts fires), the phase ends up at status=failed with checkboxes still [x][x][x]. That's the exact PREMATURE_COMPLETION invariant the fault detector fires on (skill-fault-detector.ts:429).

The fault reports from 2026-05-18 (agnt2-prototype, mitosis-control-plane, mitosis-prototype-v3.1) claimed the mechanism was per-sub-step incremental checkbox flipping. That hypothesis was wrong: there are only 5 flip sites in cli.ts and all are gated. The actual mechanism is the rewind-without-un-flip race in the origin-verification restart path.

Fix:
- New unflipPhaseCheckboxes(planFile, phase) in plan-mutator.ts: the symmetric counterpart of reconcilePhaseCheckboxes. Flips [x] → [ ] for test-spec, implementation, and review checkboxes using the existing kind-specific markers (IMPL_MARKER_BY_KIND, etc).
- restartFeatureFromOriginIssues accepts optional `phases: Phase[]` arg. When the rewound phase was `committed` AND phases is passed, un-flips that phase's checkboxes. Older callers that omit phases (existing tests) get the pre-fix behavior, so there is no breaking change.
- Both production call sites (cli.ts:9875, cli.ts:10018) updated to pass phases (already in scope at both call sites).

Markdown must follow state backward as faithfully as it follows forward.

Tests:
- plan-mutator.test.ts: 5 new tests for unflipPhaseCheckboxes covering all-three-flip, skip-test-spec, idempotent, kind-specific markers (writing), and the round-trip symmetry with reconcilePhaseCheckboxes.
- cli.test.ts: 2 new tests for restartFeatureFromOriginIssues. One proves the un-flip fires on a real temp plan file when phases is passed (and that non-rewound phases are NOT touched); one proves omitting phases preserves pre-fix behavior for legacy callers.

All 52 plan-mutator tests pass. All 4 restartFeatureFromOriginIssues tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…h --commit-dirty / --force-dirty

Closes the recovery anti-pattern that all four 2026-05-18 mitosis PREMATURE_COMPLETION faults triggered: --mark-phase-committed silently force-marked over a dirty worktree, leaving the next phase to start on inconsistent state.

The mitosis-prototype-v3.1 fault report's "Recovery Performed" section shows the failure shape clearly: --mark-phase-committed 2.1 advanced the state machine but left 5+ files Codex had written during QA uncommitted. The next feature's worktree-clean check would see those files but the state would say the previous feature was committed, which left it ambiguous whose work it was.

This is orthogonal to PR #41's auto-split path. Auto-split runs inside the gate flow when the agent exits 0; --mark-phase-committed is the operator's emergency recovery tool used AFTER the gate has already failed. Until now it had no dirty-tree opinion at all.

New behavior:
- Dirty worktree + no flag → REFUSE with exit 2, list dirty files, print the two recovery options.
- --commit-dirty → `git add . && git commit -m "fix(recovery): <phase> auto-commit..."` then mark. Pre-commit hooks still run (we do NOT pass --no-verify; if a hook fails, the operator sees the hook output and can fall back to --force-dirty).
- --force-dirty → emit WARN listing dirty files, mark anyway, leave the dirty state on disk. For when the operator has reviewed the dirty files and wants them preserved (e.g. WIP work they'll commit manually after the mark).
- The two flags are mutually exclusive (clear error on conflict).
- Clean tree → no-op, marks as before.
- dryRun → skips the guard entirely (preview must not inspect git).
- cwd omitted → skips the guard (preserves backwards compat for legacy callers / unit tests that exercise the state-only transition without a real git fixture).

Implementation:
- New optional args on markPhaseCommittedAfterManualRecovery: cwd, forceDirty, commitDirty. Refused-with-dirty-files response includes the structured dirtyFiles list so callers can render it.
- CLI argv: new --force-dirty and --commit-dirty flags wired through.
- --help text updated.
- Production call site (cli.ts:9020) passes projectRoot as cwd and forwards both flags.

Tests:
- 7 new tests in the "markPhaseCommittedAfterManualRecovery > dirty-tree guard" describe block. Each test stands up a real git repo (`git init` + seed commit) so captureGitSnapshot has something real to inspect. Coverage:
  - refuses without a flag (verifies dirtyFiles in response + error message lists both recovery options + state is NOT mutated).
  - --force-dirty marks anyway (dirty file still on disk + uncommitted).
  - --commit-dirty stages + commits (dirty file in HEAD + recovery message present + dirty.txt no longer in `git status --porcelain`).
  - mutual exclusivity (clear error).
  - clean tree no-op (no flag needed).
  - cwd omitted skips guard (legacy callers).
  - dryRun skips guard (preview mode).

All 259 cli.test.ts tests pass. All 1362 orchestrator tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
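The policy described in this commit message can be sketched as a pure decision function. This is a minimal illustration only, not the code in cli.ts: the name `decideDirtyTreePolicy`, the `GuardInput` and `GuardDecision` shapes, and the order of the checks (mutual exclusivity first, then the dryRun / cwd skips) are all assumptions made for the example.

```typescript
// Hypothetical shapes for illustration; the real markPhaseCommittedAfterManualRecovery
// signature and return type are not shown in this PR text.
type GuardDecision =
  | { action: "mark"; warn?: string[] }
  | { action: "commit-then-mark" }
  | { action: "refuse"; exitCode: number; dirtyFiles: string[] }
  | { action: "usage-error"; message: string };

interface GuardInput {
  dirtyFiles: string[]; // paths reported as dirty by the worktree snapshot
  cwd?: string;         // omitted by legacy callers and state-only unit tests
  dryRun?: boolean;
  forceDirty?: boolean;
  commitDirty?: boolean;
}

function decideDirtyTreePolicy(input: GuardInput): GuardDecision {
  // Assumption: the mutual-exclusivity check runs before anything else.
  if (input.forceDirty && input.commitDirty) {
    return {
      action: "usage-error",
      message: "--force-dirty and --commit-dirty are mutually exclusive",
    };
  }
  // dryRun and a missing cwd both skip the guard entirely.
  if (input.dryRun || input.cwd === undefined) {
    return { action: "mark" };
  }
  // Clean tree: mark as before, no flag needed.
  if (input.dirtyFiles.length === 0) {
    return { action: "mark" };
  }
  // Dirty tree: the operator must have chosen a policy.
  if (input.commitDirty) {
    return { action: "commit-then-mark" };
  }
  if (input.forceDirty) {
    return { action: "mark", warn: input.dirtyFiles };
  }
  return { action: "refuse", exitCode: 2, dirtyFiles: input.dirtyFiles };
}
```

Keeping the decision separate from the git side effects is a choice made for this sketch; it lets every row of the policy be checked without standing up a repository, whereas the tests in this commit use real `git init` fixtures.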
The original plan I drafted earlier today was Explore-agent-summarized analysis. When I re-read the orchestrator code directly to start implementation, I found that:
- All four 2026-05-18 faults pre-date PR #41 (squash-merged at 2026-05-18T13:37Z, last fault detected 12:26Z).
- PR #41 was already a deliberate four-failures response to a different mitosis-oasis batch the same day, shipping most of what the original draft proposed (auto-split mixed test+prod diffs, producer-side layer-purity rule).
- The original draft's Fix 1 premise was factually wrong: the fault reports asserted per-sub-step incremental checkbox flipping; the orchestrator has 5 flip sites total, all gated. That mechanism does not exist in shipped code.

The actual mechanism: restartFeatureFromOriginIssues rewinds a committed phase to tests_green without un-flipping plan checkboxes. A subsequent re-run failure produces the PREMATURE_COMPLETION [x][x][x] + status=failed signature.

Revised plan documents:
- What's already shipped by PR #41 / PR #42 / PR #44 (don't re-implement).
- Two narrow gaps that were real, both fixed on this branch:
  - Fix 1 (un-flip on rewind): commit 381cf84.
  - Fix 4 (dirty-tree guard on --mark-phase-committed): commit bfdd569.
- Why the original draft's Fixes 2, 3, 5, 6 were descoped (better approaches already shipped, or speculative upstream issue).
- What's still not addressed: defensive markFailed() guard against the broader committed→failed overwrite class (22 status="failed" sites in cli.ts, only one confirmed bridge).
- Lessons: read code before designing fixes, fault reports are failure-shape artifacts not code-truth, open questions often have answers in types.ts.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The ship-time coverage audit on this branch flagged ~65% coverage (above 60% min, below 80% target). The two material gaps:
1. parseArgs flag→arg mapping for --commit-dirty / --force-dirty was uncovered (only the downstream behavior was tested).
2. unflipPhaseCheckboxes error-accumulation path was uncovered (only happy paths were tested).

This commit adds:
- 5 tests in cli.test.ts for the new flag wiring, mirroring the pattern from --skip-ship / --single-branch / --ship-on-plan-complete describe blocks: defaults both false, --force-dirty sets one, --commit-dirty sets the other, both can be set at parser level (the mutex check lives in markPhaseCommittedAfterManualRecovery), --help mentions both flags.
- 1 test in plan-mutator.test.ts for unflipPhaseCheckboxes when setCheckboxState rejects a line (wrong expectedMarker, simulating hand-edited plan between parse and rewind). Verifies the function collects the error but continues un-flipping the remaining checkboxes, returning a partial result.

Coverage now ~80% (target). All 11 new tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s errors

Adversarial review finding (confidence 9/10): markPhaseCommittedAfterManualRecovery's dirty-tree guard was silently treating `git status` failures as "clean tree."

When captureGitSnapshot can't read status (stale .git/index.lock, non-git directory, permissions issue), it encodes the failure as a single `<git error: ...>` line in snapshot.status. The guard filtered those out before counting dirty files, so the dirty list was empty → guard bypassed → silent force-mark over unknown worktree state. This recreates a narrower version of the exact silent-mark-over-dirty bug the guard was added to fix.

Now: detect the `<git error: ...>` sentinel before filtering and return `ok: false` with a clear message naming the git error and the retry options. The check is bypassed only by --force-dirty, which already opts the operator into "I know the tree state may be unknown, mark anyway."

Tests:
- "fails closed when captureGitSnapshot reports a git error": points cwd at a non-git directory, verifies refusal with --force-dirty in the error message and state NOT advanced.
- "--force-dirty bypasses git-error fail-closed": same fixture + --force-dirty, verifies the phase commits (operator explicitly accepted the unknown state).

All 12 dirty-tree guard tests pass. All 319 cli + plan-mutator tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
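The ordering fix described above (look for the error sentinel before filtering, not after) can be illustrated with a small classifier. This is a sketch under assumptions: `classifyWorktree` and `TreeState` are hypothetical names, and the status lines are assumed to be `git status --porcelain` output with the path starting at column 4; only the `<git error: ...>` sentinel shape comes from the commit message.

```typescript
// Hypothetical result type for illustration only.
type TreeState =
  | { kind: "git-error"; detail: string }
  | { kind: "clean" }
  | { kind: "dirty"; files: string[] };

// Sentinel shape taken from the commit message: a single `<git error: ...>` line.
const GIT_ERROR_SENTINEL = /^<git error: (.*)>$/;

function classifyWorktree(statusLines: string[]): TreeState {
  // Step 1: check for the sentinel BEFORE any filtering. If this ran after the
  // filter, an unreadable status would look identical to a clean tree.
  for (const line of statusLines) {
    const match = GIT_ERROR_SENTINEL.exec(line.trim());
    if (match) {
      return { kind: "git-error", detail: match[1] ?? "" };
    }
  }
  // Step 2: only now derive the dirty-file list.
  // Assumption: porcelain v1 lines look like "XY path", so the path starts at index 3.
  const files = statusLines
    .filter((line) => line.trim().length > 0)
    .map((line) => line.slice(3));
  return files.length === 0 ? { kind: "clean" } : { kind: "dirty", files };
}
```

A caller following the commit's contract would refuse on `git-error` unless --force-dirty was passed, since that flag is the only documented bypass.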
…estartFeatureFromOriginIssues

Codex adversarial review (BLOCK verdict) caught two real holes in the rewind path that would have recreated the PREMATURE_COMPLETION bug class this branch claims to fix.

Hole 1 (plan-mutator.ts): unflipPhaseCheckboxes did three SEPARATE atomic writes. If the test-spec flip succeeded, the impl flip failed marker validation, and the review flip succeeded, the plan was left half-rewound: test-spec back to [ ], impl still [x], review back to [ ]. That's silent corruption under exactly the hand-edit race the function exists to handle.

Hole 2 (cli.ts:3465): restartFeatureFromOriginIssues only console.warn'd on un-flip errors but still advanced state (phase status: committed → tests_green). Result: state rewound, markdown still [x][x][x] → the exact PREMATURE_COMPLETION invariant the fault detector fires on.

Fixes:

plan-mutator.ts: unflipPhaseCheckboxes is now ALL-OR-NOTHING:
1. Validate every target line first (no writes). Collect errors.
2. If ANY validation error → return errors, plan markdown unchanged byte-for-byte. No partial state on disk.
3. If all valid → ONE atomic temp+rename write applies all flips together. Either every checkbox flips or none do.

cli.ts:3465: restartFeatureFromOriginIssues now FAILS CLOSED:
- Run un-flip BEFORE advancing state.
- If un-flip returns errors: pause the feature, set feature.error with the specific markdown drift details + manual recovery instructions, return `restarted: false`.
- Phase status stays `committed`. The rewind never happened, so the markdown's [x][x][x] is still accurate. PREMATURE_COMPLETION cannot fire.
- Only after the markdown is successfully rewound does state advance to tests_green.

Tests:

plan-mutator.test.ts: replaced the old "collects errors without throwing" test with "ATOMIC: when any target fails validation, NO checkbox is flipped (all-or-nothing)". Asserts:
- errors.length === 1
- unflipped === 0 (was 1 in the old broken behavior)
- plan markdown is byte-for-byte unchanged

cli.test.ts: new test "FAIL-CLOSED: pauses feature when unflipPhaseCheckboxes errors (no state advance)". Sets up a plan where the impl marker was hand-renamed mid-flight, calls restart, asserts:
- restarted === false
- feature.status === "paused"
- feature.error contains "plan markdown drift" and the phase number
- state.phases[1].status STAYS "committed" (no rewind)
- plan markdown is byte-for-byte unchanged

All 53 plan-mutator tests pass. All 267 cli tests pass.

Codex adversarial review verdict: BLOCK (correctly). After this commit: PREMATURE_COMPLETION race is closed for real.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
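The all-or-nothing contract for the un-flip (validate every target, then apply all flips in one write) can be sketched as follows. This is an illustration under assumptions, not the code in plan-mutator.ts: `planUnflip`, `writeAtomically`, the `UnflipTarget` shape, and the example markers are hypothetical, and the real function takes `(planFile, phase)` rather than pre-resolved line indices.

```typescript
import { renameSync, writeFileSync } from "node:fs";

// Hypothetical target description: which line to un-flip and which marker it must carry.
interface UnflipTarget {
  lineIndex: number;
  expectedMarker: string;
}

interface UnflipResult {
  content: string;   // resulting markdown (identical to the input on any error)
  unflipped: number; // how many checkboxes went from [x] to [ ]
  errors: string[];
}

// Matches a checked list-item checkbox at the start of a line.
const CHECKED = /^(\s*[-*]\s+)\[x\]/i;

function planUnflip(markdown: string, targets: UnflipTarget[]): UnflipResult {
  const lines = markdown.split("\n");
  const errors: string[] = [];

  // Pass 1: validate every target without touching anything.
  for (const target of targets) {
    const line = lines[target.lineIndex];
    if (line === undefined) {
      errors.push(`line ${target.lineIndex} does not exist`);
    } else if (!line.includes(target.expectedMarker)) {
      errors.push(`line ${target.lineIndex} is missing marker "${target.expectedMarker}"`);
    }
  }
  // Any validation error: return the input byte-for-byte, flip nothing.
  if (errors.length > 0) {
    return { content: markdown, unflipped: 0, errors };
  }

  // Pass 2: apply all flips in memory. Already-unchecked lines are left alone,
  // which keeps the operation idempotent.
  let unflipped = 0;
  for (const target of targets) {
    const line = lines[target.lineIndex] ?? "";
    if (CHECKED.test(line)) {
      lines[target.lineIndex] = line.replace(CHECKED, "$1[ ]");
      unflipped += 1;
    }
  }
  return { content: lines.join("\n"), unflipped, errors };
}

// One temp+rename write, so readers never observe a half-written plan file.
function writeAtomically(path: string, content: string): void {
  const tmp = `${path}.tmp-${process.pid}`;
  writeFileSync(tmp, content, "utf8");
  renameSync(tmp, path);
}
```

In this sketch a caller would invoke `writeAtomically` only when `errors` is empty, and, per the fail-closed rule above, would advance phase state to `tests_green` only after that write succeeds.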
Fork-local skill release per CLAUDE.md "Fork versioning rule": no top-level VERSION or package.json bump (those track upstream gstack at v1.40.4.0). This branch's changes are scoped entirely to the /build skill orchestrator (plan-mutator.ts un-flip, cli.ts dirty-tree guard + restart fail-closed), so they get their own skill-local version bump.

MINOR bump (1.23.0 → 1.24.0) because:
- New public capability: unflipPhaseCheckboxes (atomic rewind helper)
- New public capability: --commit-dirty / --force-dirty flags
- New defensive behavior: fail-closed on restart un-flip errors
- New defensive behavior: fail-closed on git status errors during dirty-guard

All defensive, no breaking changes (existing callers that don't pass the new optional args get the pre-fix behavior).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n-rewind

Sync orchestrator-state-machine.md, build/orchestrator/README.md, and the /build SKILL template to match what shipped in fix/skill-faults-2026-05-19:
- §6 hygiene gate: cross-reference the new recovery-path dirty-tree guard
- §6.6 new section: restartFeatureFromOriginIssues atomic un-flip contract, fail-closed semantics, why markdown must rewind in lockstep with state
- §8 markPhaseCommittedAfterManualRecovery: dirty-tree guard policy table (clean / no-policy / commit-dirty / force-dirty / both / git status error)
- §9 glossary: refresh stale cli.ts line numbers, add unflipPhaseCheckboxes and restartFeatureFromOriginIssues pointers
- build/orchestrator/README.md: three new failure-modes table rows; new "Manual phase recovery (--mark-phase-committed)" section explaining when to use --commit-dirty vs --force-dirty
- build/SKILL.md.tmpl: short paragraph surfacing the dirty-tree guard so agents invoking --mark-phase-committed see the new safety inline
- build/SKILL.md: regenerated from template (bun run gen:skill-docs)

No CHANGELOG / VERSION bump (fork-local rule; /build skill frontmatter already at 1.24.0).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Closes the PREMATURE_COMPLETION skill-fault class observed in four 2026-05-18 mitosis incident reports, plus hardens the manual-recovery path that the same incident reports relied on.
The original investigation reports asserted "checkboxes are written incrementally per sub-step." That mechanism does not exist in shipped code (5 flip sites in cli.ts, all gated). Re-investigation under this branch found the actual mechanism:
`restartFeatureFromOriginIssues` rewinds a `committed` phase to `tests_green` to re-run review, but did NOT un-flip the plan markdown checkboxes. A subsequent re-run failure → status=failed + checkboxes=[x][x][x] → PREMATURE_COMPLETION fires. Plus three of the four faults were rescued by `--mark-phase-committed` silently force-marking over a dirty worktree, leaving the next phase to start on inconsistent state.

Three layers of fix (all defensive, all backwards-compatible):
- Atomic `unflipPhaseCheckboxes` (plan-mutator.ts): the symmetric counterpart of `reconcilePhaseCheckboxes`. Validates all targets first, then ONE atomic temp+rename write. ANY validation error → no writes, errors returned, plan markdown unchanged byte-for-byte. Markdown follows state backward as faithfully as it follows forward.
- Fail-closed `restartFeatureFromOriginIssues` (cli.ts): run un-flip BEFORE advancing state. If un-flip returns errors, pause the feature with `plan markdown drift` reason and explicit manual-recovery instructions. Phase status STAYS `committed`; the rewind never happens, so the markdown's [x][x][x] is still accurate. PREMATURE_COMPLETION cannot fire.
- `--commit-dirty` / `--force-dirty` flags on `--mark-phase-committed` (cli.ts): refuses to mark a phase committed when the worktree is dirty unless the operator explicitly chose a policy. Plus fail-closed when `git status` itself errors (e.g. stale `.git/index.lock`), bypassed only by `--force-dirty`.

Reviews
1a9e7f98). c24a0e01

Test Coverage
Coverage audit (ship-time subagent): 65% → ~80% after closing two gaps surfaced by the audit (parseArgs flag wiring, unflipPhaseCheckboxes error-accumulation path).
Tests: 1373 pass / 0 fail across 56 orchestrator test files (was 1304 before this branch).
Pre-Landing Review
5 findings, 1 auto-fixed (silent-pass-on-git-error), 4 deferred as known limitations (TOCTOU between dirty snapshot and flip step is a narrow window — recovery still better than the unguarded baseline). Quality score: 9/10.
Scope Drift
Plan Completion
2/2 DONE (Fix 1 = `381cf844`, Fix 4 = `bfdd569e`), 0 NOT DONE, 0 UNVERIFIABLE. The plan also names 3 items explicitly out of scope (defensive markFailed across 22 sites, Gemini smoke-test gate, fault-report falsifiability lesson).

Verification Results
Plan verification skipped — no dev server / not a webapp.
TODOS
No TODO items completed in this PR (no TODOS.md entries matched the diff).
Documentation
Docs synced to match what shipped (commit `7e325b5`):
- `docs/orchestrator-state-machine.md`: §6 hygiene gate now cross-references the recovery-path dirty-tree guard; new §6.6 documents `restartFeatureFromOriginIssues`'s atomic un-flip + fail-closed contract; §8 `markPhaseCommittedAfterManualRecovery` invariants gain the dirty-tree guard policy table; §9 glossary refreshed with current cli.ts line numbers.
- `build/orchestrator/README.md`: Failure-modes table gains three rows for the new guards. New "Manual phase recovery" section explains when to use `--commit-dirty` vs `--force-dirty`.
- `build/SKILL.md.tmpl`: Short "Dirty-tree guard" paragraph inline next to existing `--mark-phase-committed` guidance.
- `build/SKILL.md`: Regenerated via `bun run gen:skill-docs`.

No CHANGELOG / top-level VERSION bump (fork-local rule per CLAUDE.md; `/build` skill frontmatter bumped 1.23.0 → 1.24.0 in commit `87264247`).

Test plan
- `bun test build/orchestrator/__tests__/plan-mutator.test.ts` → 53 pass / 0 fail
- `bun test build/orchestrator/__tests__/cli.test.ts` → 267 pass / 0 fail
- `bun test build/orchestrator/__tests__/` → 1373 pass / 0 fail across 56 files
- `bun test test/skill-validation.test.ts test/gen-skill-docs.test.ts` → 718 pass / 0 fail (after `bun run gen:skill-docs --host all`)
- `bun test` (full free suite) → exit 0

🤖 Generated with Claude Code