fix(build/orchestrator): close PREMATURE_COMPLETION race + harden --mark-phase-committed dirty-tree recovery#48

Merged: anbangr merged 10 commits into main from fix/skill-faults-2026-05-19 on May 19, 2026

Conversation

@anbangr anbangr commented May 19, 2026

Owner

Summary

Closes the PREMATURE_COMPLETION skill-fault class observed in four 2026-05-18 mitosis incident reports, plus hardens the manual-recovery path that the same incident reports relied on.

The original investigation reports asserted that "checkboxes are written incrementally per sub-step." That mechanism does not exist in shipped code: cli.ts has 5 flip sites, and all of them are gated. Re-investigation on this branch found the actual mechanism: restartFeatureFromOriginIssues rewinds a committed phase to tests_green to re-run review, but did NOT un-flip the plan markdown checkboxes. If the re-run then fails, the phase ends up at status=failed with checkboxes still [x][x][x], and PREMATURE_COMPLETION fires. In addition, three of the four faults were rescued by --mark-phase-committed, which silently force-marked over a dirty worktree and left the next phase to start on inconsistent state.

Three layers of fix (all defensive, all backwards-compatible):

  • Atomic unflipPhaseCheckboxes (plan-mutator.ts): the symmetric counterpart of reconcilePhaseCheckboxes. It validates all targets first, then performs ONE atomic temp+rename write. ANY validation error means no writes: errors are returned and the plan markdown is unchanged byte-for-byte. Markdown follows state backward as faithfully as it follows forward.

  • Fail-closed restartFeatureFromOriginIssues (cli.ts): runs the un-flip BEFORE advancing state. If the un-flip returns errors, the feature is paused with a "plan markdown drift" reason and explicit manual-recovery instructions. Phase status STAYS committed; the rewind never happens, so the markdown's [x][x][x] is still accurate and PREMATURE_COMPLETION cannot fire.

  • --commit-dirty / --force-dirty flags on --mark-phase-committed (cli.ts): the command now refuses to mark a phase committed when the worktree is dirty unless the operator explicitly chooses a policy. It also fails closed when git status itself errors (e.g. stale .git/index.lock); only --force-dirty bypasses that check.
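
The validate-first, single-write contract of the first layer can be sketched roughly as follows. This is a minimal illustration, not the shipped plan-mutator.ts code: the `UnflipTarget` shape, the `computeUnflip` and `unflipFileAtomically` names, and the temp-file naming are all assumptions made for the example.

```typescript
import * as fs from "node:fs";

// Hypothetical shape: a checkbox target is a 1-based line number plus a
// marker substring that the line is expected to contain.
interface UnflipTarget {
  line: number;
  expectedMarker: string;
}

type UnflipResult =
  | { ok: true; text: string; unflipped: number }
  | { ok: false; errors: string[] };

// Step 1: validate every target against an in-memory copy. Nothing is written.
function computeUnflip(text: string, targets: UnflipTarget[]): UnflipResult {
  const lines = text.split("\n");
  const errors: string[] = [];
  let unflipped = 0;
  for (const t of targets) {
    const current = lines[t.line - 1];
    if (current === undefined) {
      errors.push(`line ${t.line}: out of range`);
    } else if (!current.includes(t.expectedMarker)) {
      errors.push(`line ${t.line}: expected marker "${t.expectedMarker}" not found`);
    } else if (current.includes("[x]")) {
      lines[t.line - 1] = current.replace("[x]", "[ ]");
      unflipped++;
    }
    // A line that already reads "[ ]" is left alone, so repeat calls are idempotent.
  }
  // Any error discards the in-memory edits; the caller gets errors only.
  if (errors.length > 0) return { ok: false, errors };
  return { ok: true, text: lines.join("\n"), unflipped };
}

// Step 2: one temp-file write plus rename, reached only when validation passed.
function unflipFileAtomically(planFile: string, targets: UnflipTarget[]): UnflipResult {
  const result = computeUnflip(fs.readFileSync(planFile, "utf8"), targets);
  if (!result.ok) return result; // plan file untouched
  const tmp = `${planFile}.tmp-${process.pid}`; // same directory as the plan file
  fs.writeFileSync(tmp, result.text);
  fs.renameSync(tmp, planFile); // replaces the plan file in a single step
  return result;
}
```

Keeping validation pure and separate from the write makes the "no partial state on disk" property easy to assert in a test without touching the filesystem.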

Reviews

| Review | Verdict | Notes |
|---|---|---|
| Internal combined review (lens: structural + maintainability + adversarial) | CLEAR (1 finding auto-fixed) | Caught the silent-pass-on-git-status-error case (fixed in 1a9e7f98). |
| Codex adversarial challenge (read-only, model_reasoning_effort=high) | BLOCK on first pass → CLEAR after c24a0e01 | Codex correctly identified that the original un-flip wrapper console.warn'd on error but still advanced state, recreating the PREMATURE_COMPLETION bug class. This was the killer finding. Fixed: atomic un-flip + fail-closed restart. |
| Existing review log | No prior reviews on this branch | Ship ran the full pre-landing review pipeline. |

Test Coverage

Coverage audit (ship-time subagent): 65% → ~80% after closing two gaps surfaced by the audit (parseArgs flag wiring, unflipPhaseCheckboxes error-accumulation path).

unflipPhaseCheckboxes              ★★★ — 6 tests (atomicity + round-trip + idempotent + markers + error path)
restartFeatureFromOriginIssues     ★★★ — 5 tests (legacy callers + rewind + FAIL-CLOSED on drift)
markPhaseCommittedAfterManualRecovery dirty-guard
                                   ★★★ — 12 tests (7 happy paths + git-error fail-closed + --force-dirty bypass)
parseArgs --force-dirty/--commit-dirty
                                   ★★  — 5 tests (defaults, each flag, both, --help text)

Tests: 1373 pass / 0 fail across 56 orchestrator test files (was 1304 before this branch).

Pre-Landing Review

5 findings, 1 auto-fixed (silent-pass-on-git-error), 4 deferred as known limitations (TOCTOU between dirty snapshot and flip step is a narrow window — recovery still better than the unguarded baseline). Quality score: 9/10.

Scope Drift

Scope Check: CLEAN
Intent:    Fix the 4 PREMATURE_COMPLETION skill-fault reports from 2026-05-18
Delivered: Two narrow orchestrator fixes (un-flip on rewind, --mark-phase-committed
           dirty guard) + adversarial-review-driven hardening

Plan Completion

2/2 DONE (Fix 1 = 381cf844, Fix 4 = bfdd569e), 0 NOT DONE, 0 UNVERIFIABLE. The plan also names 3 items explicitly out of scope (defensive markFailed across 22 sites, Gemini smoke-test gate, fault-report falsifiability lesson).

Verification Results

Plan verification skipped — no dev server / not a webapp.

TODOS

No TODO items completed in this PR (no TODOS.md entries matched the diff).

Documentation

Docs synced to match what shipped (commit 7e325b5):

  • docs/orchestrator-state-machine.md — §6 hygiene gate now cross-references the recovery-path dirty-tree guard; new §6.6 documents restartFeatureFromOriginIssues's atomic un-flip + fail-closed contract; §8 markPhaseCommittedAfterManualRecovery invariants gain the dirty-tree guard policy table; §9 glossary refreshed with current cli.ts line numbers.
  • build/orchestrator/README.md — Failure-modes table gains three rows for the new guards. New Manual phase recovery section explains when to use --commit-dirty vs --force-dirty.
  • build/SKILL.md.tmpl — Short "Dirty-tree guard" paragraph inline next to existing --mark-phase-committed guidance.
  • build/SKILL.md — Regenerated via bun run gen:skill-docs.

No CHANGELOG / top-level VERSION bump (fork-local rule per CLAUDE.md; /build skill frontmatter bumped 1.23.0 → 1.24.0 in commit 87264247).

Test plan

  • bun test build/orchestrator/__tests__/plan-mutator.test.ts → 53 pass / 0 fail
  • bun test build/orchestrator/__tests__/cli.test.ts → 267 pass / 0 fail
  • bun test build/orchestrator/__tests__/ → 1373 pass / 0 fail across 56 files
  • bun test test/skill-validation.test.ts test/gen-skill-docs.test.ts → 718 pass / 0 fail (after bun run gen:skill-docs --host all)
  • bun test (full free suite) → exit 0
  • Inbox plan revised to reflect actual scope vs. what was already shipped by PR #41 (fix(build): four-failures root cause + adversarial hardening)

🤖 Generated with Claude Code

anbangr and others added 10 commits May 19, 2026 07:48
…romOriginIssues rewinds a committed phase

Root cause of the 2026-05-18 mitosis PREMATURE_COMPLETION fault class.

When origin verification fails after a feature reaches phases_done,
restartFeatureFromOriginIssues (cli.ts:3385) rewinds the last
committed phase back to `tests_green` and re-points currentPhaseIndex
at it so the review/QA loop re-runs. The status rewind is intentional
and load-bearing — the re-run is how origin-plan gaps get patched.

But the plan markdown checkboxes (flipped by markCommitted via
flipPhaseCheckboxes at cli.ts:5835) were NOT un-flipped. If the re-run
then fails — any of the 22 `phaseState.status = "failed"` sites in
cli.ts fires — the phase ends up at status=failed with checkboxes
still [x][x][x]. That's the exact PREMATURE_COMPLETION invariant the
fault detector fires on (skill-fault-detector.ts:429).

The fault reports from 2026-05-18 (agnt2-prototype, mitosis-control-plane,
mitosis-prototype-v3.1) claimed the mechanism was per-sub-step incremental
checkbox flipping. That hypothesis was wrong — there are only 5 flip sites
in cli.ts and all are gated. The actual mechanism is the rewind-without-
un-flip race in the origin-verification restart path.
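
To make the failure signature concrete, the invariant described above (status=failed while every phase checkbox is still checked) could be expressed along these lines. This is only a guess at the shape of the check; the real detector lives in skill-fault-detector.ts and may differ, and `isPrematureCompletion` is a hypothetical name.

```typescript
// Hypothetical predicate: a phase looks "prematurely complete" when its state
// says failed but every one of its plan checkboxes is still checked.
function isPrematureCompletion(status: string, checkboxLines: string[]): boolean {
  const allChecked =
    checkboxLines.length > 0 &&
    checkboxLines.every((line) => /^\s*[-*] \[x\]/i.test(line));
  return status === "failed" && allChecked;
}
```

Under this reading, a rewind that leaves the checkboxes checked followed by any of the `failed` transitions is enough to satisfy the predicate, which matches the race described in this commit.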

Fix:
- New unflipPhaseCheckboxes(planFile, phase) in plan-mutator.ts: the
  symmetric counterpart of reconcilePhaseCheckboxes. Flips [x] → [ ]
  for test-spec, implementation, and review checkboxes using the
  existing kind-specific markers (IMPL_MARKER_BY_KIND, etc).
- restartFeatureFromOriginIssues accepts optional `phases: Phase[]`
  arg. When the rewound phase was `committed` AND phases is passed,
  un-flips that phase's checkboxes. Older callers that omit phases
  (existing tests) get the pre-fix behavior — no breaking change.
- Both production call sites (cli.ts:9875, cli.ts:10018) updated to
  pass phases (already in scope at both call sites).

Markdown must follow state backward as faithfully as it follows forward.

Tests:
- plan-mutator.test.ts: 5 new tests for unflipPhaseCheckboxes covering
  all-three-flip, skip-test-spec, idempotent, kind-specific markers
  (writing), and the round-trip symmetry with reconcilePhaseCheckboxes.
- cli.test.ts: 2 new tests for restartFeatureFromOriginIssues — one
  proves the un-flip fires on a real temp plan file when phases is
  passed (and that non-rewound phases are NOT touched), one proves
  omitting phases preserves pre-fix behavior for legacy callers.

All 52 plan-mutator tests pass. All 4 restartFeatureFromOriginIssues
tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…h --commit-dirty / --force-dirty

Closes the recovery anti-pattern that all four 2026-05-18 mitosis
PREMATURE_COMPLETION faults triggered: --mark-phase-committed
silently force-marked over a dirty worktree, leaving the next phase
to start on inconsistent state.

The mitosis-prototype-v3.1 fault report's "Recovery Performed" section
shows the failure shape clearly: --mark-phase-committed 2.1 advanced
the state machine but left 5+ files Codex had written during QA
uncommitted. The next feature's worktree-clean check would see those
files but the state would say the previous feature was committed —
ambiguous on whose work it was.

This is orthogonal to PR #41's auto-split path. Auto-split runs inside
the gate flow when the agent exits 0; --mark-phase-committed is the
operator's emergency recovery tool used AFTER the gate has already
failed. Until now it had no dirty-tree opinion at all.

New behavior:
- Dirty worktree + no flag → REFUSE with exit 2, list dirty files,
  print the two recovery options.
- --commit-dirty → `git add . && git commit -m "fix(recovery): <phase>
  auto-commit..."` then mark. Pre-commit hooks still run (we do NOT
  pass --no-verify; if a hook fails, the operator sees the hook output
  and can fall back to --force-dirty).
- --force-dirty → emit WARN listing dirty files, mark anyway, leave
  the dirty state on disk. For when the operator has reviewed the
  dirty files and wants them preserved (e.g. WIP work they'll commit
  manually after the mark).
- The two flags are mutually exclusive (clear error on conflict).
- Clean tree → no-op, marks as before.
- dryRun → skips the guard entirely (preview must not inspect git).
- cwd omitted → skips the guard (preserves backwards compat for
  legacy callers / unit tests that exercise the state-only transition
  without a real git fixture).
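
The policy list above can be summarized as a small decision function. This is a sketch under assumptions: `decideDirtyGuard`, the `GuardDecision` shape, and the injected `readDirtyFiles` callback are invented for illustration, and the order in which the mutual-exclusivity check runs relative to the dryRun and cwd skips is a guess, not something the commit states.

```typescript
// Hypothetical option and result shapes for the dirty-tree guard.
interface GuardOptions {
  cwd?: string;
  dryRun?: boolean;
  forceDirty?: boolean;
  commitDirty?: boolean;
}

type GuardDecision =
  | { action: "mark" }
  | { action: "commit-then-mark"; dirtyFiles: string[] }
  | { action: "warn-then-mark"; dirtyFiles: string[] }
  | { action: "refuse"; exitCode: number; dirtyFiles: string[] }
  | { action: "error"; reason: string };

function decideDirtyGuard(
  opts: GuardOptions,
  readDirtyFiles: (cwd: string) => string[],
): GuardDecision {
  // Assumed ordering: reject the flag conflict before anything else.
  if (opts.forceDirty && opts.commitDirty) {
    return { action: "error", reason: "--force-dirty and --commit-dirty are mutually exclusive" };
  }
  // dryRun and a missing cwd both skip the guard without inspecting git.
  if (opts.dryRun || opts.cwd === undefined) return { action: "mark" };

  const dirtyFiles = readDirtyFiles(opts.cwd);
  if (dirtyFiles.length === 0) return { action: "mark" }; // clean tree: unchanged behavior
  if (opts.commitDirty) return { action: "commit-then-mark", dirtyFiles };
  if (opts.forceDirty) return { action: "warn-then-mark", dirtyFiles };
  // Dirty tree and no explicit policy: refuse with exit code 2 and list the files.
  return { action: "refuse", exitCode: 2, dirtyFiles };
}
```

Injecting the git reader as a callback keeps the decision logic testable without a real repository, and makes it easy to assert that preview mode never calls it.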

Implementation:
- New optional args on markPhaseCommittedAfterManualRecovery: cwd,
  forceDirty, commitDirty. Refused-with-dirty-files response includes
  the structured dirtyFiles list so callers can render it.
- CLI argv: new --force-dirty and --commit-dirty flags wired through.
- --help text updated.
- Production call site (cli.ts:9020) passes projectRoot as cwd and
  forwards both flags.

Tests:
- 7 new tests in the "markPhaseCommittedAfterManualRecovery > dirty-
  tree guard" describe block. Each test stands up a real git repo
  (`git init` + seed commit) so captureGitSnapshot has something real
  to inspect. Coverage:
  - refuses without a flag (verifies dirtyFiles in response + error
    message lists both recovery options + state is NOT mutated).
  - --force-dirty marks anyway (dirty file still on disk + uncommitted).
  - --commit-dirty stages + commits (dirty file in HEAD + recovery
    message present + dirty.txt no longer in `git status --porcelain`).
  - mutual exclusivity (clear error).
  - clean tree no-op (no flag needed).
  - cwd omitted skips guard (legacy callers).
  - dryRun skips guard (preview mode).

All 259 cli.test.ts tests pass. All 1362 orchestrator tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The original plan I drafted earlier today was built on Explore-agent-summarized
analysis. When I re-read the orchestrator code directly to start
implementation, I found that:

- All four 2026-05-18 faults pre-date PR #41 (squash-merged at
  2026-05-18T13:37Z, last fault detected 12:26Z).
- PR #41 was already a deliberate four-failures response to a different
  mitosis-oasis batch the same day, shipping most of what the original
  draft proposed (auto-split mixed test+prod diffs, producer-side
  layer-purity rule).
- The original draft's Fix 1 premise was factually wrong: the fault
  reports asserted per-sub-step incremental checkbox flipping; the
  orchestrator has 5 flip sites total, all gated. That mechanism does
  not exist in shipped code.

The actual mechanism: restartFeatureFromOriginIssues rewinds a
committed phase to tests_green without un-flipping plan checkboxes.
A subsequent re-run failure produces the PREMATURE_COMPLETION
[x][x][x] + status=failed signature.

Revised plan documents:
- What's already shipped by PR #41 / PR #42 / PR #44 (don't
  re-implement).
- Two narrow gaps that were real, both fixed on this branch:
  - Fix 1 (un-flip on rewind) — commit 381cf84.
  - Fix 4 (dirty-tree guard on --mark-phase-committed) — commit
    bfdd569.
- Why the original draft's Fixes 2, 3, 5, 6 were descoped (better
  approaches already shipped, or speculative upstream issue).
- What's still not addressed: defensive markFailed() guard against
  the broader committed→failed overwrite class (22 status="failed"
  sites in cli.ts, only one confirmed bridge).
- Lessons: read code before designing fixes, fault reports are
  failure-shape artifacts not code-truth, open questions often have
  answers in types.ts.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The ship-time coverage audit on this branch flagged ~65% coverage
(above 60% min, below 80% target). The two material gaps:

1. parseArgs flag→arg mapping for --commit-dirty / --force-dirty was
   uncovered (only the downstream behavior was tested).
2. unflipPhaseCheckboxes error-accumulation path was uncovered (only
   happy paths were tested).

This commit adds:

- 5 tests in cli.test.ts for the new flag wiring, mirroring the
  pattern from --skip-ship / --single-branch / --ship-on-plan-complete
  describe blocks: defaults both false, --force-dirty sets one,
  --commit-dirty sets the other, both can be set at parser level
  (the mutex check lives in markPhaseCommittedAfterManualRecovery),
  --help mentions both flags.

- 1 test in plan-mutator.test.ts for unflipPhaseCheckboxes when
  setCheckboxState rejects a line (wrong expectedMarker, simulating
  hand-edited plan between parse and rewind). Verifies the function
  collects the error but continues un-flipping the remaining
  checkboxes, returning a partial result.

Coverage now ~80% (target). All 11 new tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…s errors

Adversarial review finding (confidence 9/10):
markPhaseCommittedAfterManualRecovery's dirty-tree guard was
silently treating `git status` failures as "clean tree." When
captureGitSnapshot can't read status (stale .git/index.lock,
non-git directory, permissions issue), it encodes the failure
as a single `<git error: ...>` line in snapshot.status. The
guard filtered those out before counting dirty files, so the
dirty list was empty → guard bypassed → silent force-mark over
unknown worktree state. This recreates a narrower version of
the exact silent-mark-over-dirty bug the guard was added to fix.

Now: detect the `<git error: ...>` sentinel before filtering and
return `ok: false` with a clear message naming the git error and
the retry options. The check is bypassed only by --force-dirty,
which already opts the operator into "I know the tree state may
be unknown — mark anyway."
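
A rough sketch of the sentinel check described above: look for the `<git error: ...>` line before any filtering, and only let `--force-dirty` past it. The `readDirtyFilesFromStatus` name and the assumption that the snapshot status is newline-separated `git status --porcelain` text are illustrative, not taken from the shipped code.

```typescript
type DirtyCheck =
  | { ok: true; dirtyFiles: string[] }
  | { ok: false; error: string };

const GIT_ERROR_PREFIX = "<git error:";

// statusText is assumed to hold porcelain-style lines ("XY path"), or a single
// "<git error: ...>" sentinel line when git status could not be read.
function readDirtyFilesFromStatus(statusText: string, forceDirty: boolean): DirtyCheck {
  const lines = statusText.split("\n").filter((line) => line.trim() !== "");

  // Check for the sentinel BEFORE filtering, so a failed status is never
  // mistaken for an empty (clean) dirty list.
  const gitError = lines.find((line) => line.startsWith(GIT_ERROR_PREFIX));
  if (gitError !== undefined && !forceDirty) {
    return {
      ok: false,
      error: `cannot determine worktree state: ${gitError}. Fix the git error and retry, or pass --force-dirty.`,
    };
  }

  const dirtyFiles = lines
    .filter((line) => !line.startsWith(GIT_ERROR_PREFIX))
    .map((line) => line.slice(3)); // drop the two status characters and the space
  return { ok: true, dirtyFiles };
}
```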

Tests:
- "fails closed when captureGitSnapshot reports a git error":
  points cwd at a non-git directory, verifies refusal with
  --force-dirty in the error message and state NOT advanced.
- "--force-dirty bypasses git-error fail-closed": same fixture
  + --force-dirty, verifies the phase commits (operator
  explicitly accepted the unknown state).

All 12 dirty-tree guard tests pass. All 319 cli + plan-mutator
tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…estartFeatureFromOriginIssues

Codex adversarial review (BLOCK verdict) caught two real holes in
the rewind path that would have recreated the PREMATURE_COMPLETION
bug class this branch claims to fix.

Hole 1 (plan-mutator.ts): unflipPhaseCheckboxes did three SEPARATE
atomic writes. If the test-spec flip succeeded, the impl flip failed
marker validation, and the review flip succeeded, the plan was left
half-rewound — test-spec back to [ ], impl still [x], review back
to [ ]. That's silent corruption under exactly the hand-edit race
the function exists to handle.

Hole 2 (cli.ts:3465): restartFeatureFromOriginIssues only console.warn'd
on un-flip errors but still advanced state (phase status: committed →
tests_green). Result: state rewound, markdown still [x][x][x] → the
exact PREMATURE_COMPLETION invariant the fault detector fires on.

Fixes:

plan-mutator.ts — unflipPhaseCheckboxes is now ALL-OR-NOTHING:
1. Validate every target line first (no writes). Collect errors.
2. If ANY validation error → return errors, plan markdown unchanged
   byte-for-byte. No partial state on disk.
3. If all valid → ONE atomic temp+rename write applies all flips
   together. Either every checkbox flips or none do.

cli.ts:3465 — restartFeatureFromOriginIssues now FAILS CLOSED:
- Run un-flip BEFORE advancing state.
- If un-flip returns errors: pause the feature, set feature.error
  with the specific markdown drift details + manual recovery
  instructions, return `restarted: false`.
- Phase status stays `committed` — the rewind never happened,
  so the markdown's [x][x][x] is still accurate. PREMATURE_COMPLETION
  cannot fire.
- Only after the markdown is successfully rewound does state advance
  to tests_green.
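
The ordering that makes the restart fail closed (un-flip first, advance state only on success) might look roughly like this. The types, the `restartWithUnflip` name, the injected `unflip` callback, and the wording of the error message are all assumptions for the sketch; only the ordering and the paused/committed outcome come from the description above.

```typescript
type PhaseStatus = "committed" | "tests_green" | "failed";

interface PhaseState {
  status: PhaseStatus;
}

interface FeatureState {
  status: "running" | "paused";
  error?: string;
  currentPhaseIndex: number;
  phases: PhaseState[];
}

// unflip is a stand-in for the atomic un-flip helper; it returns validation errors.
function restartWithUnflip(
  feature: FeatureState,
  phaseIndex: number,
  unflip: () => string[],
): { restarted: boolean } {
  const phase = feature.phases[phaseIndex];
  if (phase === undefined) throw new RangeError(`no phase at index ${phaseIndex}`);

  // Markdown first: only a committed phase has checkboxes that need rewinding.
  const errors = phase.status === "committed" ? unflip() : [];
  if (errors.length > 0) {
    // Fail closed: pause, explain, and leave the phase status untouched.
    feature.status = "paused";
    feature.error =
      `plan markdown drift in phase ${phaseIndex + 1}: ${errors.join("; ")}. ` +
      "Fix the plan file by hand, then re-run.";
    return { restarted: false };
  }

  // State advances only after the markdown has been rewound successfully.
  phase.status = "tests_green";
  feature.currentPhaseIndex = phaseIndex;
  return { restarted: true };
}
```

Because the state mutation sits after the error check, there is no path in this sketch where the phase leaves `committed` while the markdown still shows checked boxes.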

Tests:

plan-mutator.test.ts — replaced the old "collects errors without
throwing" test with "ATOMIC: when any target fails validation, NO
checkbox is flipped (all-or-nothing)". Asserts:
- errors.length === 1
- unflipped === 0 (was 1 in the old broken behavior)
- plan markdown is byte-for-byte unchanged

cli.test.ts — new test "FAIL-CLOSED: pauses feature when
unflipPhaseCheckboxes errors (no state advance)". Sets up a plan
where the impl marker was hand-renamed mid-flight, calls restart,
asserts:
- restarted === false
- feature.status === "paused"
- feature.error contains "plan markdown drift" and the phase number
- state.phases[1].status STAYS "committed" (no rewind)
- plan markdown is byte-for-byte unchanged

All 53 plan-mutator tests pass. All 267 cli tests pass.

Codex adversarial review verdict: BLOCK (correctly).
After this commit: PREMATURE_COMPLETION race is closed for real.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fork-local skill release per CLAUDE.md "Fork versioning rule":
no top-level VERSION or package.json bump (those track upstream
gstack at v1.40.4.0). This branch's changes are scoped entirely
to the /build skill orchestrator (plan-mutator.ts un-flip,
cli.ts dirty-tree guard + restart fail-closed), so they get
their own skill-local version bump.

MINOR bump (1.23.0 → 1.24.0) because:
- New public capability: unflipPhaseCheckboxes (atomic rewind helper)
- New public capability: --commit-dirty / --force-dirty flags
- New defensive behavior: fail-closed on restart un-flip errors
- New defensive behavior: fail-closed on git status errors during
  dirty-guard

All defensive, no breaking changes (existing callers that don't pass
the new optional args get the pre-fix behavior).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n-rewind

Sync orchestrator-state-machine.md, build/orchestrator/README.md, and the
/build SKILL template to match what shipped in fix/skill-faults-2026-05-19:

- §6 hygiene gate: cross-reference the new recovery-path dirty-tree guard
- §6.6 new section: restartFeatureFromOriginIssues atomic un-flip contract,
  fail-closed semantics, why markdown must rewind in lockstep with state
- §8 markPhaseCommittedAfterManualRecovery: dirty-tree guard policy table
  (clean / no-policy / commit-dirty / force-dirty / both / git status error)
- §9 glossary: refresh stale cli.ts line numbers, add unflipPhaseCheckboxes
  and restartFeatureFromOriginIssues pointers
- build/orchestrator/README.md: three new failure-modes table rows; new
  "Manual phase recovery (--mark-phase-committed)" section explaining when
  to use --commit-dirty vs --force-dirty
- build/SKILL.md.tmpl: short paragraph surfacing the dirty-tree guard so
  agents invoking --mark-phase-committed see the new safety inline
- build/SKILL.md: regenerated from template (bun run gen:skill-docs)

No CHANGELOG / VERSION bump (fork-local rule; /build skill frontmatter
already at 1.24.0).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@anbangr anbangr merged commit ea7cc32 into main May 19, 2026
@anbangr anbangr deleted the fix/skill-faults-2026-05-19 branch May 19, 2026 01:05