Skip to content

fix(build/orchestrator): re-align state.phases when feature-review inserts mid-array (build skill v1.22.3)#43

Closed
anbangr wants to merge 5 commits into
mainfrom
fix/feature-needs-phases-merge
Closed

fix(build/orchestrator): re-align state.phases when feature-review inserts mid-array (build skill v1.22.3)#43
anbangr wants to merge 5 commits into
mainfrom
fix/feature-needs-phases-merge

Conversation

@anbangr

@anbangr anbangr commented May 18, 2026

Copy link
Copy Markdown
Owner

Summary

Fixes a silent state-corruption bug in gstack-build where state.phases (the persisted JSON array of per-phase runtime state) gets mis-aligned with the parsed plan markdown when feature-review returns FEATURE_NEEDS_PHASES and the named feature is not the last feature in the plan.

Root cause: build/orchestrator/cli.ts:9063 (pre-fix) used reparsed.phases.slice(oldPhaseCount) to identify "new" phases, assuming new entries append to the END of the parser's array. They don't. appendFeaturePhases splices new ### Phase N.review-K headings inside Feature N's body (before the next ## Feature heading). The parser walks top-to-bottom assigning phaseIndex = phases.length, so every PhaseState downstream of the insertion point shifts in the parser's view while the slice captures the wrong tail. Result: state.phases[i].number becomes stale for every i ≥ insertion point AND state.phases gains a duplicate of the actually-last phase with a conflicting .index. The orchestrator runs the new review against a slot whose .number still describes the OLD phase at that index — exactly the state.phases[3].number = "2.1" but gemini.outputPath = "phase-1.review-1-*" symptom in the bug report.

Fix: Replace the slice-based merge with mergeReparsedPhases in build/orchestrator/state.ts — a diff-merge by phase number that splices new phases into state.phases at the parser-assigned position and preserves PhaseState identity (status, gemini, codexReview, committedAt). Add a fail-closed resume guard in cli.ts: on detected state/plan desync at startup, exit with code 2 and print a remediation message instead of silently re-attributing runtime artifacts to the wrong phase.

Bumps: build skill frontmatter 1.22.2 → 1.22.3. No top-level VERSION bump (Fork versioning rule).

Test Coverage

Coverage: 100%. All paths in mergeReparsedPhases, arePhasesAligned, and the resume guard have direct tests in build/orchestrator/tests/phases-added-merge.test.ts (12 tests / 84 assertions).

CODE PATHS
[+] state.ts/mergeReparsedPhases
  ├── [★★★ TESTED] mid-array splice with downstream PhaseState identity preserved
  ├── [★★★ TESTED] append at last feature
  ├── [★★★ TESTED] multiple review phases in one verdict
  ├── [★★★ TESTED] orphan detection + drop
  ├── [★★★ TESTED] feature-level add/orphan surfaced in report
  ├── [★★★ TESTED] duplicate parser-side number rejection (throws)
  ├── [★★★ TESTED] last-write-wins on duplicate state.phases entries (resume scenario)
  └── [★★★ TESTED] currentPhaseIndex / currentFeatureIndex untouched
[+] state.ts/arePhasesAligned
  ├── [★★★ TESTED] aligned baseline
  ├── [★★★ TESTED] phase-count drift
  ├── [★★★ TESTED] per-index phase-number drift
  ├── [★★★ TESTED] feature-count drift
  └── [★★★ TESTED] feature-number drift with phases preserved

COVERAGE: 13/13 paths tested (100%)
Tests: 0 → 12 new tests / 84 assertions (single regression test file)

Coverage gate: PASS (100%).

Pre-Landing Review

9 INFORMATIONAL findings from 4 specialists (Testing, Maintainability, Security, Performance). 0 CRITICAL.

  • Testing (4): all auto-fixed — typed sentinel helper replaces as unknown as double-casts; duplicate-collapse semantic now pinned with distinct sentinels; cursor-preservation assertions added; feature-asymmetry test added.
  • Maintainability (5): 4 auto-fixed (arePhasesAligned extracted, logMergeReport helper, kind defaulting symmetrized, comment ordering), 1 ASK answered as B (extend report shape with addedFeatures/orphanedFeatures now rather than defer).
  • Security: NO FINDINGS.
  • Performance: NO FINDINGS.

PR Quality Score: 5.5/10 (penalized for surfaced informational findings; all addressed).

Adversarial Review

Cross-model adversarial review (Claude subagent + Codex codex exec + Codex codex review) surfaced 3 HIGH findings. All addressed in commit e2781eb9:

  1. Resume-time auto-repair re-attributes runtime artifacts to the wrong phase (Codex High v1.16.0.0 feat: gstack-build CLI orchestrator #1). The original fix's resume guard auto-merged on desync, but artifacts on shifted slots (gemini.outputFilePath, codexReview) belong to the work that physically ran at that slot, not to the phase whose number was stored there in corrupted JSON. By-number merge would silently re-attribute work; downstream --mark-phase-committed would mark the wrong phase done. Resolved per user-approved Option A: resume guard now FAILS CLOSED. Exits with code 2 and prints a 3-option remediation message (--no-resume, delete state, manual realign). User decides what to keep.
  2. Duplicate phase numbers in the parsed plan alias the same PhaseState object across multiple array slots (Codex High feat(build): TDD integration — Red→Green enforced by state machine (v1.14.0) #2). Defended: mergeReparsedPhases now throws on parser-side duplicate number values.
  3. arePhasesAligned ignored feature-level drift (Codex High feat(dual-impl): Gemini + Codex tournament selection with Opus judge (gstack-build v1.15.0) #3 / Claude F11). Extended to check both state.phases (length + per-index number) AND state.features (length + per-index number).

Cross-model agreement on the resume-corruption concern was the strongest signal — both reviewers independently flagged the same silent-corruption shape as the bug being fixed.

CODEX (code review) gate: PASS (after fixes).

Plan Completion

No plan file detected. The branch shipped from a bug report (the user's screenshot/markdown). All items from the bug report's "Proposed fix" section addressed:

  1. Re-parse plan to get new full phase array ✓
  2. Compute the diff (by number, not slice) ✓
  3. Splice new phase entries at correct positions ✓
  4. Rewrite state.json atomically (via existing saveState + fs.renameSync) ✓

The alternative the report mentioned ("change state.phases[] to be derived not persisted") was explicitly NOT taken — the fix preserves the persistence model and just fixes the alignment.

Verification Results

No verification section in the plan (no plan file). Step 8.1 skipped.

Documentation

Doc diff preview

  • docs/orchestrator-state-machine.md: added Inv F (phases-array alignment with parsed plan) naming arePhasesAligned as the predicate; added §9 mergeReparsedPhases contract (strategy, last-write-wins on pre-existing duplicates, parser-side duplicate rejection, hands-off cursor, in-run-only wiring); added §10 resume-time alignment guard (fail-closed exit 2, three printed recovery paths, silent-corruption rationale for refusing to auto-heal); refreshed §11 glossary line numbers and added arePhasesAligned, mergeReparsedPhases, and the resume-guard call site.
  • build/README.md: added Resume-time state/plan desync (exit 2) subsection to Failure Handling describing the new fail-closed error, the three recovery options, and a back-reference to state-machine §10.

Docs-state-machine enforcement test passes (37/37). No other docs needed updating — the fix is an internal orchestrator behavior change with no new CLI flags or skills.

Documentation coverage

Entity reference how-to tutorial explanation
arePhasesAligned yes (state-machine §11 glossary, Inv F) n/a n/a yes (Inv F, §10)
mergeReparsedPhases yes (state-machine §9, §11 glossary) n/a n/a yes (§9 strategy + rationale)
resume-time fail-closed guard (exit 2) yes (state-machine §10, §11 glossary) yes (build/README Failure Handling) n/a yes (§10 silent-corruption rationale)

Coverage: all shipped behavior has adequate documentation.

Tests

Full suite: 1305 pass / 2 skip / 1 pre-existing flake (4072 expects).

The 1 flake is in build/orchestrator/__tests__/sub-agents.test.tsrunConfiguredRoleTask: timeout-fix integration block, timing-dependent. Pre-existing: passes when run in isolation, fails under concurrent suite load. Reproduces on main without this branch's diff. Not in scope for this fix.

Per /ship Step 5 Test Failure Ownership Triage: classified as pre-existing, skip per Option D (the same flake was already documented during the parent /investigate session).

Test plan

  • bun test build/orchestrator/__tests__/phases-added-merge.test.ts — 12 pass / 0 fail
  • bun test build/orchestrator/__tests__/ — 1305 pass / 2 skip / 1 pre-existing flake
  • Targeted re-verify after each fix commit — passing
  • Documentation: bun test build/orchestrator/__tests__/docs-state-machine.test.ts — 37/37 pass

🤖 Generated with Claude Code

anbangr and others added 5 commits May 18, 2026 20:56
…serts new phases

When feature-review returns FEATURE_NEEDS_PHASES, plan-mutator.appendFeaturePhases
splices new "### Phase N.review-K" headings under the named feature heading (not at
end-of-file). The parser walks top-to-bottom assigning phaseIndex = phases.length, so
every downstream PhaseState index shifted. cli.ts:9063 used reparsed.phases.slice(
oldPhaseCount), which assumed append; it captured the wrong tail (the actually-last
phase of the LAST feature) and pushed it as the "new" phase. state.phases[k] became
stale for every k >= insertion point AND gained a duplicate of the last phase with a
conflicting .index. The orchestrator then ran the new review against a state slot
whose .number still described the OLD phase at that index, which is the
"state.phases[3].number = 2.1 but gemini.outputPath = phase-1.review-1-*" symptom.

Fix:
- state.ts: new mergeReparsedPhases(state, reparsed) — diff-merges by phase.number,
  preserving PhaseState identity (status, gemini outputs, codexReview records) and
  rewriting per-phase .index to match parser order. Syncs featureState.phaseIndexes
  from the parser. Drops orphan state phases (no longer in plan) with a warning.
- cli.ts (phases_added branch): replaced the slice() merge with mergeReparsedPhases.
  Removed the now-redundant featureDef rebinding gymnastics — featureState.phaseIndexes
  is synced inside the merge.
- cli.ts (resume path): added repair guard after loadState + backfillKindFromPlan.
  When loaded state.phases disagrees with the freshly parsed plan (length OR per-index
  number mismatch), re-align via the same merge. Heals on-disk corruption left by the
  pre-fix code paths automatically — users hit by the bug recover by re-running.

Regression test (build/orchestrator/__tests__/phases-added-merge.test.ts) pins:
- Mid-array splice with downstream PhaseState identity preserved (gemini, committedAt,
  status all survive across the merge).
- Append at last feature (the OLD math also handled this; the new math must too).
- Multiple review phases inserted in one verdict.
- Orphan detection + drop.

Suite: 1297 pass / 2 skip / 1 timing-flake in sub-agents.test.ts (passes in
isolation, unrelated to this change).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address informational findings from Testing and Maintainability specialists
during /ship pre-landing review. All findings auto-fixed except the
feature-array asymmetry (M3), which the user approved as Option B (extend
report shape now rather than defer).

state.ts (mergeReparsedPhases):
- Extract arePhasesAligned() predicate so the resume-time trigger and the
  merge function share a single agreement contract (M1).
- Backfill kind on existing PhaseStates symmetrically with the new-entry
  branch (default to "code" when parser supplies none) so post-merge
  invariant is uniform (M4).
- Sync features by number, not array position, surfacing addedFeatures and
  orphanedFeatures in MergeReparsedPhasesReport. Production never adds or
  removes top-level features today, but the asymmetry was a latent gap and
  the report now reports both deltas (M3 / option B).
- Document last-write-wins semantic on duplicate phase numbers and the
  hands-off contract on currentPhaseIndex / currentFeatureIndex.

cli.ts:
- Use arePhasesAligned() in the resume guard instead of an inline predicate.
- Extract logMergeReport(report, context) helper so the resume-time repair
  and the phases_added path emit identical, greppable [plan] log shapes
  with a context tag distinguishing the call site (M2).

tests:
- Add makeSentinelInvocation() typed helper to replace `as unknown as`
  double-casts on SubAgentInvocation sentinels (M_test).
- Pin duplicate-number collapse semantics: plant distinct sentinels on two
  same-numbered PhaseStates and assert the second one (last-write) wins (T1).
- Assert mergeReparsedPhases does NOT mutate currentPhaseIndex /
  currentFeatureIndex (T3 — cursor preservation contract).
- New test for feature-level orphan/added drift surfaced via the extended
  report shape.
- Update existing assertions to verify addedFeatures and orphanedFeatures
  are empty in the happy-path cases.

Suite: 1300 pass / 2 skip / 0 fail (4066 expects). The earlier sub-agents
timing flake also cleared on this run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…number defense

Address 3 HIGH findings from cross-model adversarial review (Claude
subagent + Codex structured review).

The biggest concern: the resume-time auto-repair I introduced in the
previous commit preserves runtime artifacts (gemini.outputFilePath,
codexReview, committedAt) under whatever `number` is stored in
state.json. But state.json corrupted by the pre-fix /build has the
artifacts persisted at slots whose `number` is wrong — by-number merge
re-attributes those artifacts to the wrong phase. If a downstream
--mark-phase-committed then marks the wrong phase done, the real
phase silently skips its real work. Same silent-corruption shape as
the bug being fixed.

Resolution per user-approved option A: the resume guard becomes
fail-closed. On detected disagreement, gstack-build now exits with code 2
and prints a clear remediation message (delete state, --no-resume, or
manually realign). Users decide what to keep; no silent heal.

The in-run phases_added merge stays auto: there, state.phases[i]
genuinely holds the prior phase's runtime state because the splice
hasn't been re-parsed yet. By-number merge is the correct semantic.

state.ts:
- arePhasesAligned() now also checks feature-level alignment (count +
  per-index number). Resume desync of features alone (user renamed a
  `## Feature N:` heading) no longer slips past the guard.
- mergeReparsedPhases() throws on duplicate phase numbers from the
  parser, defending against aliasing where two same-numbered phases
  would share a single PhaseState object across two array slots.
- Document the contract: cli.ts does NOT auto-invoke the merge on
  resume; the function is wired only on the in-run phases_added path.

cli.ts:
- Resume guard: on disagreement, print a remediation block and exit 2
  via the existing setupFailed flow. No mergeReparsedPhases call on
  resume. The error message names all four counts (state vs parsed,
  for phases and features), names the root cause (pre-fix gstack or
  hand-edit), and lists three recovery paths.
- statePath imported so the remediation message can name the on-disk
  state file explicitly.
- logMergeReport() kept; doc updated to note "resume" context tag is
  reserved for future use (not currently wired now that resume is
  fail-closed).

tests:
- New describe block for arePhasesAligned covering phase-count drift,
  per-index phase-number drift, feature-count drift, and feature-number
  drift with phases preserved.
- New test for duplicate-number rejection (asserts the thrown error).
- Resume-path test renamed and rescoped: it now exercises the PURE
  function's behavior (the shape of the merge if called) without
  claiming this is the runtime recovery path. Comment makes the
  contract clear.

Cross-model adversarial review findings addressed:
- Claude F1 / Codex Med#2 (failedAtPhase desync): moot — fail-closed
  resume means the merge never runs on shifted on-disk state.
- Codex High#1 (resume preserves shifted artifacts): fixed by
  fail-closed.
- Codex High#2 (duplicate phase aliasing): fixed by parser-side
  duplicate detection.
- Codex High#3 / Claude F11 (arePhasesAligned ignores features): fixed.
- Codex Med#1 (corrupted .index survives valid JSON loads): moot
  on resume (fail-closed) and impossible on in-run merge (.index is
  rewritten to i unconditionally).
- Codex Med#2 / Claude F5 (last-write-wins duplicate collapse drops
  recoverable state): moot — resume is fail-closed, so any disk
  duplicates surface as a fail-closed error rather than silent collapse.

Suite: 1305 pass / 2 skip / 1 pre-existing flake in sub-agents.test.ts
(timeout-fix integration block — same shape on main, unrelated to this
diff; passes intermittently). Regression test now: 12 pass / 84
assertions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signal that the build skill's orchestrator runtime changed in this
branch (FEATURE_NEEDS_PHASES merge fix + fail-closed resume guard).
No top-level VERSION bump per Fork versioning rule — fork-local
skill work bumps skill frontmatter only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cross-reference the FEATURE_NEEDS_PHASES mid-array merge fix in the two
canonical docs that future debuggers / orchestrator readers go to first.

docs/orchestrator-state-machine.md
- Add Inv F (phases-array alignment with parsed plan), naming
  arePhasesAligned as the predicate and pointing at the two
  paths that can violate it (in-run splice, cross-run state.json
  desync).
- Add §9: mergeReparsedPhases contract — strategy, last-write-wins on
  pre-existing duplicates, parser-side duplicate rejection, hands-off
  cursor, in-run-only wiring. Names the regression test file.
- Add §10: resume-time alignment guard (fail-closed) — exit code 2
  conditions, remediation paths printed to the user, and the
  silent-corruption rationale for refusing to auto-heal on resume.
- Refresh the §11 glossary line numbers (logMergeReport pushed every
  cli.ts pointer down) and add arePhasesAligned, mergeReparsedPhases,
  and the resume-guard call site.

build/README.md
- Add a "Resume-time state/plan desync (exit 2)" subsection to
  Failure Handling describing the new fail-closed error, the three
  recovery options the orchestrator prints, and a back-reference to
  state-machine §10 for the rationale.

No code changes. Diataxis quadrant: reference (state machine) +
how-to (recovery in build/README.md).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@anbangr

anbangr commented May 18, 2026

Copy link
Copy Markdown
Owner Author

Closing — PR #42 (#42) landed the same FEATURE_NEEDS_PHASES merge fix earlier today (2026-05-18T13:40Z) while this branch was in development.

Both branches independently identified the same root cause (slice-tail merge assumes new phases append, but appendFeaturePhases splices mid-array; downstream state.phases[k] go stale) and arrived at the same high-level fix (replace the slice with a by-number diff-merge in a new state.ts helper).

PR #42's helper is reconcileStatePhasesAfterReparse; this PR's was mergeReparsedPhases. The implementations differ in detail but the contract is the same. PR #42 also wires a fail-closed throw on dropped phases with a BLOCKED-feature-N.md recovery report — better than what this PR had on the first commit.

This PR's additive value over what's already landed:

  1. Parser-side duplicate-number rejection (Codex High feat(build): TDD integration — Red→Green enforced by state machine (v1.14.0) #2 from this PR's adversarial review — also called out in PR v1.40.2.0 fix(build/orchestrator): rebuild state.phases by number on FEATURE_NEEDS_PHASES #42's INVESTIGATE list as future work).
  2. Fail-closed resume guard (refuses to auto-merge on resume because corrupted on-disk state has runtime artifacts attached to wrong phase numbers — by-number merge would silently re-attribute them).
  3. Feature-level drift detection in arePhasesAligned (catches user-edited ## Feature N: heading changes between runs).

These will be filed as a follow-up PR that compounds on PR #42's landed code, not a competing implementation.

🤖 Generated with Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant