Skip to content

Commit e2781eb

Browse files
anbangrclaude
andcommitted
fix: adversarial review fixes — fail-closed resume guard + duplicate-number defense
Address 3 HIGH findings from cross-model adversarial review (Claude subagent + Codex structured review). The biggest concern: the resume-time auto-repair I introduced in the previous commit preserves runtime artifacts (gemini.outputFilePath, codexReview, committedAt) under whatever `number` is stored in state.json. But state.json corrupted by the pre-fix /build has the artifacts persisted at slots whose `number` is wrong — by-number merge re-attributes those artifacts to the wrong phase. If a downstream --mark-phase-committed then marks the wrong phase done, the real phase silently skips its real work. Same silent-corruption shape as the bug being fixed. Resolution per user-approved option A: the resume guard becomes fail-closed. On detected disagreement, gstack-build now exits with code 2 and prints a clear remediation message (delete state, --no-resume, or manually realign). Users decide what to keep; no silent heal. The in-run phases_added merge stays auto: there, state.phases[i] genuinely holds the prior phase's runtime state because the splice hasn't been re-parsed yet. By-number merge is the correct semantic. state.ts: - arePhasesAligned() now also checks feature-level alignment (count + per-index number). Resume desync of features alone (user renamed a `## Feature N:` heading) no longer slips past the guard. - mergeReparsedPhases() throws on duplicate phase numbers from the parser, defending against aliasing where two same-numbered phases would share a single PhaseState object across two array slots. - Document the contract: cli.ts does NOT auto-invoke the merge on resume; the function is wired only on the in-run phases_added path. cli.ts: - Resume guard: on disagreement, print a remediation block and exit 2 via the existing setupFailed flow. No mergeReparsedPhases call on resume. The error message names all four counts (state vs parsed, for phases and features), names the root cause (pre-fix gstack or hand-edit), and lists three recovery paths. - statePath imported so the remediation message can name the on-disk state file explicitly. - logMergeReport() kept; doc updated to note "resume" context tag is reserved for future use (not currently wired now that resume is fail-closed). tests: - New describe block for arePhasesAligned covering phase-count drift, per-index phase-number drift, feature-count drift, and feature-number drift with phases preserved. - New test for duplicate-number rejection (asserts the thrown error). - Resume-path test renamed and rescoped: it now exercises the PURE function's behavior (the shape of the merge if called) without claiming this is the runtime recovery path. Comment makes the contract clear. Cross-model adversarial review findings addressed: - Claude F1 / Codex Med#2 (failedAtPhase desync): moot — fail-closed resume means the merge never runs on shifted on-disk state. - Codex High#1 (resume preserves shifted artifacts): fixed by fail-closed. - Codex High#2 (duplicate phase aliasing): fixed by parser-side duplicate detection. - Codex High#3 / Claude F11 (arePhasesAligned ignores features): fixed. - Codex Med#1 (corrupted .index survives valid JSON loads): moot on resume (fail-closed) and impossible on in-run merge (.index is rewritten to i unconditionally). - Codex Med#2 / Claude F5 (last-write-wins duplicate collapse drops recoverable state): moot — resume is fail-closed, so any disk duplicates surface as a fail-closed error rather than silent collapse. Suite: 1305 pass / 2 skip / 1 pre-existing flake in sub-agents.test.ts (timeout-fix integration block — same shape on main, unrelated to this diff; passes intermittently). Regression test now: 12 pass / 84 assertions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 502272c commit e2781eb

3 files changed

Lines changed: 282 additions & 41 deletions

File tree

build/orchestrator/__tests__/phases-added-merge.test.ts

Lines changed: 196 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -27,7 +27,7 @@ import * as fs from "node:fs";
2727
import * as path from "node:path";
2828
import { parsePlan } from "../parser";
2929
import { appendFeaturePhases, _testWritePlan } from "../plan-mutator";
30-
import { mergeReparsedPhases } from "../state";
30+
import { mergeReparsedPhases, arePhasesAligned } from "../state";
3131
import type { BuildState, PhaseState, SubAgentInvocation } from "../types";
3232

3333
/** Typed sentinel SubAgentInvocation for tests — avoids `as unknown as` double-casts. */
@@ -365,15 +365,19 @@ describe("mergeReparsedPhases — FEATURE_NEEDS_PHASES splice mid-array", () =>
365365
});
366366
});
367367

368-
describe("mergeReparsedPhases — resume-path repair of pre-fix on-disk state", () => {
368+
describe("mergeReparsedPhases — pure function behavior on pre-fix corrupted state", () => {
369369
// Simulates the exact corrupted-state shape the bug report describes:
370370
// * Plan markdown has Feature 1's review phase spliced in mid-array
371371
// * On-disk state.json (written by pre-fix gstack) has the OLD phase order
372372
// plus a duplicate of the actually-last phase pushed by slice(oldCount)
373-
// The cli.ts resume guard detects the disagreement and calls mergeReparsedPhases
374-
// to repair. This test pins that the repair restores invariants without losing
375-
// PhaseState content (status, committedAt, gemini outputs).
376-
it("repairs state.phases for a build started by the pre-fix code path", () => {
373+
//
374+
// IMPORTANT: cli.ts does NOT auto-invoke this merge on resume. The resume
375+
// guard is fail-closed (see cli.ts, "state/plan desync detected on resume")
376+
// because by-number merge would re-attribute runtime artifacts to the
377+
// wrong phase. This test exercises the pure function's behavior — the
378+
// shape of the merge if it WERE called — so future refactors can rely on
379+
// a known contract. It is NOT the runtime recovery path.
380+
it("rebuilds state.phases to parser order while preserving PhaseState identity (pure function contract)", () => {
377381
const desyncedMd = `# Plan
378382
379383
## Feature 1: Auth
@@ -648,4 +652,190 @@ describe("mergeReparsedPhases — resume-path repair of pre-fix on-disk state",
648652
expect(report.orphaned).toEqual(["2.1"]);
649653
expect(report.added).toEqual(["3.1"]);
650654
});
655+
656+
it("rejects duplicate phase numbers in the parsed plan rather than silently aliasing PhaseState", () => {
657+
// Two `### Phase 1.1` headings in the same plan would cause
658+
// `stateByNumber.get("1.1")` to return the same object for both array
659+
// slots, aliasing .index mutation and downstream status writes. The
660+
// parser today never produces duplicates, but a bug in appendFeaturePhases
661+
// or a hand-edited plan could. Defense: throw on duplicate parser-side
662+
// numbers so the failure mode is loud, not a silent state corruption.
663+
const initial = parsePlan(`# Plan
664+
665+
## Feature 1: A
666+
667+
### Phase 1.1: Original
668+
- [ ] **Implementation**: x
669+
- [ ] **Review**: y
670+
`);
671+
const state = buildStateFromParsed(initial);
672+
673+
// Hand-construct a malformed reparsed result with a duplicate number.
674+
// (Going through parsePlan + plan-mutator would not produce this; we
675+
// call the function directly with a crafted input to exercise the
676+
// defense.)
677+
const malformed = {
678+
phases: [
679+
...initial.phases,
680+
// Duplicate of "1.1" — should throw.
681+
{
682+
...initial.phases[0],
683+
index: 1,
684+
name: "Duplicate",
685+
},
686+
],
687+
features: initial.features,
688+
};
689+
690+
expect(() => mergeReparsedPhases(state, malformed)).toThrow(
691+
/duplicate phase number "1\.1"/,
692+
);
693+
});
694+
});
695+
696+
describe("arePhasesAligned — drift detection on resume", () => {
697+
// The resume-time fail-closed guard in cli.ts uses this predicate to
698+
// decide whether to abort with a remediation message. Both phase-level
699+
// AND feature-level drift must be detected — feature-only drift (e.g.,
700+
// user renames a feature heading) would otherwise leave state.features
701+
// stale.
702+
it("returns true when state and parsed plan agree on phases and features", () => {
703+
const md = `# Plan
704+
705+
## Feature 1: Auth
706+
707+
### Phase 1.1: x
708+
- [ ] **Implementation**: x
709+
- [ ] **Review**: y
710+
711+
## Feature 2: Billing
712+
713+
### Phase 2.1: x
714+
- [ ] **Implementation**: x
715+
- [ ] **Review**: y
716+
`;
717+
const parsed = parsePlan(md);
718+
const state = buildStateFromParsed(parsed);
719+
expect(arePhasesAligned(state, parsed)).toBe(true);
720+
});
721+
722+
it("returns false when phase count disagrees", () => {
723+
const initial = parsePlan(`# Plan
724+
725+
## Feature 1: A
726+
727+
### Phase 1.1: x
728+
- [ ] **Implementation**: x
729+
- [ ] **Review**: y
730+
`);
731+
const state = buildStateFromParsed(initial);
732+
const reparsed = parsePlan(`# Plan
733+
734+
## Feature 1: A
735+
736+
### Phase 1.1: x
737+
- [ ] **Implementation**: x
738+
- [ ] **Review**: y
739+
740+
### Phase 1.2: new
741+
- [ ] **Implementation**: x
742+
- [ ] **Review**: y
743+
`);
744+
expect(arePhasesAligned(state, reparsed)).toBe(false);
745+
});
746+
747+
it("returns false when per-index phase number disagrees (the original bug shape)", () => {
748+
const initial = parsePlan(`# Plan
749+
750+
## Feature 1: A
751+
752+
### Phase 1.1: x
753+
- [ ] **Implementation**: x
754+
- [ ] **Review**: y
755+
756+
## Feature 2: B
757+
758+
### Phase 2.1: x
759+
- [ ] **Implementation**: x
760+
- [ ] **Review**: y
761+
`);
762+
const state = buildStateFromParsed(initial);
763+
// Reparsed plan with a review phase spliced under Feature 1 — phase
764+
// count grew, AND per-index numbers shift starting at index 1.
765+
const reparsed = parsePlan(`# Plan
766+
767+
## Feature 1: A
768+
769+
### Phase 1.1: x
770+
- [ ] **Implementation**: x
771+
- [ ] **Review**: y
772+
773+
### Phase 1.review-1: new
774+
- [ ] **Implementation**: x
775+
- [ ] **Review**: y
776+
777+
## Feature 2: B
778+
779+
### Phase 2.1: x
780+
- [ ] **Implementation**: x
781+
- [ ] **Review**: y
782+
`);
783+
expect(arePhasesAligned(state, reparsed)).toBe(false);
784+
});
785+
786+
it("returns false when feature count disagrees", () => {
787+
const initial = parsePlan(`# Plan
788+
789+
## Feature 1: A
790+
791+
### Phase 1.1: x
792+
- [ ] **Implementation**: x
793+
- [ ] **Review**: y
794+
`);
795+
const state = buildStateFromParsed(initial);
796+
const reparsed = parsePlan(`# Plan
797+
798+
## Feature 1: A
799+
800+
### Phase 1.1: x
801+
- [ ] **Implementation**: x
802+
- [ ] **Review**: y
803+
804+
## Feature 2: New
805+
806+
### Phase 2.1: x
807+
- [ ] **Implementation**: x
808+
- [ ] **Review**: y
809+
`);
810+
expect(arePhasesAligned(state, reparsed)).toBe(false);
811+
});
812+
813+
it("returns false when feature numbers disagree even if counts match", () => {
814+
// User renamed/replaced a feature heading. Phase count and per-index
815+
// phase numbers can still match, but state.features carries stale
816+
// metadata that downstream gates would consult.
817+
const initial = parsePlan(`# Plan
818+
819+
## Feature 1: Original
820+
821+
### Phase 1.1: x
822+
- [ ] **Implementation**: x
823+
- [ ] **Review**: y
824+
`);
825+
const state = buildStateFromParsed(initial);
826+
// Rename Feature 1 to Feature 2 (different number, same phase
827+
// numbering preserved via a hand-edit pretend-scenario). parsePlan
828+
// would auto-renumber phases, but we simulate the worst case where
829+
// the phase number stayed identical.
830+
state.features[0].number = "2"; // simulate post-rename runtime state
831+
const reparsed = parsePlan(`# Plan
832+
833+
## Feature 1: Renamed
834+
835+
### Phase 1.1: x
836+
- [ ] **Implementation**: x
837+
- [ ] **Review**: y
838+
`);
839+
expect(arePhasesAligned(state, reparsed)).toBe(false);
840+
});
651841
});

build/orchestrator/cli.ts

Lines changed: 51 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -46,6 +46,7 @@ import {
4646
releaseLock,
4747
readLockInfo,
4848
lockPath,
49+
statePath,
4950
ensureLogDir,
5051
deriveStateSlug,
5152
logDir,
@@ -251,10 +252,12 @@ function saveState(
251252
}
252253

253254
/**
254-
* Emit greppable `[plan]` warnings for a `mergeReparsedPhases` report. Both
255-
* call sites (resume-time repair and the FEATURE_NEEDS_PHASES branch) share
256-
* this so log shapes stay identical and a single grep catches every merge
257-
* mutation — context tag distinguishes the call site.
255+
* Emit greppable `[plan]` warnings for a `mergeReparsedPhases` report. Today
256+
* the merge runs only on the FEATURE_NEEDS_PHASES path during a live build;
257+
* the resume guard is fail-closed (see resume guard handling above) so the
258+
* `"resume"` context tag is reserved for future use, not currently wired.
259+
* Context tag stays explicit so log shapes are uniform if a future code path
260+
* adds another merge call site.
258261
*/
259262
function logMergeReport(
260263
report: {
@@ -8581,30 +8584,51 @@ async function main() {
85818584
// kind to disk so state-only consumers (fault detectors,
85828585
// drain-faults) can read kind without re-parsing.
85838586
backfillKindFromPlan(state, phases);
8584-
// Detect state/plan phase-array disagreement and repair by
8585-
// re-aligning state.phases to the parser's view by `number`.
8586-
// Triggered by:
8587-
// 1. State files written by pre-fix gstack versions where
8588-
// FEATURE_NEEDS_PHASES mis-merged review phases (see
8589-
// `mergeReparsedPhases` in state.ts).
8590-
// 2. A user hand-editing the plan markdown between runs in a
8591-
// way that adds / renames phases.
8592-
// Repair preserves PhaseState identity for unchanged phase numbers
8593-
// (status, gemini output paths, codexReview records all survive).
8594-
// Orphaned state entries (numbers no longer in the plan) are
8595-
// dropped with a warning. The `arePhasesAligned` predicate checks
8596-
// both length AND per-index `number` agreement — a length match
8597-
// alone is insufficient (the original bug produced equal-length
8598-
// arrays with mis-aligned contents).
8599-
if (!arePhasesAligned(state, { phases })) {
8600-
console.warn(
8601-
`[plan] state.phases disagrees with the parsed plan ` +
8602-
`(state has ${state.phases.length} phase(s), plan has ${phases.length}). ` +
8603-
`Re-aligning state to the plan by phase number.`,
8587+
// Detect state/plan disagreement on resume and FAIL CLOSED.
8588+
//
8589+
// Auto-merging on resume was tempting but unsafe: runtime
8590+
// artifacts on shifted slots (gemini.outputFilePath,
8591+
// codexReview, committedAt) belong to the work that physically
8592+
// ran at that slot, not to the phase whose `number` is stored
8593+
// there in the corrupted JSON. By-number merge would re-attribute
8594+
// those artifacts to the wrong phase, and a downstream
8595+
// --mark-phase-committed would mark the wrong phase done — the
8596+
// real downstream phase would skip its real work. Same silent-
8597+
// corruption shape as the bug being fixed.
8598+
//
8599+
// The right move: stop, surface what's wrong, and let the user
8600+
// decide. They can delete state.json and start fresh, or
8601+
// hand-edit it with full context. Re-running with `--no-resume`
8602+
// also re-creates state from the current plan.
8603+
//
8604+
// Trigger: state.phases or state.features disagrees with the
8605+
// parsed plan on length or per-index `number` — either dimension
8606+
// proves desync. See `arePhasesAligned` in state.ts.
8607+
if (!arePhasesAligned(state, { phases, features })) {
8608+
console.error(
8609+
`\n✗ state/plan desync detected on resume.\n\n` +
8610+
` state.phases length: ${state.phases.length}\n` +
8611+
` parsed plan phases: ${phases.length}\n` +
8612+
` state.features length: ${state.features?.length ?? 0}\n` +
8613+
` parsed plan features: ${features.length}\n\n` +
8614+
`This usually means state.json was written by a pre-fix gstack version ` +
8615+
`(see release notes for the FEATURE_NEEDS_PHASES merge fix) or the plan ` +
8616+
`markdown was hand-edited between runs.\n\n` +
8617+
`gstack-build refuses to auto-merge because the on-disk state may ` +
8618+
`have runtime artifacts (gemini outputs, codex reviews, committedAt ` +
8619+
`timestamps) attached to phase numbers that no longer match the ` +
8620+
`parsed plan. Silently merging by number would re-attribute that ` +
8621+
`work to the wrong phase.\n\n` +
8622+
`To recover:\n` +
8623+
` 1. Re-run with --no-resume to rebuild state from the current plan ` +
8624+
`(loses runtime artifacts, restarts phases from scratch).\n` +
8625+
` 2. Or delete the state file and run again from scratch:\n` +
8626+
` rm ${statePath(slug)}\n` +
8627+
` 3. Or inspect state.json and the plan markdown side by side, ` +
8628+
`manually realign the phase numbers, then re-run.\n`,
86048629
);
8605-
const repair = mergeReparsedPhases(state, { phases, features });
8606-
logMergeReport(repair, "resume");
8607-
saveState(state, { noGbrain: args.noGbrain, log: console.warn });
8630+
exitCode = 2;
8631+
setupFailed = true;
86088632
}
86098633
if (
86108634
JSON.stringify(loaded.roleConfigs) !== JSON.stringify(args.roles)

build/orchestrator/state.ts

Lines changed: 35 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -271,24 +271,32 @@ export interface MergeReparsedPhasesReport {
271271
}
272272

273273
/**
274-
* Cheap predicate: does `state.phases` already agree with the parser's view?
274+
* Cheap predicate: does the on-disk state already agree with the parser's
275+
* view of both phases AND features?
275276
*
276-
* Used by the resume-time repair guard in cli.ts to decide whether to invoke
277-
* the full merge. Equal-length arrays with per-index `number` agreement is
278-
* sufficient because the merge always assigns `.index = i` and the parser
279-
* walks top-to-bottom — disagreement on either dimension proves desync.
277+
* Used by the resume-time fail-closed guard in cli.ts to decide whether to
278+
* abort with a remediation message. Disagreement on any dimension (phase
279+
* count, per-index phase number, feature count, per-index feature number)
280+
* proves desync — the orchestrator must not silently auto-heal because
281+
* runtime artifacts on shifted slots may belong to a different phase than
282+
* the stale `number` they're persisted under. See cli.ts resume guard for
283+
* the remediation flow.
280284
*
281-
* Kept here (next to `mergeReparsedPhases`) so the trigger condition and the
282-
* repair function stay in sync. Pure (no mutation), safe to call repeatedly.
285+
* Pure (no mutation), safe to call repeatedly.
283286
*/
284287
export function arePhasesAligned(
285288
state: BuildState,
286-
reparsed: { phases: Phase[] },
289+
reparsed: { phases: Phase[]; features: Feature[] },
287290
): boolean {
288291
if (state.phases.length !== reparsed.phases.length) return false;
289292
for (let i = 0; i < state.phases.length; i++) {
290293
if (state.phases[i].number !== reparsed.phases[i].number) return false;
291294
}
295+
if ((state.features?.length ?? 0) !== reparsed.features.length) return false;
296+
for (let i = 0; i < reparsed.features.length; i++) {
297+
const sf = state.features?.[i];
298+
if (!sf || sf.number !== reparsed.features[i].number) return false;
299+
}
292300
return true;
293301
}
294302

@@ -334,6 +342,25 @@ export function mergeReparsedPhases(
334342
state: BuildState,
335343
reparsed: { phases: Phase[]; features: Feature[] },
336344
): MergeReparsedPhasesReport {
345+
// Reject duplicate phase numbers in the parser's view: two `### Phase X`
346+
// headings with the same number would cause `stateByNumber.get(X)` to
347+
// return the same PhaseState object for two array slots, aliasing
348+
// `.index` mutation and downstream status writes. The parser today never
349+
// produces duplicates (every heading lands at a unique array index), but
350+
// a bug in `appendFeaturePhases` or a hand-edited plan could. Fail fast
351+
// here rather than silently corrupting state.
352+
const parserSeen = new Set<string>();
353+
for (const p of reparsed.phases) {
354+
if (parserSeen.has(p.number)) {
355+
throw new Error(
356+
`mergeReparsedPhases: parser produced duplicate phase number "${p.number}". ` +
357+
`This is a plan-file or parser bug — refusing to merge to avoid aliasing PhaseState ` +
358+
`between two array slots. Inspect the plan markdown for two "### Phase ${p.number}" headings.`,
359+
);
360+
}
361+
parserSeen.add(p.number);
362+
}
363+
337364
const stateByNumber = new Map<string, PhaseState>();
338365
for (const ps of state.phases) {
339366
stateByNumber.set(ps.number, ps);

0 commit comments

Comments
 (0)