fix(build/orchestrator): RED_GATE_ZERO_TESTS_COLLECTED parser fallback + test-writer commit hygiene gate by anbangr · Pull Request #99 · anbangr/gstack

anbangr · 2026-05-26T00:47:10Z

Summary

Fixes the "Theme A / Theme B" fault pair from the archived halt investigations. Both bugs surface as the same `RED_GATE_ZERO_TESTS_COLLECTED` symptom but have independent root causes — together they explain the build halt at Phase 1.2 from the user's most recent /build run (vitest reported 14 passing, orchestrator halted with "unknown runner").

Bug A — `extractTestCount` discards stdout for unknown / misclassified runners.

`detectRunnerFromTestCmd` returns `"unknown"` whenever the resolved `testCmd` is a wrapper script (`sh scripts/test.sh`, `make test`, generic `npm test`) that doesn't contain a recognized runner keyword. The `default` branch in `extractTestCount` then returned a hardcoded zero without attempting any parser on stdout — so even when stdout had a perfectly parseable vitest JSON line or `Tests N passed` summary, the orchestrator treated it as zero.
Same failure mode when the keyword detector misclassifies: `bun run test` resolves to runner `"bun"`, but if the underlying tool is vitest, the bun-stdout parser sees no `Ran N tests` banner and returns zero. Previously trusted unconditionally.
Fix: `tryFallbackParsers(output)` walks every parser in priority order (vitest JSON → jest JSON → vitest stdout → jest stdout → pytest → bun → mocha → go) and returns the first non-zero recovery, tagged `source: "stdout-fallback-auto"`. Fires in two places: the `default` branch (unknown runner), and after a named-runner parse that returned zero from stdout (not from a strong JSON signal — JSON's `numTotalTests: 0` is still trusted as authoritative).

Bug B — `RUN_GEMINI_TEST_SPEC` has no commit hygiene gate.

The test-writer phase only checked `exitCode` and let a sub-agent that returned exit 0 with a summary but zero new commits silently advance to `test_spec_done`. The missing commit surfaced one phase later at `VERIFY_RED` as the misleading `RED_GATE_ZERO_TESTS_COLLECTED` fault — wrong attribution, deferred halt, hard to diagnose. The pattern was already established for `RUN_GEMINI` (primary-impl) via `applyMutableAgentHygiene({ requireNewCommit: true, ... })`. Test-writer was the one role with a "must commit" contract but no gate.
Fix: mirror the primary-impl hygiene capture (git snapshot + sibling baselines + parent workspace before `runRoleTask`) and wrap the result in `applyMutableAgentHygiene` with `requireNewCommit: true` (unconditional — test-writer only dispatches for `kind === "code"` phases; audit-only doesn't apply). `hygieneFailureResult` returns `exitCode: 1`, which the existing failure branch in `phase-runner.ts` catches and turns into `status: "failed"` with a clear hygiene error — attribution at the right phase, with the right cause.

Bonus — merge resolution completes upstream PR #98's intent. Upstream's subtree-aware `testCmd` inference explicitly assumed the hygiene gate this PR adds ("the test-writer hygiene gate requires a commit per iteration") but didn't implement it. Now the inference's invariant actually holds. The merge also adds a small safety improvement: the subtree inference is gated on `phaseState.status === "test_spec_done"` so a hygiene-failed phase doesn't read `HEAD~1..HEAD` against an unrelated prior commit.

Commits:

`ab859268` — Fix A: best-effort parser fallback
`4ce02148` — Fix B: test-writer commit hygiene gate
`12bd20ca` — Merge `origin/main` (PR fix(build/orchestrator): fault classifier + subtree testCmd + auth preflight (3 bugs from 7-report analysis) #98's subtree-testCmd inference)

Test Coverage

8 new tests added across 2 suites, all passing alongside upstream's 213-line `resolve-test-cmd-inference` suite:

`build/orchestrator/tests/red-gate-runner-introspect.test.ts` (5 new):

T8 unknown runner + vitest JSON in stdout → fallback recovers 14, source=`stdout-fallback-auto`
T9 wrapper-script case: runner=`bun` + vitest JSON stdout → fallback recovers
T10 unknown runner + vitest summary line (no JSON) → fallback recovers via stdout regex
T11 unknown runner + empty stdout → preserves zero (no spurious recovery)
T12 vitest JSON with `numTotalTests: 0` → trusts the zero, no fallback override

`build/orchestrator/tests/red-gate-zero-tests-collected.test.ts` (3 new):

Bug A regression wrapper-script `testCmd: "sh scripts/test.sh"` + vitest JSON stdout → no spurious RED_GATE halt
Bug B behavioral `applyResult` for `RUN_GEMINI_TEST_SPEC` with exit=1 (hygiene-failed) result → `status: "failed"`, not `test_spec_done`
Bug B structural cli.ts `RUN_GEMINI_TEST_SPEC` dispatch block contains `applyMutableAgentHygiene` + `label: "test-writer"` + `requireNewCommit: true` (pins the wiring against future refactors)

Targeted suites (4 files): 71/71 pass. Full `bun test` suite: exit=0, zero failures.

Pre-Landing Review

No SQL, no LLM trust boundary violations, no user input handling. Changes are contained to TypeScript orchestrator code following the established `primaryImpl` hygiene pattern. The merge resolution preserves both my hygiene gate and upstream's subtree inference; the safety gate on `phaseState.status === "test_spec_done"` prevents the subtree inference from misattributing tests when the hygiene gate fails.

Test plan

`bun test build/orchestrator/tests/red-gate-runner-introspect.test.ts build/orchestrator/tests/red-gate-zero-tests-collected.test.ts build/orchestrator/tests/hygiene-sibling-repos.test.ts build/orchestrator/tests/resolve-test-cmd-inference.test.ts` — 71/71 pass
`bun test` (full free suite) — exit=0, no regressions
Merge conflict in cli.ts resolved (PR fix(build/orchestrator): fault classifier + subtree testCmd + auth preflight (3 bugs from 7-report analysis) #98 added subtree-aware testCmd inference that depends on this PR's hygiene gate)

🤖 Generated with Claude Code

…known / wrapper-script runners Bug A in the same "Theme A/B" pair tracked in the archived halt investigations: a `/build` run that resolves the test command to runner "unknown" (wrapper scripts like `sh scripts/test.sh`, `make test`, generic `npm test`) — or to a misclassified runner (e.g. `bun run test` matched as "bun" but actually invoking vitest underneath) — could halt with RED_GATE_ZERO_TESTS_COLLECTED even though stdout contained a perfectly parseable vitest JSON line or a `Tests N passed` summary. Root cause: `extractTestCount`'s `default` branch returned a hardcoded zero without attempting any parser on stdout, and the named-runner branches trusted their single parser unconditionally. The fault message at phase-runner.ts then attributed the zero to "did you forget ?", which is the wrong remediation for this failure mode. Fix: introduce `tryFallbackParsers(output)` that walks every parser in priority order (vitest JSON → jest JSON → vitest stdout → jest stdout → pytest → bun → mocha → go) and returns the first non-zero recovery, tagged `source: "stdout-fallback-auto"` so logs and tests can distinguish it from a targeted parse. The fallback fires in two places: - `default` branch (unknown runner): try fallback, return its result if any, else preserve the zero. Always warn — the unknown runner is itself a useful diagnostic the user should see. - After a known-runner parse: when the primary returns `collected: 0` AND wasn't a strong JSON signal (vitest/jest/pytest/bun JSON's `numTotalTests: 0` is trusted), retry with the fallback scan. Warn only on actual recovery, to keep healthy runs quiet. Tests added in red-gate-runner-introspect.test.ts: - T8: unknown runner + vitest JSON → fallback recovers 14 tests, source=stdout-fallback-auto - T9: wrapper-script case (runner=bun + vitest JSON stdout) → fallback recovers - T10: unknown runner + vitest summary line (no JSON) → fallback recovers via stdout regex - T11: unknown runner + empty stdout → preserves zero (no spurious recovery) - T12: vitest JSON with numTotalTests=0 → trust JSON's zero, no fallback override Plus a regression test in red-gate-zero-tests-collected.test.ts pinning that a wrapper-script `testCmd: "sh scripts/test.sh"` with vitest JSON in stdout no longer halts with RED_GATE_ZERO_TESTS_COLLECTED. Pairs with the test-writer commit-hygiene gate in the next commit (Bug B); both bugs surface as the same fault class but have independent root causes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…(test-writer drift) Bug B in the same "Theme A/B" pair: the test-writer phase (RUN_GEMINI_TEST_SPEC) accepted exit code 0 with zero new commits as "test_spec_done", letting a drifting sub-agent (e.g. a fast/eager model that returns an implementation summary instead of executing the buried "commit the failing tests" step in the input prompt) silently advance the state machine. The missing commit only surfaced one phase later at VERIFY_RED as the misleading RED_GATE_ZERO_TESTS_COLLECTED fault — wrong attribution, deferred halt, hard to diagnose. The pattern was already established for the primary implementor at the RUN_GEMINI dispatch site: capture `before = captureGitSnapshot(cwd)` + sibling baselines + parent-workspace snapshot, then wrap the role result in `applyMutableAgentHygiene({ requireNewCommit: true, ... })`. RUN_GEMINI_TEST_SPEC was the one role with a "must commit" contract but no hygiene gate. Fix: mirror the primary-impl hygiene pattern onto RUN_GEMINI_TEST_SPEC: - Hoist `outputFilePath` outside the dry-run branch so the post-role hygiene check can verify `requireNonEmptyOutput`. - Capture git snapshot, sibling baselines, parent-workspace snapshot before `runRoleTask` in both dry-run and real-run branches (dry-run gets a no-op parent snapshot refresh). - After `runRoleTask`, wrap result in `applyMutableAgentHygiene` with `requireNewCommit: true` (unconditional — test-writer only runs for kind="code" phases per decideNextAction; audit-only doesn't apply because the role's entire job is to produce committed failing tests). When the gate fires, `hygieneFailureResult` returns `exitCode: 1`, which the existing failure branch in phase-runner.ts:691 catches and turns into status="failed" with a clear hygiene error message — attribution at the right phase, with the right cause. No prompt or provider changes. The gate is provider-agnostic, so future provider swaps (the recent kimi → gemini-3.5-flash → codex backup chain introduced in commit 8f60429) won't reintroduce this drift class. Tests added in red-gate-zero-tests-collected.test.ts: - Behavioral: `applyResult` for RUN_GEMINI_TEST_SPEC with an exit=1 (hygiene- failed) SubAgentResult sets status="failed", not "test_spec_done". - Structural: cli.ts RUN_GEMINI_TEST_SPEC dispatch block contains `applyMutableAgentHygiene`, `label: "test-writer"`, `requireNewCommit: true` — pins the wiring so a refactor can't silently drop the gate. Pairs with the extractTestCount fallback in the previous commit. Both fixes target the same fault-class symptom (RED_GATE_ZERO_TESTS_COLLECTED) but address independent root causes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…-zero-tests-and-test-writer-hygiene # Conflicts: # build/orchestrator/cli.ts

anbangr and others added 3 commits May 26, 2026 08:30

Merge remote-tracking branch 'origin/main' into worktree-fix-red-gate…

12bd20c

…-zero-tests-and-test-writer-hygiene # Conflicts: # build/orchestrator/cli.ts

anbangr merged commit 4b62fb4 into main May 26, 2026

anbangr deleted the worktree-fix-red-gate-zero-tests-and-test-writer-hygiene branch May 26, 2026 01:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(build/orchestrator): RED_GATE_ZERO_TESTS_COLLECTED parser fallback + test-writer commit hygiene gate#99

fix(build/orchestrator): RED_GATE_ZERO_TESTS_COLLECTED parser fallback + test-writer commit hygiene gate#99
anbangr merged 3 commits into
mainfrom
worktree-fix-red-gate-zero-tests-and-test-writer-hygiene

anbangr commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

anbangr commented May 26, 2026

Summary

Test Coverage

Pre-Landing Review

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant