Skip to content

fix(build/orchestrator): RED_GATE_ZERO_TESTS_COLLECTED parser fallback + test-writer commit hygiene gate#99

Merged
anbangr merged 3 commits into
mainfrom
worktree-fix-red-gate-zero-tests-and-test-writer-hygiene
May 26, 2026
Merged

fix(build/orchestrator): RED_GATE_ZERO_TESTS_COLLECTED parser fallback + test-writer commit hygiene gate#99
anbangr merged 3 commits into
mainfrom
worktree-fix-red-gate-zero-tests-and-test-writer-hygiene

Conversation

@anbangr

@anbangr anbangr commented May 26, 2026

Copy link
Copy Markdown
Owner

Summary

Fixes the "Theme A / Theme B" fault pair from the archived halt investigations. Both bugs surface as the same `RED_GATE_ZERO_TESTS_COLLECTED` symptom but have independent root causes — together they explain the build halt at Phase 1.2 from the user's most recent /build run (vitest reported 14 passing, orchestrator halted with "unknown runner").

Bug A — `extractTestCount` discards stdout for unknown / misclassified runners.

  • `detectRunnerFromTestCmd` returns `"unknown"` whenever the resolved `testCmd` is a wrapper script (`sh scripts/test.sh`, `make test`, generic `npm test`) that doesn't contain a recognized runner keyword. The `default` branch in `extractTestCount` then returned a hardcoded zero without attempting any parser on stdout — so even when stdout had a perfectly parseable vitest JSON line or `Tests N passed` summary, the orchestrator treated it as zero.
  • Same failure mode when the keyword detector misclassifies: `bun run test` resolves to runner `"bun"`, but if the underlying tool is vitest, the bun-stdout parser sees no `Ran N tests` banner and returns zero. Previously trusted unconditionally.
  • Fix: `tryFallbackParsers(output)` walks every parser in priority order (vitest JSON → jest JSON → vitest stdout → jest stdout → pytest → bun → mocha → go) and returns the first non-zero recovery, tagged `source: "stdout-fallback-auto"`. Fires in two places: the `default` branch (unknown runner), and after a named-runner parse that returned zero from stdout (not from a strong JSON signal — JSON's `numTotalTests: 0` is still trusted as authoritative).

Bug B — `RUN_GEMINI_TEST_SPEC` has no commit hygiene gate.

  • The test-writer phase only checked `exitCode` and let a sub-agent that returned exit 0 with a summary but zero new commits silently advance to `test_spec_done`. The missing commit surfaced one phase later at `VERIFY_RED` as the misleading `RED_GATE_ZERO_TESTS_COLLECTED` fault — wrong attribution, deferred halt, hard to diagnose. The pattern was already established for `RUN_GEMINI` (primary-impl) via `applyMutableAgentHygiene({ requireNewCommit: true, ... })`. Test-writer was the one role with a "must commit" contract but no gate.
  • Fix: mirror the primary-impl hygiene capture (git snapshot + sibling baselines + parent workspace before `runRoleTask`) and wrap the result in `applyMutableAgentHygiene` with `requireNewCommit: true` (unconditional — test-writer only dispatches for `kind === "code"` phases; audit-only doesn't apply). `hygieneFailureResult` returns `exitCode: 1`, which the existing failure branch in `phase-runner.ts` catches and turns into `status: "failed"` with a clear hygiene error — attribution at the right phase, with the right cause.

Bonus — merge resolution completes upstream PR #98's intent. Upstream's subtree-aware `testCmd` inference explicitly assumed the hygiene gate this PR adds ("the test-writer hygiene gate requires a commit per iteration") but didn't implement it. Now the inference's invariant actually holds. The merge also adds a small safety improvement: the subtree inference is gated on `phaseState.status === "test_spec_done"` so a hygiene-failed phase doesn't read `HEAD~1..HEAD` against an unrelated prior commit.

Commits:

Test Coverage

8 new tests added across 2 suites, all passing alongside upstream's 213-line `resolve-test-cmd-inference` suite:

`build/orchestrator/tests/red-gate-runner-introspect.test.ts` (5 new):

  • T8 unknown runner + vitest JSON in stdout → fallback recovers 14, source=`stdout-fallback-auto`
  • T9 wrapper-script case: runner=`bun` + vitest JSON stdout → fallback recovers
  • T10 unknown runner + vitest summary line (no JSON) → fallback recovers via stdout regex
  • T11 unknown runner + empty stdout → preserves zero (no spurious recovery)
  • T12 vitest JSON with `numTotalTests: 0` → trusts the zero, no fallback override

`build/orchestrator/tests/red-gate-zero-tests-collected.test.ts` (3 new):

  • Bug A regression wrapper-script `testCmd: "sh scripts/test.sh"` + vitest JSON stdout → no spurious RED_GATE halt
  • Bug B behavioral `applyResult` for `RUN_GEMINI_TEST_SPEC` with exit=1 (hygiene-failed) result → `status: "failed"`, not `test_spec_done`
  • Bug B structural cli.ts `RUN_GEMINI_TEST_SPEC` dispatch block contains `applyMutableAgentHygiene` + `label: "test-writer"` + `requireNewCommit: true` (pins the wiring against future refactors)

Targeted suites (4 files): 71/71 pass. Full `bun test` suite: exit=0, zero failures.

Pre-Landing Review

No SQL, no LLM trust boundary violations, no user input handling. Changes are contained to TypeScript orchestrator code following the established `primaryImpl` hygiene pattern. The merge resolution preserves both my hygiene gate and upstream's subtree inference; the safety gate on `phaseState.status === "test_spec_done"` prevents the subtree inference from misattributing tests when the hygiene gate fails.

Test plan

🤖 Generated with Claude Code

anbangr and others added 3 commits May 26, 2026 08:30
…known / wrapper-script runners

Bug A in the same "Theme A/B" pair tracked in the archived halt investigations: a `/build`
run that resolves the test command to runner "unknown" (wrapper scripts like
`sh scripts/test.sh`, `make test`, generic `npm test`) — or to a misclassified runner
(e.g. `bun run test` matched as "bun" but actually invoking vitest underneath) — could
halt with RED_GATE_ZERO_TESTS_COLLECTED even though stdout contained a perfectly
parseable vitest JSON line or a `Tests N passed` summary.

Root cause: `extractTestCount`'s `default` branch returned a hardcoded zero without
attempting any parser on stdout, and the named-runner branches trusted their single
parser unconditionally. The fault message at phase-runner.ts then attributed the
zero to "did you forget <!-- testCmd: -->?", which is the wrong remediation for
this failure mode.

Fix: introduce `tryFallbackParsers(output)` that walks every parser in priority
order (vitest JSON → jest JSON → vitest stdout → jest stdout → pytest → bun → mocha
→ go) and returns the first non-zero recovery, tagged `source: "stdout-fallback-auto"`
so logs and tests can distinguish it from a targeted parse. The fallback fires in
two places:

- `default` branch (unknown runner): try fallback, return its result if any,
  else preserve the zero. Always warn — the unknown runner is itself a useful
  diagnostic the user should see.
- After a known-runner parse: when the primary returns `collected: 0` AND wasn't
  a strong JSON signal (vitest/jest/pytest/bun JSON's `numTotalTests: 0` is
  trusted), retry with the fallback scan. Warn only on actual recovery, to keep
  healthy runs quiet.

Tests added in red-gate-runner-introspect.test.ts:
- T8: unknown runner + vitest JSON → fallback recovers 14 tests, source=stdout-fallback-auto
- T9: wrapper-script case (runner=bun + vitest JSON stdout) → fallback recovers
- T10: unknown runner + vitest summary line (no JSON) → fallback recovers via stdout regex
- T11: unknown runner + empty stdout → preserves zero (no spurious recovery)
- T12: vitest JSON with numTotalTests=0 → trust JSON's zero, no fallback override

Plus a regression test in red-gate-zero-tests-collected.test.ts pinning that a
wrapper-script `testCmd: "sh scripts/test.sh"` with vitest JSON in stdout no
longer halts with RED_GATE_ZERO_TESTS_COLLECTED.

Pairs with the test-writer commit-hygiene gate in the next commit (Bug B); both
bugs surface as the same fault class but have independent root causes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…(test-writer drift)

Bug B in the same "Theme A/B" pair: the test-writer phase (RUN_GEMINI_TEST_SPEC)
accepted exit code 0 with zero new commits as "test_spec_done", letting a drifting
sub-agent (e.g. a fast/eager model that returns an implementation summary instead
of executing the buried "commit the failing tests" step in the input prompt) silently
advance the state machine. The missing commit only surfaced one phase later at
VERIFY_RED as the misleading RED_GATE_ZERO_TESTS_COLLECTED fault — wrong attribution,
deferred halt, hard to diagnose.

The pattern was already established for the primary implementor at the RUN_GEMINI
dispatch site: capture `before = captureGitSnapshot(cwd)` + sibling baselines +
parent-workspace snapshot, then wrap the role result in
`applyMutableAgentHygiene({ requireNewCommit: true, ... })`. RUN_GEMINI_TEST_SPEC
was the one role with a "must commit" contract but no hygiene gate.

Fix: mirror the primary-impl hygiene pattern onto RUN_GEMINI_TEST_SPEC:
- Hoist `outputFilePath` outside the dry-run branch so the post-role hygiene
  check can verify `requireNonEmptyOutput`.
- Capture git snapshot, sibling baselines, parent-workspace snapshot before
  `runRoleTask` in both dry-run and real-run branches (dry-run gets a no-op
  parent snapshot refresh).
- After `runRoleTask`, wrap result in `applyMutableAgentHygiene` with
  `requireNewCommit: true` (unconditional — test-writer only runs for
  kind="code" phases per decideNextAction; audit-only doesn't apply because
  the role's entire job is to produce committed failing tests).

When the gate fires, `hygieneFailureResult` returns `exitCode: 1`, which the
existing failure branch in phase-runner.ts:691 catches and turns into
status="failed" with a clear hygiene error message — attribution at the right
phase, with the right cause.

No prompt or provider changes. The gate is provider-agnostic, so future
provider swaps (the recent kimi → gemini-3.5-flash → codex backup chain
introduced in commit 8f60429) won't reintroduce this drift class.

Tests added in red-gate-zero-tests-collected.test.ts:
- Behavioral: `applyResult` for RUN_GEMINI_TEST_SPEC with an exit=1 (hygiene-
  failed) SubAgentResult sets status="failed", not "test_spec_done".
- Structural: cli.ts RUN_GEMINI_TEST_SPEC dispatch block contains
  `applyMutableAgentHygiene`, `label: "test-writer"`, `requireNewCommit: true` —
  pins the wiring so a refactor can't silently drop the gate.

Pairs with the extractTestCount fallback in the previous commit. Both fixes
target the same fault-class symptom (RED_GATE_ZERO_TESTS_COLLECTED) but address
independent root causes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-zero-tests-and-test-writer-hygiene

# Conflicts:
#	build/orchestrator/cli.ts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant