fix(build/orchestrator): RED_GATE_ZERO_TESTS_COLLECTED parser fallback + test-writer commit hygiene gate#99
Merged
anbangr merged 3 commits intoMay 26, 2026
Conversation
…known / wrapper-script runners Bug A in the same "Theme A/B" pair tracked in the archived halt investigations: a `/build` run that resolves the test command to runner "unknown" (wrapper scripts like `sh scripts/test.sh`, `make test`, generic `npm test`) — or to a misclassified runner (e.g. `bun run test` matched as "bun" but actually invoking vitest underneath) — could halt with RED_GATE_ZERO_TESTS_COLLECTED even though stdout contained a perfectly parseable vitest JSON line or a `Tests N passed` summary. Root cause: `extractTestCount`'s `default` branch returned a hardcoded zero without attempting any parser on stdout, and the named-runner branches trusted their single parser unconditionally. The fault message at phase-runner.ts then attributed the zero to "did you forget <!-- testCmd: -->?", which is the wrong remediation for this failure mode. Fix: introduce `tryFallbackParsers(output)` that walks every parser in priority order (vitest JSON → jest JSON → vitest stdout → jest stdout → pytest → bun → mocha → go) and returns the first non-zero recovery, tagged `source: "stdout-fallback-auto"` so logs and tests can distinguish it from a targeted parse. The fallback fires in two places: - `default` branch (unknown runner): try fallback, return its result if any, else preserve the zero. Always warn — the unknown runner is itself a useful diagnostic the user should see. - After a known-runner parse: when the primary returns `collected: 0` AND wasn't a strong JSON signal (vitest/jest/pytest/bun JSON's `numTotalTests: 0` is trusted), retry with the fallback scan. Warn only on actual recovery, to keep healthy runs quiet. Tests added in red-gate-runner-introspect.test.ts: - T8: unknown runner + vitest JSON → fallback recovers 14 tests, source=stdout-fallback-auto - T9: wrapper-script case (runner=bun + vitest JSON stdout) → fallback recovers - T10: unknown runner + vitest summary line (no JSON) → fallback recovers via stdout regex - T11: unknown runner + empty stdout → preserves zero (no spurious recovery) - T12: vitest JSON with numTotalTests=0 → trust JSON's zero, no fallback override Plus a regression test in red-gate-zero-tests-collected.test.ts pinning that a wrapper-script `testCmd: "sh scripts/test.sh"` with vitest JSON in stdout no longer halts with RED_GATE_ZERO_TESTS_COLLECTED. Pairs with the test-writer commit-hygiene gate in the next commit (Bug B); both bugs surface as the same fault class but have independent root causes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…(test-writer drift)
Bug B in the same "Theme A/B" pair: the test-writer phase (RUN_GEMINI_TEST_SPEC)
accepted exit code 0 with zero new commits as "test_spec_done", letting a drifting
sub-agent (e.g. a fast/eager model that returns an implementation summary instead
of executing the buried "commit the failing tests" step in the input prompt) silently
advance the state machine. The missing commit only surfaced one phase later at
VERIFY_RED as the misleading RED_GATE_ZERO_TESTS_COLLECTED fault — wrong attribution,
deferred halt, hard to diagnose.
The pattern was already established for the primary implementor at the RUN_GEMINI
dispatch site: capture `before = captureGitSnapshot(cwd)` + sibling baselines +
parent-workspace snapshot, then wrap the role result in
`applyMutableAgentHygiene({ requireNewCommit: true, ... })`. RUN_GEMINI_TEST_SPEC
was the one role with a "must commit" contract but no hygiene gate.
Fix: mirror the primary-impl hygiene pattern onto RUN_GEMINI_TEST_SPEC:
- Hoist `outputFilePath` outside the dry-run branch so the post-role hygiene
check can verify `requireNonEmptyOutput`.
- Capture git snapshot, sibling baselines, parent-workspace snapshot before
`runRoleTask` in both dry-run and real-run branches (dry-run gets a no-op
parent snapshot refresh).
- After `runRoleTask`, wrap result in `applyMutableAgentHygiene` with
`requireNewCommit: true` (unconditional — test-writer only runs for
kind="code" phases per decideNextAction; audit-only doesn't apply because
the role's entire job is to produce committed failing tests).
When the gate fires, `hygieneFailureResult` returns `exitCode: 1`, which the
existing failure branch in phase-runner.ts:691 catches and turns into
status="failed" with a clear hygiene error message — attribution at the right
phase, with the right cause.
No prompt or provider changes. The gate is provider-agnostic, so future
provider swaps (the recent kimi → gemini-3.5-flash → codex backup chain
introduced in commit 8f60429) won't reintroduce this drift class.
Tests added in red-gate-zero-tests-collected.test.ts:
- Behavioral: `applyResult` for RUN_GEMINI_TEST_SPEC with an exit=1 (hygiene-
failed) SubAgentResult sets status="failed", not "test_spec_done".
- Structural: cli.ts RUN_GEMINI_TEST_SPEC dispatch block contains
`applyMutableAgentHygiene`, `label: "test-writer"`, `requireNewCommit: true` —
pins the wiring so a refactor can't silently drop the gate.
Pairs with the extractTestCount fallback in the previous commit. Both fixes
target the same fault-class symptom (RED_GATE_ZERO_TESTS_COLLECTED) but address
independent root causes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-zero-tests-and-test-writer-hygiene # Conflicts: # build/orchestrator/cli.ts
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes the "Theme A / Theme B" fault pair from the archived halt investigations. Both bugs surface as the same `RED_GATE_ZERO_TESTS_COLLECTED` symptom but have independent root causes — together they explain the build halt at Phase 1.2 from the user's most recent /build run (vitest reported 14 passing, orchestrator halted with "unknown runner").
Bug A — `extractTestCount` discards stdout for unknown / misclassified runners.
Bug B — `RUN_GEMINI_TEST_SPEC` has no commit hygiene gate.
Bonus — merge resolution completes upstream PR #98's intent. Upstream's subtree-aware `testCmd` inference explicitly assumed the hygiene gate this PR adds ("the test-writer hygiene gate requires a commit per iteration") but didn't implement it. Now the inference's invariant actually holds. The merge also adds a small safety improvement: the subtree inference is gated on `phaseState.status === "test_spec_done"` so a hygiene-failed phase doesn't read `HEAD~1..HEAD` against an unrelated prior commit.
Commits:
Test Coverage
8 new tests added across 2 suites, all passing alongside upstream's 213-line `resolve-test-cmd-inference` suite:
`build/orchestrator/tests/red-gate-runner-introspect.test.ts` (5 new):
`build/orchestrator/tests/red-gate-zero-tests-collected.test.ts` (3 new):
Targeted suites (4 files): 71/71 pass. Full `bun test` suite: exit=0, zero failures.
Pre-Landing Review
No SQL, no LLM trust boundary violations, no user input handling. Changes are contained to TypeScript orchestrator code following the established `primaryImpl` hygiene pattern. The merge resolution preserves both my hygiene gate and upstream's subtree inference; the safety gate on `phaseState.status === "test_spec_done"` prevents the subtree inference from misattributing tests when the hygiene gate fails.
Test plan
🤖 Generated with Claude Code