fix(build/orchestrator): test-fixer surfaces hygiene reason instead of "exited N" (Bug M)#110
Merged
Conversation
…f "exited N" (Bug M) Closes Bug M — `RUN_GEMINI_FIX` failure handler at phase-runner.ts:825 was the lone holdout still calling `renderRoleStepFailureMessage` directly. Same legacy holdout that Bug I (PR #107) closed for `RUN_GEMINI_TEST_SPEC`; the test-fixer dispatcher was missed in that pass. Canonical incident: agnt2-prototype-prodl2-f3-f4-soak-and-backup Phase 3.1 (Extract tier resolution into public-rpc-proxy/tier) at 2026-05-27 03:34 UTC. Codex test-fixer (gpt-5.5) ran for 101 seconds, called `git status --short` multiple times, emitted "Files changed:" in its output, exited 0 with no commit. applyMutableAgentHygiene's requireNewCommit gate wrapped the success as exitCode=1 with hygieneFailure=true and body "test fixer did not create a new commit". The pre-fix attribution surfaced as `✗ Phase 3.1 ... failed: test-fixer exited 1` — the operator had to grep the hygiene log for the actual reason. Same shape Bug I closed for test-spec writer; same fix applies. Fix: swap `renderRoleStepFailureMessage("test-fixer", result)` → `geminiExitError("test-fixer", result)` at phase-runner.ts:825. The helper extracts the first non-header line of the hygiene body when `result.hygieneFailure === true` and falls through to the legacy `exited N; see <log>` shape when no hygiene marker is present. Post-fix, four distinct dispatcher families in phase-runner.ts use geminiExitError: primary-impl, test-spec writer (post-Bug-I), test-fixer (this PR), and dual-impl/judge. Pinned via T-M4 (`geminiExitError` call count ≥ 4) so a future regression that swaps any of them back to renderRoleStepFailureMessage shows up at review time. Tests (T-M1..T-M5 in build/orchestrator/__tests__/bug-m-test-fixer-hygiene-attribution.test.ts): - T-M1: hygieneFailure result → next.error names the hygiene reason ("did not create a new commit") and includes the agent log path for forensics. Legacy "test-fixer exited N" form MUST NOT appear. - T-M2: vanilla nonzero exit (no hygiene marker) → legacy "exited N" shape preserved. Important: when codex genuinely crashed with exit 2, "exited 2" is the right signal — only hygiene-converted failures need the new attribution. - T-M3: static-grep guard that RUN_GEMINI_FIX uses geminiExitError("test-fixer", result) literally. - T-M4: static-grep guard that phase-runner.ts has at least 4 geminiExitError callers (count pin so a future regression that swaps any of them back shows up at review time). - T-M5: negative guard — `renderRoleStepFailureMessage("test-fixer",` must NOT appear anywhere in phase-runner.ts. 5/5 pass. Architectural payoff: | Bug | Hygiene-aware attribution for | Resolved by | |---|---|---| | I | test-spec writer (RUN_GEMINI_TEST_SPEC) | PR #107 | | M | test-fixer (RUN_GEMINI_FIX) | THIS PR | After Bug M lands, all role-step dispatchers that route through applyMutableAgentHygiene also route through geminiExitError. The "agent did nothing" symptom now carries a one-line reason naming the actual gate it tripped, for every role. Note on related architectural issue: the test-fixer hygiene gate firing on no-commit-but-output-claimed-completion is a separate Codex quality issue (the agent reported "Files changed:" but didn't actually commit anything). The orchestrator fix here surfaces the right error attribution; the agent-quality fix (re-run tests after no-commit and detect "tests now green" as a success signal) is a deeper change deferred to a follow-up. Bug M only addresses the operator-facing attribution. Plan reference: ~/.claude/plans/fixing-plan-bugs-k-through-n-post-pr-108.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes Bug M —
RUN_GEMINI_FIXfailure handler at phase-runner.ts:825 was the lone holdout still callingrenderRoleStepFailureMessagedirectly. Same legacy holdout that Bug I (PR #107) closed forRUN_GEMINI_TEST_SPEC; the test-fixer dispatcher was missed in that pass.Plan: ~/.claude/plans/fixing-plan-bugs-k-through-n-post-pr-108.md
Canonical incident
agnt2-prototype-prodl2-f3-f4-soak-and-backup Phase 3.1 at 03:34 UTC. Codex test-fixer (gpt-5.5) ran for 101 seconds, exited 0 with no commit. Hygiene gate wrapped as exitCode=1 with body
"test fixer did not create a new commit". Operator saw✗ Phase 3.1 ... failed: test-fixer exited 1— had to grep the hygiene log for the actual reason.Fix
One-line swap:
renderRoleStepFailureMessage("test-fixer", result)→geminiExitError("test-fixer", result).geminiExitErrorextracts the first non-header line of the hygiene body whenresult.hygieneFailure === trueand falls through to the legacyexited N; see <log>shape when no hygiene marker is present.What changed
build/orchestrator/phase-runner.tsbuild/orchestrator/__tests__/bug-m-test-fixer-hygiene-attribution.test.tsArchitectural payoff
After Bug M lands, all role-step dispatchers that route through
applyMutableAgentHygienealso route throughgeminiExitError. T-M4 pins the helper-adoption count at ≥4 (primary-impl + test-spec writer + test-fixer + dual-impl/judge) so a future regression shows up at review time.Out of scope (separate concern)
The deeper architectural issue — Codex agents that report "Files changed:" but don't actually commit (a quality issue, not an orchestrator bug) — is deferred. A future fix would re-run tests after no-commit and detect "tests now green" as a success signal (the impl may already be correct). Bug M only addresses the operator-facing attribution.
Test plan
bun test build/orchestrator/__tests__/bug-m-test-fixer-hygiene-attribution.test.ts— 5/5 passbun testfull suite — greenrenderRoleStepFailureMessage("test-fixer",is gone from phase-runner.tsFork-versioning note
Per CLAUDE.md fork rule: fork-local skill work in
build/orchestrator/only. NoVERSION/package.json/CHANGELOG.mdbump.🤖 Generated with Claude Code