Skip to content

fix(build/orchestrator): codex test-writer wrapper prompt over-reaches into production code (Bug H)#106

Merged
anbangr merged 2 commits into
mainfrom
worktree-fix-codex-test-writer-role-confusion-prompt
May 26, 2026
Merged

fix(build/orchestrator): codex test-writer wrapper prompt over-reaches into production code (Bug H)#106
anbangr merged 2 commits into
mainfrom
worktree-fix-codex-test-writer-role-confusion-prompt

Conversation

@anbangr

@anbangr anbangr commented May 26, 2026

Copy link
Copy Markdown
Owner

Summary

Closes Bug H — the fourth bug class behind the canonical PR #102 "agent's output summary listed N non-test file(s)" role-boundary refusal. Together with PRs #102/#103/#104/#105 shipped earlier today, this PR closes the upstream cause: the wrapper prompt sub-agents/buildCodexImplArgv emits for EVERY codex role hardcoded implementor language ("Implement the changes autonomously... Do NOT change test assertions — only make tests pass"). When that prompt reached a test-writer role, codex obediently wrote the production code AND the tests, then PR #102's downstream boundary caught and refused the output.

Canonical incident: polis-mesh F5 build (2026-05-26T17:23 CST) — 30 minutes after configure.cm flipped testWriter from gemini/gemini-2.5-flash (passively role-restrained by being weaker) to codex/gpt-5.5 (eager generalist), the very next test-writer phase produced ?? experiments/harness/sla/satisfaction.py, ?? experiments/harness/llm_proxy/polis_prompt.py, and the matching test file. Codex even ran the tests (11/11 passing, 94% cov). PR #102 refused. Phase halted dirty. Source fault: mitosis-oasis-phase1-part2-20260526-083551-a484f0aa-PHASE_FAILED:p2:bbe92751.json.

Architectural payoff for the "no commit / dirty tree" symptom space

After this PR lands, the <role> did not create a new commit hygiene-gate text becomes single-meaning across four historical hidden causes:

Bug Symptom Resolved by
A FEATURE_REDO loop: impl gets no findings, makes no commit PR #103
E Codex sandbox EPERM on .git/worktrees/<run>/index.lock PR #104
F Prose summary lacks parseable paths, recovery refuses PR #105
H Codex test-writer over-reaches into production code THIS PR

The only remaining "no commit" path after #H is the agent genuinely doing nothing.

What changed

Initial fix (commit 33821323)

  1. buildCodexImplArgv accepts a new optional roleId param and switches the prompt block based on it:
    • "test-writer": "You are the TEST WRITER", "Touch ONLY test files", "Do NOT write production code"
    • everything else (including undefined): the legacy implementor prompt, unchanged for back-compat with primary-impl, test-fixer, dual-impl tournament, and every existing call site
  2. runCodexImpl accepts and forwards roleId into the spread-passed buildCodexImplArgv opts
  3. runRoleTask (cli.ts) accepts roleId and threads it through to the codex branch
  4. The test-writer dispatch at RUN_GEMINI_TEST_SPEC passes roleId: "test-writer"

Inline hardening from /review pipeline (commit e27c0416)

Four reviewers — testing specialist + red-team specialist + Codex adversarial (P1, 9/10) + Codex structured review (P2) — all independently flagged the SAME contradiction: the prompt promised *_test.go but classifyTestWriterCommit only matches dir-prefixed paths and .test.* / .spec.* suffixes. A Go test-writer would follow the prompt, commit pkg/foo_test.go, then get refused — re-creating the exact Bug H symptom with a different cause.

Multi-source confirmation across 4 independent reviewers = HIGH finding. Fixed inline:

  1. Prompt rewritten to match the enforcer surface exactly. Drops *_test.* (Go). 10-line comment block at the prompt site names the cli.ts enforcer literal and warns future maintainers to keep prompt NARROWER than enforcer (never wider). Also clarifies "ADD new failing tests; leave existing ones untouched" so codex can't misread "Do NOT make existing tests pass" as "make existing tests fail" (separate red-team MEDIUM finding).

  2. Reuses cli.ts's canonical RoleHygieneId strict union (11 role IDs) instead of the inline ad-hoc union (3 + | string escape). Maintainability finding from the testing specialist. import type keeps the cli ↔ sub-agents reference cycle compile-time-only.

  3. Added roleId?: RoleHygieneId to RunConfiguredRoleTaskOpts as defensive future-proofing for a hypothetical future configure.cm test-writer role.

  4. Five new tests (T-H5d + T-H6a/b/c/d in a second describe block):

    • T-H5d: counts roleId: "test-writer" occurrences in cli.ts; asserts exactly 2 (dispatch + hygiene)
    • T-H6a: prompt enumerates the 5 enforcer-accepted dir prefixes
    • T-H6b: prompt does NOT mention *_test.go (the pattern that triggered this hardening)
    • T-H6c: prompt clarifies "ADD new failing tests" vs "do not modify existing passing tests"
    • T-H6d: extracts enforcer's TEST_PATH_RE dir-alternation from cli.ts source; asserts prompt mentions every dir the enforcer accepts. A divergence in either direction fails CI

Defense-in-depth preserved

PR #102's role boundary still fires on any agent that bypasses the prompt instruction (adversarial or otherwise). The new prompt is the FIRST line of defense; PR #102 stays as the backstop. The fix moves the refusal upstream — codex now declines at the prompt level rather than the post-spawn refusal level — so the worktree stays clean and operators don't need --mark-phase-committed to recover.

Test plan

  • bun test build/orchestrator/__tests__/codex-role-prompt.test.ts — 13/13 pass
  • bun test (full free suite) — green
  • Targeted regression suites (codex-sandbox-recovery, test-writer-drift, hygiene-sibling-repos, recovery-prose-summary-fallback) — no regression
  • Multi-reviewer review pipeline: 4 reviewers + 2 codex passes, all multi-confirmed findings addressed inline

Deferred (LOW findings, plan follow-up)

  • T-H4 comment strengthening (explicit refactor-guard signal)
  • T-H5c regex window tightening (current 1200-char window is permissive)
  • Dual-impl invariant test (theoretical future risk if test-spec tournament ever ships)
  • Enforcer widening to support *_test.go basename — separate concern, requires updating classifyTestWriterCommit at cli.ts:2138 + PR fix(build/orchestrator): test-writer role-drift structural fix (legs 1+2+3 + strict mixed-commit check) #102 contract, plus matching prompt update. Defer until a Go-heavy project actually hits the gap.

Fork-versioning note

Per CLAUDE.md fork rule: fork-local skill work in build/orchestrator/ only. No VERSION / package.json / top-level CHANGELOG.md bump.

Plan + source links

🤖 Generated with Claude Code

anbangr and others added 2 commits May 26, 2026 17:39
…s into production code (Bug H)

Closes Bug H — the third hidden bug class behind the canonical PR #102
"agent's output summary listed N non-test file(s)" role-boundary refusal.
PRs #102/#103/#104/#105 stacked the safety nets; this PR fixes the
upstream cause: the wrapper prompt sub-agents/buildCodexImplArgv emits
for EVERY codex role hardcodes implementor language ("Implement the
changes autonomously... Do NOT change test assertions — only make tests
pass"). When that prompt reaches a test-writer role, codex obediently
writes the production code AND the tests, then PR #102's downstream
boundary catches and refuses the output.

The polis-mesh F5 build (2026-05-26T17:23 CST) was the canonical
incident: 30 minutes after `configure.cm` flipped testWriter from
gemini/gemini-2.5-flash (passively role-restrained by being weaker) to
codex/gpt-5.5 (eager generalist), the very next test-writer phase
produced `?? experiments/harness/sla/satisfaction.py`,
`?? experiments/harness/llm_proxy/polis_prompt.py`, and the matching
test file. Codex even ran the tests (11/11 passing, 94% cov). PR #102
refused. Phase halted dirty.

Fix shape:
1. `buildCodexImplArgv` (sub-agents.ts) accepts a new optional
   `roleId?: "test-writer" | "test-fixer" | "primary-impl" | string`
   parameter and switches the prompt block based on it:
   - "test-writer": "You are the TEST WRITER", "Touch ONLY test files",
     "Do NOT write production code", "Do NOT make existing tests pass"
   - everything else (including undefined): the legacy implementor
     prompt, unchanged for back-compat with primary-impl, test-fixer,
     dual-impl tournament, and every existing call site
2. `runCodexImpl` (sub-agents.ts) accepts and forwards `roleId` into
   the spread-passed buildCodexImplArgv opts.
3. `runRoleTask` (cli.ts) accepts `roleId` and threads it through to
   the codex branch's runCodexImpl call.
4. The test-writer dispatch site at RUN_GEMINI_TEST_SPEC (cli.ts:8520)
   passes `roleId: "test-writer"`. Test-fixer + primary-impl call sites
   intentionally don't set roleId — they EXPECT the implementor prompt
   (their phase contract is "make tests pass").

Defense-in-depth preserved: PR #102's role boundary still fires on any
agent that bypasses the prompt instruction (adversarial or otherwise).
The new prompt is the FIRST line of defense; PR #102 stays as the
backstop. The fix moves the refusal upstream — codex now declines at
the prompt level rather than the post-spawn refusal level — so the
worktree stays clean and operators don't need `--mark-phase-committed`
to recover.

Tests (T-H1 through T-H6, 8 tests in
build/orchestrator/__tests__/codex-role-prompt.test.ts):
- T-H1: roleId "test-writer" → test-writer prompt; asserts legacy
  copy is gone from the prompt
- T-H2: roleId undefined → legacy implementor prompt (back-compat)
- T-H3: roleId "primary-impl" → legacy implementor prompt
- T-H4: roleId "test-fixer" → legacy implementor prompt (intentional)
- preserves sandbox + --add-dir + reasoning flags regardless of roleId
- T-H5a/b/c: static-grep guards on the cli.ts threading wiring
  (signature, runCodexImpl forward, dispatch-site literal)

8/8 pass.

Architectural payoff for the "no commit / dirty tree" symptom space:

| Bug | Symptom | Resolved by |
|---|---|---|
| A | FEATURE_REDO loop: impl gets no findings, makes no commit | PR #103 |
| E | Codex sandbox EPERM on .git/worktrees/<run>/index.lock | PR #104 |
| F | Prose summary lacks parseable paths, recovery refuses | PR #105 |
| H | Codex test-writer over-reaches into production code | THIS PR |

After this lands, a codex/gpt-5.5 (or any future codex model) in the
test-writer role gets instructions matching the role's hygiene contract,
and the only remaining "no commit" path is the agent genuinely doing
nothing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eHygieneId (inline hardening from /review pipeline)

Four reviewers — testing specialist, red-team specialist, Codex
adversarial review (P1, 9/10), Codex structured review (P2) — all
independently flagged the SAME contradiction in the initial Bug H fix:
the test-writer prompt promised codex that `*_test.go` files were
allowed, but classifyTestWriterCommit's enforcer regex at cli.ts:2138
(`/(^|\/)(__tests__|test|tests|spec|specs)\//i` +
`/\.(test|spec)\.[a-z]+$/i`) only accepts directory-prefixed paths and
`.test.*` / `.spec.*` suffixes. A codex test-writer in a Go project
would follow the new prompt, commit `pkg/foo_test.go`, then PR #102's
boundary refuse it — re-creating the exact Bug H symptom with a
different cause.

Multi-source confirmation across 4 independent reviewers makes this a
true HIGH finding (combined confidence 9/10).

This commit:

1. Rewrites the test-writer prompt to match the enforcer surface
   exactly. Drops the `*_test.* (Go)` claim. Adds a 10-line comment
   block at the prompt site naming the cli.ts enforcer literal and
   warning future maintainers to keep prompt NARROWER than enforcer
   (never wider). Also clarifies the "RED expected" wording so codex
   can't misread it as "make existing tests fail" — separates "ADD
   NEW failing tests" from "leave existing passing tests untouched"
   (separate red-team MEDIUM finding).

2. Reuses cli.ts's canonical `RoleHygieneId` strict union (11 role
   IDs) instead of the inline ad-hoc union (3 + `| string` escape).
   Maintainability finding from the testing specialist: the inline
   union loses TypeScript exhaustiveness checking; the canonical type
   was already exported and enumerates all role IDs in the
   orchestrator. `import type` keeps the cli ↔ sub-agents reference
   cycle compile-time-only (erased at runtime), preserving the
   existing module boundary.

3. Adds `roleId?: RoleHygieneId` to `RunConfiguredRoleTaskOpts` as
   defensive future-proofing. Today no configure.cm role is a
   test-writer (slash-command paths flow through runCodexReview, not
   buildCodexImplArgv), so this is documentation-only. Comment says
   so. The benefit: if a future skill adds a configured test-writer,
   the threading is ready and resolveFallbackForConfigured preserves
   it across the backup hop via `...parentOpts`.

4. Five new tests (T-H5d + T-H6a/b/c/d, in a second `describe` block
   "Bug H — prompt-enforcer drift guard"):
   - T-H5d: counts `roleId: "test-writer"` occurrences in cli.ts and
     asserts exactly 2 (the runRoleTask dispatch + the
     applyMutableAgentHygiene call). Prevents accidental copy-paste
     onto a non-test-writer dispatch.
   - T-H6a: prompt enumerates the 5 enforcer-accepted dir prefixes.
   - T-H6b: prompt does NOT mention `*_test.go` (the pattern that
     caused this hardening cycle).
   - T-H6c: prompt clarifies "ADD new failing tests" vs "do not
     modify existing passing tests".
   - T-H6d: extracts the enforcer's TEST_PATH_RE dir-alternation
     group from cli.ts source and asserts the prompt mentions every
     directory the enforcer accepts. A divergence in either direction
     fails CI and forces prompt + enforcer to be updated together.
   Updated T-H1 to assert against the new prompt copy.

5. Adds 10-line comment block at the test-writer dispatch site naming
   the enforcer cross-reference (matches the comment on
   buildCodexImplArgv).

13/13 pass on the regression suite. All four reviewers'
single-confirmed findings addressed (1 HIGH inline-fixed,
1 multi-confirmed HIGH inline-fixed, 1 MEDIUM inline-fixed,
1 MEDIUM inline-added-test). Low findings deferred as plan
follow-ups.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@anbangr anbangr merged commit 3a0d30d into main May 26, 2026
anbangr added a commit that referenced this pull request May 27, 2026
…tead of "exited 0" (Bug I) + side-fix rsync preserves user BUGREPORTs (#107)

Closes Bug I — `RUN_GEMINI_TEST_SPEC` failure handler in phase-runner.ts
was the lone holdout still calling `renderRoleStepFailureMessage`
directly, while every other dispatcher (primary-impl, test-fixer,
dual-impl, judge, etc.) already used `geminiExitError`. The difference
matters: when applyMutableAgentHygiene catches an empty output / no
commit / recovery-refused failure and wraps the success result as
exitCode=1 via hygieneFailureResult, the legacy `renderRoleStepFailureMessage`
rendered the misleading "test-spec writer exited 1" message that hid
the actual reason behind a forged exit code. The operator hunting for
an LLM crash that never happened.

`geminiExitError` already implements the right idiom: pull the first
non-header line of the hygiene body ("left an empty output summary",
"did not create a new commit", "recovery refused", etc.) and fall
through to the legacy `exited N; see <log>` shape when no hygiene
marker is present. This PR aligns RUN_GEMINI_TEST_SPEC with the rest
of the codebase — one helper swap, 18 lines including a 14-line
comment explaining the historical context.

The canonical incident: AGNT2 operator on 2026-05-26 hit Phase 3.1
test-writer halts 4 times in a row. Every halt surfaced as
`✗ Phase 3.1 ... failed: test-spec writer exited 0`. Worktree clean,
monitor agent JSON 0 bytes, no commit, no obvious crash. The operator
had to grep the hygiene-log file separately to discover the actual
reason was an empty output summary from the codex backup. After this
PR lands, the same incident would surface as
`✗ Phase 3.1 ... failed: test-spec writer hygiene failed: test-writer
left an empty output summary: <path>; see <hygiene-log>` — the
operator sees the cause and the forensics pointer in one line.

Side-fix in the same PR: `gstack-upgrade` Phase 4 rsync now uses
`--exclude 'inbox/BUGREPORT-*'` to preserve user-filed bug reports in
the install dir's inbox/. Without this, the AGNT2 incident's other
half repeats: the operator filed
`~/.claude/skills/gstack/inbox/BUGREPORT-test-writer-fixer-exit-nonzero-with-valid-work.md`
directly into the install dir, then the next /gstack-upgrade rsync
`--delete`d it because the source repo's inbox/ didn't have that
file. Bug reports are USER state, not skill state. The exclude pattern
preserves `BUGREPORT-*` files only; example bug reports tracked in
the source repo's inbox/ still sync normally.

Tests (T-I1 through T-I4 in
build/orchestrator/__tests__/test-spec-writer-hygiene-error-attribution.test.ts):

- T-I1: hygieneFailure result → next.error names the hygiene reason
  and includes the agent log path for forensics; the legacy misleading
  "test-spec writer exited N" shape MUST NOT appear.
- T-I2: vanilla nonzero exit (no hygiene marker) → legacy "exited N"
  shape preserved. Important: when codex genuinely crashed with
  exit 2, "exited 2" is the right signal — only hygiene-converted
  failures need the new attribution.
- T-I3: static-grep guard that RUN_GEMINI_TEST_SPEC uses
  geminiExitError("test-spec writer", result) literally.
- T-I4: static-grep guard that phase-runner.ts has at least 3
  geminiExitError callers — pin the helper-adoption ratio so a future
  regression that swaps in renderRoleStepFailureMessage shows up at
  review time.

4/4 pass.

Architectural payoff for the "no commit / dirty tree" symptom space:

| Bug | Symptom | Resolved by |
|---|---|---|
| A | FEATURE_REDO findings deadlock | PR #103 |
| E | Codex sandbox EPERM on .git/worktrees/<run>/index.lock | PR #104 |
| F | Prose summary lacks parseable paths, recovery refuses | PR #105 |
| H | Codex test-writer over-reaches into production code | PR #106 |
| I | "test-spec writer exited 0" hides the actual hygiene reason | THIS PR |

After this lands, every test-writer failure carries a one-line
operator-facing reason that names the actual gate it tripped. The
"no commit" symptom becomes a real "agent did nothing" signal —
matched only when the LLM CLI itself exited cleanly without making
any changes — not a catch-all for hygiene conversions, missing specs,
or empty outputs.

Note: the bug report (BUGREPORT-gstack-build-hygiene-gate-kills-test-writer-work.md)
also names a secondary contributing factor — the codex-bridge `ps`
probe race that produces `exit=null, timedOut=false` on primary
calls. That's a deeper infrastructure fix (process-group tracking,
setsid + PGID kill semantics) deliberately out of scope for this PR.
Once Bug I lands, the operator-facing message for that case will
also be clearer because the fallback's empty output will be named
specifically, instead of buried behind the bridge's exit=null.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
anbangr added a commit that referenced this pull request May 27, 2026
…erdict write (Bug J) (#108)

* fix(build/orchestrator): codex /review read-only sandbox blocks own verdict write (Bug J)

Closes Bug J — `runReviewGates`'s `runGate` closure forced
`-s read-only` on every codex /review call where `phase.kind !== "code"`
or `auditOnly === true`. The intent was to prevent the reviewer from
modifying source on non-code phases. The result: codex couldn't write
its own verdict file either, because stageCodexIO stages the output
inside the worktree (so it survives the cwd-restricted sandbox), but
read-only blocks ALL writes including the staged output.

Canonical incident (filed by `/build investigate` from the live mitosis
run, ~/.gstack/skill-faults/inbox/BUGREPORT-2026-05-27-build-manual-investigation-d4b344.md):

  mitosis-oasis-phase1-part2 phase 5.1 (GovernanceRegistry fixture
  review) failed all 4 review iterations with 0-byte output files.
  Codex log shows:
    ERROR codex_core::tools::router:
      error=patch rejected: writing is blocked by read-only sandbox;
      rejected by user approval settings
    BLOCKED: read-only sandbox prevented writing
      <worktree>/.llm-tmp/gstack-codex-5.1-4-review-output.md

  Empty output -> parser sees no GATE PASS/FAIL line -> orchestrator
  triggers primary-impl-rerun recovery -> recovery fails its own hygiene
  check with "blind execution -- input file unreachable; changes
  discarded" -> phase FAILED.

  Affects every codex review call when review.provider=codex on a
  non-code phase. Estimated to hit 5+ more phases in F5-F9 if not
  fixed.

Fix: drop the read-only force-default. The source-mutation concern that
motivated the heuristic is already covered by the SAME defense-in-depth
model the feature-review path (sub-agents.ts:3921-3944) uses:

  1. Prompt forbids worktree edits (verdict-only output target).
  2. applyGateHygiene runs after every gate and catches any worktree
     mutation post-spawn, converting it to HYGIENE_FAULT.
  3. Same-shape repeat detector halts the loop after 2 identical
     HYGIENE_FAULTs (caller upstream).

The `attempt.sandbox` explicit-override path (used by
shouldRetryCodexGateWithDangerFullAccess for the sandbox-retry
fallback) is preserved verbatim. The env-driven override at
`GSTACK_BUILD_CODEX_REVIEW_SANDBOX` still wins. Source-mutation
enforcement comes from the hygiene gate, not from forcing read-only
at launch.

22-line comment block at the runGate call site documents:
- What the bug looked like (read-only blocks verdict write)
- Why the heuristic was dropped (broken intent: blocks own writes)
- Where source-mutation enforcement actually lives (applyGateHygiene
  + same-shape repeat detector)
- What overrides still work (attempt.sandbox, env var)

Tests (T-J1 through T-J4 in
build/orchestrator/__tests__/codex-review-sandbox-bug-j.test.ts):

- T-J1: static-grep guard — no `const readOnlyArtifactGate =` in
  cli.ts. The identifier name appears once in the Bug J comment
  block as intentional documentation; the regex matches the const
  declaration form (the actual bug source), not the bare token.
- T-J2: static-grep guard — no `attempt?.sandbox ?? (readOnlyArtifactGate`
  or `readOnlyArtifactGate ? "read-only"` patterns. Pins both
  half-revert scenarios at the bit level.
- T-J3: static-grep guard — runGate closure now reads
  `const sandbox = attempt?.sandbox;` AND the cli.ts contains
  "Bug J" + "applyGateHygiene" tokens (the comment block
  cross-reference is required so the next maintainer who touches
  the runGate dispatcher sees the historical context).
- T-J4: behavioral guard — the runGate closure block (matched as
  source range) MUST NOT contain `"read-only"` literal. Other
  `"read-only"` literals in cli.ts (judge sandbox at runJudgeRole,
  runtime sandbox-retry paths) are fine; this test specifically
  scopes the assertion to the runGate dispatcher.

4/4 pass.

Architectural payoff for the "no commit / dirty tree" symptom space:

| Bug | Symptom | Resolved by |
|---|---|---|
| A | FEATURE_REDO findings deadlock | PR #103 |
| E | Codex sandbox EPERM on .git/worktrees/<run>/index.lock | PR #104 |
| F | Prose summary lacks parseable paths, recovery refuses | PR #105 |
| H | Codex test-writer over-reaches into production code | PR #106 |
| I | "test-spec writer exited 0" hides hygiene reason | PR #107 |
| J | Codex /review read-only sandbox blocks own verdict write | THIS PR |

After Bug J lands, the same defense-in-depth that secured feature-review
now secures phase-review. Source mutations stay blocked by the hygiene
gate (post-spawn detection + HYGIENE_FAULT classification +
same-shape repeat halting), and codex's verdict write succeeds on the
first try — no cascading primary-impl-rerun recovery, no operator
intervention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(build/orchestrator): scope review-gate auto-commit to qa only + tighten test guards (Bug J inline hardening)

Inline hardening from /review pipeline. Security specialist (HIGH) and
Codex adversarial (P0, 9/10) both independently flagged the same
MULTI-CONFIRMED security regression in the initial Bug J commit:

The initial Bug J fix dropped the read-only sandbox floor on non-code
review gates. The PR comment claimed defense-in-depth via
applyGateHygiene converting reviewer source mutations to HYGIENE_FAULT.
That claim is WRONG. `phaseAllowsGateSourceFixes(opts.phase)` returns
true for research/writing/experiment/manual phases unconditionally,
which means applyGateHygiene's auto-commit path (`maybeAutoCommitTestOnlyDirty`
with `allowNonTestPaths: true`) AUTO-COMMITS the reviewer's source
mutations instead of rejecting them.

Net effect of the unhardened Bug J: silent reviewer-driven commits on
every non-code phase. A prompt-injection in a reviewed artifact could
persist source changes through the review gate, and the operator would
see a clean GATE PASS.

This hardening commit:

1. Scopes the `allowNonTestPaths` escape hatch to the QA gate ONLY.
   review and reviewSecondary gates are READ-ONLY by contract — they
   evaluate the implementation and emit a verdict; they don't fix code.
   QA keeps fix-on-review semantics (that's intentional — QA was
   designed for it). Introduces a local `gateAllowsSourceFixes` const
   that ANDs `name === "qa"` with the existing
   `phaseAllowsGateSourceFixes(opts.phase)` heuristic. Applied at both
   applyGateHygiene call sites in runReviewGates (primary + sandbox-retry
   paths).

2. 14-line comment block at the runGate call site documents:
   - Why the qa-only scoping exists (security regression class)
   - Why QA is exempt (fix-on-review is intentional for QA)
   - What the read-only sandbox used to protect structurally (now via
     hygiene-gate scope rather than OS sandbox)

3. Adds T-J5 (positive static-grep guard): pins the new
   `gateAllowsSourceFixes` local + both applyGateHygiene call sites
   using it.

4. Adds T-J6 (negative static-grep guard): runReviewGates block (whole
   function body via regex match) must NOT contain
   `allowNonTestPaths: phaseAllowsGateSourceFixes(opts.phase)` — the
   unhardened code form. A refactor that re-collapses the source
   fails CI.

5. Widens T-J4 regex window from 2500 → 5000 chars (adversarial Finding
   5: original block was at 2249/2500, one new if-block away from
   overflowing the window and crashing the test with TypeError on the
   non-null assertion).

6/6 pass on the regression suite (T-J1..T-J6).

Deferred concerns (multi-source adversarial findings, will become Bug K):

- HEAD-advance not blocked on review gates (Codex P0 #2): a reviewer
  can `git commit` source changes (the new workspace-write sandbox
  allows it), write GATE PASS, leave the tree clean — no HYGIENE_FAULT
  because validatePostAgentHygiene doesn't check `requireNewCommit`
  for review gates. Fix: pass `requireNewCommit: false` AND add a
  separate "HEAD must not advance" check on review/reviewSecondary
  gates.
- Failed/timed-out reviewers skip hygiene entirely (Codex P1 #3):
  applyGateHygiene early-returns on `timedOut || exitCode !== 0` at
  cli.ts:6177. A reviewer can edit source, crash before verdict, retry
  runs against the mutated tree. Fix: always run dirty-discard check
  for review gates regardless of exit code, OR add explicit
  `discardWorktreeChangesOnFailure: true` opt for review gate calls.
- .git/ directory mutations not caught by git status (Security MEDIUM #2):
  hooks, .git/config remote rewrites, .git/info/exclude tampering all
  bypass validatePostAgentHygiene because git status doesn't report
  them. Fix: snapshot mtime+sha256 of .git/config, .git/hooks/*,
  .git/info/exclude, .git/HEAD in captureGitSnapshot.

These three are separate architectural concerns on the same surface
that deserve their own dedicated PR (Bug K) rather than a panic patch
on Bug J. This PR's qa-only scoping closes the multi-confirmed P0 from
2 of 3 reviewers; Bug K will close the residuals from codex adversarial.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
anbangr added a commit that referenced this pull request May 27, 2026
… from PR #108 (Bug K) (#109)

Closes Bug K (K1+K2+K3) — three trust-boundary residuals left open
after PR #108's qa-only `gateAllowsSourceFixes` scoping. All three
were multi-source confirmed by the security specialist + Codex
adversarial in PR #108's review pipeline, documented in the PR
comment, and queued as follow-up. This PR closes them together.

The underlying architectural shift: PR #108 dropped the read-only
sandbox floor on review/reviewSecondary gates so codex could write
its verdict file. The PR comment promised defense-in-depth via
post-spawn `applyGateHygiene`. Three gaps remained:

K1 — HEAD-advance not blocked (HIGH, codex P0 #2)

  A reviewer can `git commit` source changes under the new
  workspace-write sandbox, write GATE PASS, leave the tree clean,
  and the gate accepts the commit silently. validatePostAgentHygiene
  only checked dirty deltas; HEAD advance was invisible.

  Fix: `applyGateHygiene` accepts `requireNoHeadAdvance?: boolean`.
  When true, compares before/after HEAD; mismatch emits
  "reviewer roles are read-only; only QA may fix-on-review".
  runReviewGates passes `requireNoHeadAdvance: gateIsReviewer`
  (true for review/reviewSecondary, false for QA — QA's contract
  IS to commit fix-on-review changes).

K2 — Failed/timed-out reviewers skipped hygiene entirely (MEDIUM, codex P1 #3)

  Pre-fix early-return at applyGateHygiene's top preserved any
  worktree mutations the reviewer left behind. The next retry /
  recovery path ran against the mutated tree.

  Fix: `applyGateHygiene` accepts `discardOnFailure?: boolean`.
  When true AND the gate spawn failed (timedOut || exitCode !== 0),
  call `discardBlindExecutionChanges` to roll back worktree
  mutations before returning. The original result (failed exit
  code, timeout flag) is preserved — this is a side-effect cleanup,
  not a result rewrite. runReviewGates passes `discardOnFailure: true`
  for ALL gate roles (review/reviewSecondary/qa) because all three
  share the workspace-write-with-no-OS-floor failure mode.

K3 — `.git/` directory mutations bypassed hygiene gate (MEDIUM, security #2)

  `git status --porcelain` doesn't report mutations under `.git/`:
    - `.git/config` (remote URL rewrites for exfiltration)
    - `.git/hooks/*` (post-merge hook injection)
    - `.git/info/exclude` (silently exclude files from `git add`)
    - `.git/HEAD` (silent branch swap)
    - `.git/packed-refs` (ref tampering)

  Pre-PR-#108 the OS read-only sandbox blocked all of these at the
  syscall level. With workspace-write, nothing did.

  Fix:
  - `GitSnapshot` interface gains optional `gitDirHashes?: Map<string, string>`
  - `captureGitSnapshot` accepts `captureGitDirContents?: boolean`. When
    true, sha256-hashes HEAD, config, info/exclude, packed-refs, plus a
    sorted-listing hash of `hooks/` (with executable-bit detection so
    chmod +x on a previously-disarmed hook also surfaces). Uses
    `git rev-parse --git-dir` AND `--git-common-dir` so linked
    worktrees track BOTH their per-worktree files (HEAD, index) AND
    the shared common dir's hooks/config (mirrors the Bug E
    --add-dir resolution pattern).
  - New `validateGitDirUnchanged` validator diffs before/after
    gitDirHashes. Each mismatch produces one error naming the exact
    `.git/` path so operators can grep.
  - runReviewGates upgrades its `captureGitSnapshot` call to
    `{ captureContents: true, captureGitDirContents: true }`.
  - `applyGateHygiene` checks list now includes
    `validateGitDirUnchanged` alongside the existing two validators.
    No-op when `before.gitDirHashes` is absent (back-compat: only
    gates that opt in to capturing get the K3 check).

Tests (T-K1..T-K6 + T-K6a..T-K6f in
build/orchestrator/__tests__/bug-k-review-gate-trust-boundary.test.ts):

Behavioral (K3 enforcement):
- T-K1: captureGitSnapshot populates gitDirHashes when requested.
- T-K2: gitDirHashes is absent by default (back-compat).
- T-K3: validateGitDirUnchanged detects post-merge hook injection
  (creates `.git/hooks/post-merge` between snapshots, asserts error
  contains "hooks/" + "reviewer/QA roles must NOT modify git metadata").
- T-K4: validateGitDirUnchanged detects .git/config rewrite (uses
  `git remote add evil <url>` between snapshots).
- T-K5: validateGitDirUnchanged is a no-op when the before snapshot
  doesn't have gitDirHashes (back-compat regression guard).

Static-grep wiring (K1/K2/K3 surface guards):
- T-K6a: applyGateHygiene declares requireNoHeadAdvance + discardOnFailure opts.
- T-K6b: BOTH runReviewGates call sites (primary gate + sandbox-retry)
  pass requireNoHeadAdvance=gateIsReviewer + discardOnFailure=true.
- T-K6c: captureGitSnapshot accepts captureGitDirContents; runReviewGates
  passes both captureContents AND captureGitDirContents true.
- T-K6d: applyGateHygiene checks list now includes validateGitDirUnchanged.
- T-K6e: K1 error message names the exact regression class.
- T-K6f: K2 failed-gate cleanup wires through discardBlindExecutionChanges.

11/11 pass on the regression suite.

Architectural payoff:

| Bug | Symptom | Resolved by |
|---|---|---|
| A | FEATURE_REDO findings deadlock | PR #103 |
| E | Codex sandbox EPERM on linked worktrees | PR #104 |
| F | Prose summary lacks parseable paths, recovery refuses | PR #105 |
| H | Codex test-writer over-reaches into production code | PR #106 |
| I | "test-spec writer exited 0" hides hygiene reason | PR #107 |
| J | Codex /review read-only sandbox blocks own verdict | PR #108 |
| K | 3 review-gate trust-boundary residuals from PR #108 | THIS PR |

After Bug K lands, the review-gate trust boundary is structurally
sound:
- Reviewer mutations to source: caught by validatePostAgentHygiene
  (existing) + scoped allowNonTestPaths (PR #108)
- Reviewer commits: caught by K1 requireNoHeadAdvance
- Reviewer mutations on failed runs: discarded by K2 discardOnFailure
- Reviewer mutations to .git metadata: caught by K3 validateGitDirUnchanged

The same defense-in-depth model now matches feature-review's
(sub-agents.ts:3921-3944) but with full git-state coverage instead of
just worktree-state.

Plan reference: ~/.claude/plans/fixing-plan-bugs-k-through-n-post-pr-108.md
PR #108 comment: #108 (comment)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant