feat(guardrails): dual AI review gate — GLM + Codex (Issue #143)#153
Conversation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…L + 2 HIGH CRITICAL-1: Remove TOCTOU-prone composite review_state writes. reviewGate() now computes gate status on read, not via denormalized writes. CRITICAL-2: Codex detection switched from output regex (spoofable) to input args matching (prompt/command contains "review"). HIGH-2: Cache hasCodexMcp flag at session.created; re-apply auto-satisfy on mutation resets when Codex MCP is not configured. MEDIUM-1: checklist review_fresh requires at least one review done. MEDIUM-4: Regex dots escaped in Codex detection pattern. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add syncReviewState() helper: computes composite review_state AFTER individual state writes (eliminates TOCTOU from pre-write reads) - Add review_state: "" to mutation reset mark() calls - Update replay.ts: add review_glm_state/review_codex_state to state, update guard text to match new reviewLine() format - Update guardrails.test.ts: compact context check, review_state test now triggers both GLM and Codex reviews before expecting "done" Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
This PR doesn't fully meet our contributing guidelines and PR template. What needs to be fixed:
Please edit this PR description to address the above within 2 hours, or it will be automatically closed. If you believe this was flagged incorrectly, please let a maintainer know. |
|
New PR opened -- automated review will run on the next push. To trigger a manual review, comment |
|
The following comment was made by an LLM, it may be inaccurate: |
…H-1) Codex MCP detection block now sets reviewed:true for consistency with the issue-close verification gate that checks this boolean. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…RNING HIGH-1: edits_since_review only zeroed in syncReviewState() when gate.done=true (prevents edit count loss between sequential reviews). HIGH-2: Codex MCP name matching tightened to /^codex$/i or /^mcp[_-]codex$/i (prevents false positives on substring matches). WARNING-1: hasCodexMcp default changed to false (safe fallback when settings read fails — doesn't require unavailable Codex review). WARNING-3: Codex MCP output validated (reject empty/short < 20 chars). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Adds dual AI review tracking and merge gating to the guardrails profile, requiring independent “done” states for GLM and Codex (with tier-aware merge rules), and updates scenario fixtures/tests accordingly.
Changes:
- Introduce
review_glm_state/review_codex_stateand compute compositereview_stateviasyncReviewState()after writes. - Enforce tiered merge gates: FULL requires both reviews; LIGHT allows either review or “checks ran + no severe findings”.
- Update scenario replays and guardrails scenario tests to reflect the new dual-review state.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| packages/guardrails/profile/plugins/guardrail.ts | Implements dual-review state tracking, tier-aware merge gates, Codex availability probing, and review state syncing. |
| packages/opencode/test/scenario/guardrails.test.ts | Updates scenario tests to set/expect dual review completion behavior. |
| packages/opencode/test/scenario/replay.ts | Updates replay guard/state fixtures to include dual review state and new compact text. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (branchWarning) { | ||
| await seen("branch_workflow.warning", { warning: branchWarning }) | ||
| } | ||
| // Codex MCP availability: cache flag and auto-satisfy codex review if not configured | ||
| try { | ||
| const settingsPath = path.join(process.env.HOME || "~", ".claude", "settings.json") |
There was a problem hiding this comment.
The Codex MCP availability probe builds settingsPath from process.env.HOME || "~", which (a) doesn’t expand ~, (b) won’t work on Windows where HOME is often unset (USERPROFILE is used), and (c) bypasses the test isolation mechanism that sets OPENCODE_TEST_HOME (so scenario tests can accidentally read a developer’s real ~/.claude/settings.json). This can lead to incorrectly auto-satisfying Codex review (or failing to detect it) depending on the environment.
Consider resolving the home directory via the repo’s Global.Path.home (respects OPENCODE_TEST_HOME) or os.homedir() with a Windows-friendly fallback, and avoid using the literal "~" path segment. Also consider validating mcpServers is an object before Object.keys() to keep behavior deterministic on malformed settings.
| state = await Bun.file(files.state).json() | ||
| expect(state.review_state).toBe("") | ||
| expect(state.edits_since_review).toBe(1) | ||
|
|
There was a problem hiding this comment.
This test verifies that review_state resets after apply_patch/mutating bash, but it no longer asserts the new per-review fields (review_glm_state, review_codex_state) are reset as well. Since the PR’s core behavior is “both states reset on edits”, adding explicit expectations here would prevent regressions where only the composite review_state clears but one of the underlying states remains done (or vice versa).
| await mark({ | ||
| reviewed: true, | ||
| review_codex_state: "done", | ||
| review_codex_at: new Date().toISOString(), |
There was a problem hiding this comment.
When Codex review completion is detected via the MCP tool (mcp__codex__codex), the state update only sets review_codex_state/review_codex_at. Unlike the task-based path above, it does not set reviewed: true, update review_at/review_agent, or reset edits_since_review to 0. As a result, the guardrail can continue to report the review as stale (and review_fresh remains failing) immediately after a successful Codex review, and gh issue close will still treat the session as not reviewed.
Consider aligning the MCP completion path with the task completion path by updating the shared review metadata and resetting freshness counters when Codex review is marked done.
| await mark({ | |
| reviewed: true, | |
| review_codex_state: "done", | |
| review_codex_at: new Date().toISOString(), | |
| const reviewAt = new Date().toISOString() | |
| await mark({ | |
| reviewed: true, | |
| review_at: reviewAt, | |
| review_agent: "codex", | |
| edits_since_review: 0, | |
| review_codex_state: "done", | |
| review_codex_at: reviewAt, |
Summary
review_glm_state+review_codex_state/codex-reviewmanual triggerhasCodexMcpflag)syncReviewState()computes compositereview_stateafter individual writes (avoids TOCTOU)Closes #143
Review findings addressed
Test plan
/review→ GLM done →/codex-review→ Codex done → merge allowed🤖 Generated with Claude Code