Skip to content

feat(guardrails): dual AI review gate — GLM + Codex (Issue #143)#153

Merged
terisuke merged 5 commits into
devfrom
feat/dual-review-gate
Apr 9, 2026
Merged

feat(guardrails): dual AI review gate — GLM + Codex (Issue #143)#153
terisuke merged 5 commits into
devfrom
feat/dual-review-gate

Conversation

@terisuke

@terisuke terisuke commented Apr 9, 2026

Copy link
Copy Markdown

Summary

  • Add dual review tracking: review_glm_state + review_codex_state
  • Merge gate: FULL tier requires both, LIGHT tier requires one or CI green
  • Auto-review sets GLM only; Codex via /codex-review manual trigger
  • Codex MCP availability: auto-satisfy if not configured (hasCodexMcp flag)
  • syncReviewState() computes composite review_state after individual writes (avoids TOCTOU)
  • Input-based Codex detection (prompt/command matching, not output content)

Closes #143

Review findings addressed

  • CRITICAL-1: TOCTOU race → syncReviewState() post-write pattern
  • CRITICAL-2: Output spoofing → input args matching
  • HIGH-2: hasCodexMcp cached, re-applied on mutation resets
  • MEDIUM-1: checklist review_fresh requires at least one review done
  • MEDIUM-4: Regex dots escaped

Test plan

  • 34/34 scenario tests pass (updated for dual review)
  • Typecheck passes
  • Manual: /review → GLM done → /codex-review → Codex done → merge allowed
  • Manual: edit after review → both states reset → merge blocked

🤖 Generated with Claude Code

terisuke and others added 3 commits April 10, 2026 00:21
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…L + 2 HIGH

CRITICAL-1: Remove TOCTOU-prone composite review_state writes.
  reviewGate() now computes gate status on read, not via denormalized writes.

CRITICAL-2: Codex detection switched from output regex (spoofable)
  to input args matching (prompt/command contains "review").

HIGH-2: Cache hasCodexMcp flag at session.created; re-apply
  auto-satisfy on mutation resets when Codex MCP is not configured.

MEDIUM-1: checklist review_fresh requires at least one review done.
MEDIUM-4: Regex dots escaped in Codex detection pattern.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add syncReviewState() helper: computes composite review_state AFTER
  individual state writes (eliminates TOCTOU from pre-write reads)
- Add review_state: "" to mutation reset mark() calls
- Update replay.ts: add review_glm_state/review_codex_state to state,
  update guard text to match new reviewLine() format
- Update guardrails.test.ts: compact context check, review_state test
  now triggers both GLM and Codex reviews before expecting "done"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 9, 2026 15:36
@github-actions

github-actions Bot commented Apr 9, 2026

Copy link
Copy Markdown

This PR doesn't fully meet our contributing guidelines and PR template.

What needs to be fixed:

  • PR description is missing required template sections. Please use the PR template.

Please edit this PR description to address the above within 2 hours, or it will be automatically closed.

If you believe this was flagged incorrectly, please let a maintainer know.

@github-actions

github-actions Bot commented Apr 9, 2026

Copy link
Copy Markdown

New PR opened -- automated review will run on the next push.

To trigger a manual review, comment /review on this PR.

@github-actions

github-actions Bot commented Apr 9, 2026

Copy link
Copy Markdown

The following comment was made by an LLM, it may be inaccurate:

terisuke and others added 2 commits April 10, 2026 00:38
…H-1)

Codex MCP detection block now sets reviewed:true for consistency
with the issue-close verification gate that checks this boolean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…RNING

HIGH-1: edits_since_review only zeroed in syncReviewState() when
  gate.done=true (prevents edit count loss between sequential reviews).

HIGH-2: Codex MCP name matching tightened to /^codex$/i or /^mcp[_-]codex$/i
  (prevents false positives on substring matches).

WARNING-1: hasCodexMcp default changed to false (safe fallback when
  settings read fails — doesn't require unavailable Codex review).

WARNING-3: Codex MCP output validated (reject empty/short < 20 chars).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds dual AI review tracking and merge gating to the guardrails profile, requiring independent “done” states for GLM and Codex (with tier-aware merge rules), and updates scenario fixtures/tests accordingly.

Changes:

  • Introduce review_glm_state / review_codex_state and compute composite review_state via syncReviewState() after writes.
  • Enforce tiered merge gates: FULL requires both reviews; LIGHT allows either review or “checks ran + no severe findings”.
  • Update scenario replays and guardrails scenario tests to reflect the new dual-review state.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
packages/guardrails/profile/plugins/guardrail.ts Implements dual-review state tracking, tier-aware merge gates, Codex availability probing, and review state syncing.
packages/opencode/test/scenario/guardrails.test.ts Updates scenario tests to set/expect dual review completion behavior.
packages/opencode/test/scenario/replay.ts Updates replay guard/state fixtures to include dual review state and new compact text.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 712 to +717
if (branchWarning) {
await seen("branch_workflow.warning", { warning: branchWarning })
}
// Codex MCP availability: cache flag and auto-satisfy codex review if not configured
try {
const settingsPath = path.join(process.env.HOME || "~", ".claude", "settings.json")

Copilot AI Apr 9, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Codex MCP availability probe builds settingsPath from process.env.HOME || "~", which (a) doesn’t expand ~, (b) won’t work on Windows where HOME is often unset (USERPROFILE is used), and (c) bypasses the test isolation mechanism that sets OPENCODE_TEST_HOME (so scenario tests can accidentally read a developer’s real ~/.claude/settings.json). This can lead to incorrectly auto-satisfying Codex review (or failing to detect it) depending on the environment.

Consider resolving the home directory via the repo’s Global.Path.home (respects OPENCODE_TEST_HOME) or os.homedir() with a Windows-friendly fallback, and avoid using the literal "~" path segment. Also consider validating mcpServers is an object before Object.keys() to keep behavior deterministic on malformed settings.

Copilot uses AI. Check for mistakes.
Comment on lines 1204 to 1207
state = await Bun.file(files.state).json()
expect(state.review_state).toBe("")
expect(state.edits_since_review).toBe(1)

Copilot AI Apr 9, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test verifies that review_state resets after apply_patch/mutating bash, but it no longer asserts the new per-review fields (review_glm_state, review_codex_state) are reset as well. Since the PR’s core behavior is “both states reset on edits”, adding explicit expectations here would prevent regressions where only the composite review_state clears but one of the underlying states remains done (or vice versa).

Copilot uses AI. Check for mistakes.
Comment on lines +1483 to +1486
await mark({
reviewed: true,
review_codex_state: "done",
review_codex_at: new Date().toISOString(),

Copilot AI Apr 9, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When Codex review completion is detected via the MCP tool (mcp__codex__codex), the state update only sets review_codex_state/review_codex_at. Unlike the task-based path above, it does not set reviewed: true, update review_at/review_agent, or reset edits_since_review to 0. As a result, the guardrail can continue to report the review as stale (and review_fresh remains failing) immediately after a successful Codex review, and gh issue close will still treat the session as not reviewed.

Consider aligning the MCP completion path with the task completion path by updating the shared review metadata and resetting freshness counters when Codex review is marked done.

Suggested change
await mark({
reviewed: true,
review_codex_state: "done",
review_codex_at: new Date().toISOString(),
const reviewAt = new Date().toISOString()
await mark({
reviewed: true,
review_at: reviewAt,
review_agent: "codex",
edits_since_review: 0,
review_codex_state: "done",
review_codex_at: reviewAt,

Copilot uses AI. Check for mistakes.
@terisuke terisuke merged commit c503c1d into dev Apr 9, 2026
7 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(guardrails): dual AI review gate — require LGTM from both Codex and GLM 5.1 before merge

2 participants