
v1.57.10.0 feat: Codex review default-on across review/ship/plan/docs #1966

Merged
garrytan merged 9 commits into main from garrytan/codex-review-default-on
Jun 11, 2026

@garrytan

Owner

What this ships

Makes OpenAI Codex cross-model review a default-on, opt-out standard step, governed by one master switch (codex_reviews, default enabled).

  • Plan reviews (/plan-ceo-review, /plan-eng-review, /plan-design-review, /plan-devex-review): the Codex "outside voice" is now default-on — the opt-in "Want an outside voice?" question is gone. Falls back to a Claude subagent when Codex is missing/unauthed. Incorporating findings still requires explicit user approval (user sovereignty preserved).
  • /document-release: new generateCodexDocReview pass — reviews touched docs against the release diff for stale claims, undocumented new surface, and over/under-sold CHANGELOG entries. Informational + an explicit apply-fixes decision point; never auto-edits.
  • /review + /ship adversarial: detection upgraded from install-only (command -v codex) to install and auth, with distinct not-installed vs not-authed guidance. 200-line structured-review threshold unchanged.
  • /autoplan: Phase 0.5 preflight now honors codex_reviews=disabled, so the switch is truly global.
  • Config: codex_reviews broadened to the master switch; invalid values on set are rejected (existing value preserved) so a typo can't flip paid Codex calls.
  • Shared codexPreflight() helper centralizes the read-switch + source-probe + install/auth tri-state in one bash block.
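
The tri-state detection behind codexPreflight() can be pictured as a small pure function. This is a minimal TypeScript sketch, not the helper from this PR (which emits a bash block): `resolveCodexMode`, `ProbeResult`, and its field names are hypothetical, and the order of the checks (switch first, then install, then auth) is an assumption. Only the four mode strings come from the commit description.

```typescript
// Hypothetical sketch of the mode resolution; not the real constants.ts code.
type CodexMode = "ready" | "not_installed" | "not_authed" | "disabled";

interface ProbeResult {
  switchValue: string; // value read from codex_reviews, e.g. "enabled"
  installed: boolean;  // whether the codex binary was found
  authed: boolean;     // whether codex reports a valid login
}

function resolveCodexMode(probe: ProbeResult): CodexMode {
  // Assumed order: the master switch wins before any probing matters.
  if (probe.switchValue === "disabled") return "disabled";
  if (!probe.installed) return "not_installed";
  if (!probe.authed) return "not_authed";
  return "ready";
}
```

Collapsing everything into one canonical mode lets each call site branch on a single value instead of re-deriving install and auth state on its own.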

Engineering review (decisions folded in)

Plan reviewed via /plan-eng-review. Five findings, all accepted:

  • D1 disabled-asymmetry kept deliberately: disabled fully skips the plan/doc outside-voice (no Claude fallback), while diff-review keeps its free Claude adversarial subagent.
  • D2 probe sourced in the same bash block it's used in (CLAUDE.md separate-shell rule).
  • D3 doc-review reuses document-release's diff-range method, not a fresh post-merge diff.
  • D4 shared codexPreflight() helper (DRY).
  • D5 static default-on regression guards added.
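
The D1 asymmetry reads most clearly as a lookup from call site and mode to reviewer. The sketch below is illustrative only: `pickReviewer`, `CallSite`, and `Reviewer` are hypothetical names, and it models a single primary reviewer per call. Whether diff review also runs its Claude adversarial subagent alongside Codex when the mode is ready is not modeled here.

```typescript
// Hypothetical sketch of the D1 decision; names are not from the PR.
type CodexMode = "ready" | "not_installed" | "not_authed" | "disabled";
type CallSite = "plan_review" | "doc_review" | "diff_review";
type Reviewer = "codex" | "claude_subagent" | "none";

function pickReviewer(site: CallSite, mode: CodexMode): Reviewer {
  if (mode === "ready") return "codex";
  if (mode === "disabled") {
    // D1: disabled skips the plan/doc outside voice entirely,
    // while diff review keeps its free Claude adversarial subagent.
    return site === "diff_review" ? "claude_subagent" : "none";
  }
  // not_installed / not_authed: fall back to a Claude subagent.
  return "claude_subagent";
}
```

The point of the asymmetry is that `disabled` is an explicit user choice, so the plan and doc outside voice stays off, whereas a missing or unauthed Codex is an environment problem and still gets a fallback.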

Codex outside-voice (dogfooded this branch's own feature)

Ran the Codex plan review on the plan. Refinements folded: single canonical mode var, probe-functions-only-in-preflight, review touched-docs not a fixed list, bounded Claude fallback. Four cross-model tensions resolved with the user: autoplan honors the switch (T1), kept D1 asymmetry (T2), added an apply-fixes decision point to doc review (T3), reject-on-set for invalid config (T4).

Verification

  • Free suite green: bun test exit 0, parity 13/13.
  • Gate-tier E2E (the paths this branch changes), all PASS: codex-offered-ceo-review, codex-offered-eng-review, document-release, codex-review-findings.
  • Fixed 3 pre-existing stale gstack-config tests (asserted empty for unset keys; tool falls back to documented defaults). Proven pre-existing, unrelated to this feature.

Version

Requested v1.57.9.0 was already shipped to main as #1951 (gbrain source-clean render). Per the user's choice, this lands in the 1.57 line at v1.57.10.0.

🤖 Generated with Claude Code

garrytan and others added 9 commits June 9, 2026 11:20
Broaden the codex_reviews doc to describe it governing /review, /ship,
/document-release, plan reviews, and /autoplan. Reject invalid values on
set (preserving the existing value) so a typo can never silently flip
paid Codex calls on or off.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
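
The reject-on-set rule in the commit above can be sketched as follows. This is a hedged TypeScript illustration, not gstack-config's actual code: `setCodexReviews` and `SetResult` are hypothetical names. The accepted values (`enabled`, `disabled`) are the ones this PR names.

```typescript
// Hypothetical sketch of reject-on-set; not the real gstack-config code.
const VALID_VALUES = ["enabled", "disabled"] as const;
type CodexReviews = (typeof VALID_VALUES)[number];

interface SetResult {
  ok: boolean;         // false when the requested value was rejected
  value: CodexReviews; // the value in effect after the call
}

function setCodexReviews(current: CodexReviews, requested: string): SetResult {
  const match = VALID_VALUES.find((v) => v === requested);
  if (match === undefined) {
    // Invalid input: keep the existing value instead of guessing.
    return { ok: false, value: current };
  }
  return { ok: true, value: match };
}
```

Under these assumptions, a typo such as `setCodexReviews("enabled", "disbaled")` returns `ok: false` and leaves the value at `enabled`, so paid Codex calls do not change state by accident.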
Add a shared codexPreflight() helper (constants.ts) that, in one bash
block, reads codex_reviews, sources gstack-codex-probe, checks install +
auth, and echoes a single canonical mode (ready/not_installed/not_authed/
disabled). All Codex resolvers route through it.

- generateCodexPlanReview: opt-in question removed; the outside voice now
  runs automatically (default-on), falling back to a Claude subagent when
  Codex is missing/unauthed. Cross-model tension still gates on user
  approval (sovereignty preserved).
- generateAdversarialStep: probe-based availability (install AND auth),
  distinct not-installed vs not-authed guidance; 200-line structured-review
  threshold unchanged.
- generateCodexDocReview (new, wired via CODEX_DOC_REVIEW): reviews the
  release's docs against the shipped diff range, informational + an explicit
  apply-fixes decision point, never auto-edits.
- autoplan Phase 0.5 now honors codex_reviews=disabled so the switch is
  truly global.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Output of gen:skill-docs for the Codex-default-on resolver/template
changes. Refreshes the factory-ship golden fixture (codex-host output
unchanged — resolvers strip for the codex host).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The codexPreflight() block + CODEX_MODE branch prose (replacing the
smaller opt-in question) grows plan-ceo/eng/devex-review and review by
5-7% over baseline. Each bump carries a comment justifying it as
intentional capability, not slop.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
skill-validation: assert plan reviews no longer carry the opt-in question
and render the default-on outside-voice, document-release carries the doc
review, and the codex host strips all of it.

gstack-config: codex_reviews defaults to enabled, accepts enabled/disabled,
and rejects an invalid value while preserving the existing one.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Three tests (last touched v0.13.7.0) asserted get/list print empty for
unset keys, but gstack-config falls back to the documented defaults table
(get returns the default, list shows the active-values block). Update the
assertions to the real behavior and split out an unknown-key case that does
still return empty. Pre-existing red, unrelated to codex review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
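
The lookup behavior the updated tests assert (stored value, else documented default, else empty for an unknown key) can be sketched like this. The code is an assumption-laden TypeScript illustration: `getConfig` and `DEFAULTS` are hypothetical, and the only default taken from this PR is `codex_reviews` = `enabled`.

```typescript
// Hypothetical sketch of get-with-defaults; not the real gstack-config code.
// Only the codex_reviews default is taken from the PR text.
const DEFAULTS = new Map<string, string>([["codex_reviews", "enabled"]]);

function getConfig(stored: Map<string, string>, key: string): string {
  // 1. An explicitly stored value wins.
  // 2. Otherwise fall back to the documented default.
  // 3. Unknown keys with no default still return empty.
  return stored.get(key) ?? DEFAULTS.get(key) ?? "";
}
```

With an empty store, `getConfig(new Map(), "codex_reviews")` yields the default rather than an empty string, which is the behavior the stale tests had wrong.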
Codex cross-model review now runs by default on /review, /ship, all four
plan reviews, /document-release, and /autoplan, governed by one master
switch (codex_reviews, default enabled). Plan-review outside voice is
default-on; /document-release gets a new Codex doc-vs-diff audit; every
call site detects install AND auth and falls back to a Claude subagent
with a clear reason. Disable everything with:
gstack-config set codex_reviews disabled

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@trunk-io

trunk-io Bot commented Jun 11, 2026

Merging to main in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here.

@github-actions

E2E Evals: ✅ PASS

22/22 tests passed | $5.86 total cost | 12 parallel runners

Suite Result Status Cost
e2e-design 1/1 $0.24
e2e-plan 5/5 $2.02
e2e-qa-workflow 1/1 $0.11
e2e-review 5/5 $0.79
e2e-workflow 3/3 $0.64
llm-judge 2/2 $0.04
e2e-plan 5/5 $2.02

12x ubicloud-standard-8 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite

@garrytan garrytan merged commit a5833c4 into main Jun 11, 2026
24 checks passed