
Skip post-codex claude review when no findings #282

Merged

umputun merged 3 commits into master from skip-post-codex-review-no-findings on Apr 15, 2026

Conversation

umputun (Owner) commented Apr 15, 2026

The post-codex claude review loop catches regressions from fixes applied during external review. When external review found no actionable issues, no fixes were made — running this review wastes time and tokens.

Changes:

  • runExternalReviewLoop returns (bool, error) — bool tracks whether findings were found
  • hadFindings becomes true only when claude eval does NOT signal CodexDone (meaning fixes were applied)
  • runCodexAndPostReview skips post-codex review when hadFindings is false
  • Logs "external review found no issues, skipping post-codex claude review" when skipping

Scenarios that skip post-codex review:

  • External review tool disabled (none or CodexEnabled=false)
  • Codex returns empty output
  • Claude eval signals CodexDone on first iteration (dismissed findings, no code changes)
  • Break fires before any eval completes

Scenarios that still run post-codex review:

  • Claude eval fixes issues on any iteration (multi-iteration loops)
  • Stalemate detection terminates the loop after fixes were applied
  • Max iterations reached after fixes were applied

The post-codex claude review loop exists to catch regressions from fixes
applied during the external review phase. When the external review found
no actionable issues (CodexDone on first iteration, empty output, or
tool disabled), no fixes were made and there is nothing to regress —
running the review is wasted time and tokens.

runExternalReviewLoop now returns (bool, error) where the bool indicates
whether findings were found. runCodexAndPostReview uses this to skip the
post-codex review when unnecessary.
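The control flow described above can be sketched as follows. This is a simplified illustration with stand-in types and method shapes, not the real code in pkg/processor/runner.go:

```go
package main

import "fmt"

// runner is a stand-in for the real processor Runner; only the control
// flow relevant to the skip decision is modeled here.
type runner struct {
	externalFindings bool // simulates what the external review loop observed
}

// runExternalReviewLoop mimics the new (bool, error) signature: the bool
// reports whether the external review produced findings requiring fixes.
func (r *runner) runExternalReviewLoop() (bool, error) {
	return r.externalFindings, nil
}

func (r *runner) runPostCodexReview() error {
	fmt.Println("running post-codex claude review")
	return nil
}

func (r *runner) runFinalize() error {
	fmt.Println("finalize")
	return nil
}

// runCodexAndPostReview shows the skip: with no findings there were no
// fixes, hence nothing to regress, so it jumps straight to finalize.
func (r *runner) runCodexAndPostReview() error {
	hadFindings, err := r.runExternalReviewLoop()
	if err != nil {
		return err
	}
	if !hadFindings {
		fmt.Println("external review found no issues, skipping post-codex claude review")
		return r.runFinalize()
	}
	if err := r.runPostCodexReview(); err != nil {
		return err
	}
	return r.runFinalize()
}

func main() {
	(&runner{externalFindings: false}).runCodexAndPostReview() // skips review
	(&runner{externalFindings: true}).runCodexAndPostReview()  // runs review
}
```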
Copilot AI review requested due to automatic review settings April 15, 2026 06:59
cloudflare-workers-and-pages (bot) commented Apr 15, 2026

Deploying ralphex with Cloudflare Pages

Latest commit: b04cf62
Status: ✅  Deploy successful!
Preview URL: https://7c340570.ralphex.pages.dev
Branch Preview URL: https://skip-post-codex-review-no-fi.ralphex.pages.dev


Copilot AI left a comment

Pull request overview

Updates the processor runner’s external-review pipeline to avoid running the post-codex Claude review when the external review produced no actionable fixes, reducing unnecessary latency and token usage.

Changes:

  • Changed the external review loop APIs (runCodexLoop / runExternalReviewLoop) to return a hadFindings boolean.
  • Skips the post-codex Claude review (and proceeds directly to finalize) when hadFindings is false.
  • Updates and extends runner tests to reflect the new iteration behavior and skip conditions.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Files reviewed:

  • pkg/processor/runner.go: Propagates hadFindings from external review loops and conditionally skips post-codex review.
  • pkg/processor/runner_test.go: Adjusts existing tests for multi-iteration codex loops and adds coverage for the new skip behavior.
Comments suppressed due to low confidence (2)

pkg/processor/runner.go:780

  • When the external review tool returns empty output, the loop logs "%s review returned no output, skipping..." and then breaks. That break falls through to the end-of-loop log "max %s iterations reached...", which is incorrect/misleading in this path. Consider returning immediately on empty output (or tracking an explicit early-exit reason) so the max-iterations message is only logged when the iteration budget is actually exhausted.
		if reviewResult.Output == "" {
			r.log.Print("%s review returned no output, skipping...", cfg.name)
			break
		}
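The suggested fix (return immediately on empty output instead of break) can be sketched in isolation like this; reviewLoop and its logger are simplified stand-ins, not the real runner code:

```go
package main

import "fmt"

type logger struct{}

func (l *logger) Print(format string, args ...any) { fmt.Printf(format+"\n", args...) }

// reviewLoop is a simplified stand-in for runExternalReviewLoop showing the
// suggested fix: return immediately on empty output so the end-of-loop
// "max iterations reached" log cannot fire on this path.
func reviewLoop(log *logger, outputs []string, maxIter int) (hadFindings bool, err error) {
	for i := 0; i < maxIter && i < len(outputs); i++ {
		if outputs[i] == "" {
			log.Print("review returned no output, skipping...")
			return false, nil // early exit: nothing reported, no misleading log
		}
		hadFindings = true // simplified: any non-empty output counts as findings
	}
	log.Print("max iterations reached")
	return hadFindings, nil
}

func main() {
	log := &logger{}
	found, _ := reviewLoop(log, []string{""}, 3)
	fmt.Println("hadFindings:", found)
}
```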

pkg/processor/runner.go:815

  • In the session-timeout path the loop continues before setting hadFindings = true. Because Claude timeouts clear the error/signal, a timed-out eval could still have made working-tree changes, yet hadFindings may remain false and cause the post-codex review to be skipped. Consider conservatively marking hadFindings when a Claude eval times out (or checking git state before/after the timeout) so the safety review isn't skipped when changes may have been applied during a partial session.
		// on session timeout, skip response capture and stalemate detection; the session was killed
		// before it could finish, so partial output can't be trusted as previous context and
		// "no changes" doesn't mean "nothing to fix"
		if r.lastSessionTimedOut {
			r.log.Print("claude eval session timed out, retrying %s iteration...", cfg.name)
			continue
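A minimal sketch of the conservative handling this comment proposes; evalResult and its fields are illustrative, not the real project types:

```go
package main

import "fmt"

// evalResult models one claude eval round for illustration purposes only.
type evalResult struct {
	timedOut     bool // session was killed before finishing
	signaledDone bool // eval emitted the CodexDone signal
}

// trackFindings treats a timed-out eval as "may have findings": a killed
// session could still have changed the working tree, so the post-codex
// safety review stays in play rather than being skipped.
func trackFindings(evals []evalResult) (hadFindings bool) {
	for _, e := range evals {
		if e.timedOut {
			hadFindings = true // conservative: partial session may have applied changes
			continue
		}
		if !e.signaledDone {
			hadFindings = true
		}
	}
	return hadFindings
}

func main() {
	fmt.Println(trackFindings([]evalResult{{timedOut: true}}))     // true
	fmt.Println(trackFindings([]evalResult{{signaledDone: true}})) // false
}
```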


Comment thread pkg/processor/runner.go
Comment on lines +382 to +387
// skip post-codex claude review when external review found nothing on the first pass.
// the purpose of this review is to catch regressions from fixes applied during the external
// review loop — if no findings were reported, no fixes were made and there's nothing to regress.
if !hadFindings {
r.log.Print("external review found no issues, skipping post-codex claude review")
return r.runFinalize(ctx)

Copilot AI Apr 15, 2026


The skip-path log/comment are misleading when the external review tool is disabled (tool=="none" / CodexEnabled=false): in that case no external review ran, but this still logs "external review found no issues". Consider adjusting the condition or message to distinguish "external review disabled" vs "ran and had no actionable findings" (e.g., return a separate "ranExternalReview" flag or change the message to avoid implying it executed).
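One way to implement the disambiguation this comment suggests; the ranExternalReview flag and the log messages are hypothetical, not the real code:

```go
package main

import "fmt"

// skipMessage distinguishes "external review never ran" (tool disabled)
// from "ran and found no actionable findings", so the skip log does not
// imply a review executed when it did not.
func skipMessage(ranExternalReview, hadFindings bool) (skip bool, msg string) {
	switch {
	case !ranExternalReview:
		return true, "external review disabled, skipping post-codex claude review"
	case !hadFindings:
		return true, "external review found no issues, skipping post-codex claude review"
	default:
		return false, ""
	}
}

func main() {
	_, msg := skipMessage(false, false)
	fmt.Println(msg)
	_, msg = skipMessage(true, false)
	fmt.Println(msg)
}
```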

umputun added 2 commits April 15, 2026 02:04
- fix misleading comment on runExternalReviewLoop return value to describe
  actual mechanism (CodexDone signal check) instead of approximation
- rename test subcase to clarify intent
Copilot AI review requested due to automatic review settings April 15, 2026 07:05
@umputun umputun merged commit 87c56f7 into master Apr 15, 2026
7 of 8 checks passed
@umputun umputun deleted the skip-post-codex-review-no-findings branch April 15, 2026 07:09
Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

pkg/processor/runner.go:780

  • When reviewResult.Output == "" the loop breaks, but then the function falls through to r.log.Print("max %s iterations reached..."), which is misleading (the loop didn’t hit max iterations; it exited due to empty output). Since empty output is treated as “no findings”, it would be clearer to return immediately here (likely with hadFindings=false) and avoid printing the max-iterations message in this path.
		if reviewResult.Output == "" {
			r.log.Print("%s review returned no output, skipping...", cfg.name)
			break
		}


Comment on lines +2856 to +2859
// codex finds something, but claude eval says "no actionable findings" → CodexDone on first iter
claude := newMockExecutor([]executor.Result{
{Output: "dismissed all findings", Signal: status.CodexDone},
})

Copilot AI Apr 15, 2026


This test models “codex finds something but Claude dismisses all findings” by having Claude immediately emit CODEX_REVIEW_DONE. However, the default codex evaluation prompt instructs Claude not to emit CODEX_REVIEW_DONE when dismissing findings as invalid (it should provide explanations and let the external tool re-run). As written, this test may not reflect a realistic production flow and can give false confidence about the skip behavior. Consider adjusting the scenario to match the default prompt behavior (dismissal → another external iteration → eventual no-findings signal) and asserting the intended skip logic against that flow.

Comment thread pkg/processor/runner.go
Comment on lines 733 to +748
@@ -734,15 +745,16 @@ func (r *Runner) runExternalReviewLoop(ctx context.Context, cfg externalReviewCo
var claudeResponse string // first iteration has no prior response
var unchangedRounds int // consecutive iterations with no commits (for stalemate detection)
firstCompleted := false // tracks if any successful eval completed; controls diff scope for external tool
hadFindings := false // tracks if external review found any issues requiring fixes

Copilot AI Apr 15, 2026


hadFindings is set to true whenever Claude does not signal CODEX_REVIEW_DONE, but the default codex evaluation prompt explicitly instructs Claude to not emit CODEX_REVIEW_DONE when it dismisses findings as invalid (and also when it fixes issues). That means hadFindings currently conflates “fixes were applied” with “findings were dismissed / need re-evaluation”, and post-codex review may still run even when no code changes were made—undercutting the goal of skipping unnecessary post-review runs. Consider tracking whether the external loop actually produced code changes (e.g., compare HEAD and/or diff fingerprint before/after eval rounds, and only run post-codex review if changes occurred), or rename/redefine the boolean to match what it truly represents and adjust the skip condition accordingly.
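The git-state alternative this comment suggests could be sketched as follows; changesApplied and its inputs (HEAD hashes captured before and after the loop plus a dirty-tree flag) are hypothetical, not the real code:

```go
package main

import "fmt"

// changesApplied decides the post-codex skip on actual code changes rather
// than on the CodexDone signal: run the review only if the loop produced a
// new commit or left the working tree dirty.
func changesApplied(headBefore, headAfter string, treeDirty bool) bool {
	return headBefore != headAfter || treeDirty
}

func main() {
	fmt.Println(changesApplied("abc", "abc", false)) // false: dismissed findings, no commits, skip review
	fmt.Println(changesApplied("abc", "def", false)) // true: fixes committed, run post-codex review
}
```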

umputun added a commit that referenced this pull request Apr 16, 2026
… findings

- rename claude_model -> task_model and --claude-model -> --task-model for symmetry with review_model/--review-model
- pass task model to plan mode (was missing)
- strip duplicate --model flags from claude_args to prevent CLI errors
- fix typo in pkg/config/defaults/config: review_model section had `# claude_model =` example
- add processor test for ReviewClaude executor separation (nil fallback + distinct review executor)
- add ReviewModel merge test, normalize --flag=value form in docs
- update docs (README, CLAUDE.md, llms.txt, custom-providers.md)
- update review-claude tests for post-codex-skip behavior added in #282

picks up unmerged follow-up commits from feat/claude-model-config branch
umputun added a commit that referenced this pull request Apr 16, 2026
… findings (#284)

* feat: rename claude_model to task_model, address unmerged #272 review findings

- rename claude_model -> task_model and --claude-model -> --task-model for symmetry with review_model/--review-model
- pass task model to plan mode (was missing)
- strip duplicate --model flags from claude_args to prevent CLI errors
- fix typo in pkg/config/defaults/config: review_model section had `# claude_model =` example
- add processor test for ReviewClaude executor separation (nil fallback + distinct review executor)
- add ReviewModel merge test, normalize --flag=value form in docs
- update docs (README, CLAUDE.md, llms.txt, custom-providers.md)
- update review-claude tests for post-codex-skip behavior added in #282

picks up unmerged follow-up commits from feat/claude-model-config branch

* fix: address copilot review on PR 284

- stripFlag now handles --flag=value and bare --flag forms (was only handling --flag value space-form), prevents duplicate/orphaned --model in claude_args
- add stripFlag tests for equals form, bare flag, repeated occurrences, prefix-only false-match
- doc wording: clarify that task_model/review_model are passed as --model to the configured claude_command and may be ignored or implemented by custom wrappers
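The stripFlag behavior described above could look roughly like this sketch (the real implementation in the repo may differ; handling of values that themselves start with a dash is simplified here):

```go
package main

import (
	"fmt"
	"strings"
)

// stripFlag removes every occurrence of a flag from an argument list in
// three forms: space form ("--model sonnet"), equals form ("--model=sonnet"),
// and bare form ("--model"), while leaving prefix-only matches like
// "--model-id" untouched.
func stripFlag(args []string, flag string) []string {
	out := make([]string, 0, len(args))
	for i := 0; i < len(args); i++ {
		a := args[i]
		if a == flag {
			// space form: also skip the following value, if one exists
			if i+1 < len(args) && !strings.HasPrefix(args[i+1], "-") {
				i++
			}
			continue
		}
		if strings.HasPrefix(a, flag+"=") {
			continue // equals form
		}
		out = append(out, a)
	}
	return out
}

func main() {
	fmt.Println(stripFlag([]string{"--model", "sonnet", "--verbose"}, "--model"))
	fmt.Println(stripFlag([]string{"--model=opus", "--model-id", "x"}, "--model"))
}
```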