feat(commands): add santa-loop adversarial review command#1052

Merged
affaan-m merged 2 commits into
affaan-m:mainfrom
shahar-caura:feat/santa-loop-command
Mar 31, 2026
@shahar-caura shahar-caura commented Mar 31, 2026

Contributor

Summary

  • Adds /santa-loop, a convergence loop command built on the santa-method skill
  • Two independent reviewers (Claude Opus + external model) must both return NICE before code ships
  • Reviewer B cascades: Codex CLI (GPT-5.4) → Gemini CLI (3.1 Pro) → Claude Agent fallback
  • NAUGHTY rounds commit fixes and re-run fresh reviewers; max 3 iterations before escalation
  • Push only happens after both reviewers return NICE
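
The loop described in the bullets above can be sketched as follows. This is a minimal illustration, not the command's actual implementation: `santa_loop`, `review_a`, `review_b`, and `apply_fixes` are hypothetical names, reviewers are assumed to return the strings `"PASS"` or `"FAIL"`, and "max 3 iterations" is read here as at most three review rounds with a fix step after each failed round except the last.

```python
from typing import Callable

Verdict = str  # "PASS" or "FAIL", as returned by one reviewer


def santa_loop(
    review_a: Callable[[], Verdict],
    review_b: Callable[[], Verdict],
    apply_fixes: Callable[[int], None],
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Return ("NICE", rounds_used) or ("ESCALATE", rounds_used)."""
    for round_no in range(1, max_rounds + 1):
        # Reviewers are called afresh each round; nothing carries over.
        verdicts = (review_a(), review_b())
        if all(v == "PASS" for v in verdicts):
            return "NICE", round_no  # only at this point may a push happen
        if round_no < max_rounds:
            # Hypothetical hook: fix and commit, e.g. "... (round N)".
            apply_fixes(round_no)
    return "ESCALATE", max_rounds
```

A caller would push only when the first element of the result is `"NICE"` and hand the remaining issues to a human otherwise.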

Test plan

  • Run /santa-loop with Codex CLI available — verify it spawns Claude agent + codex exec in parallel
  • Run /santa-loop without external CLIs — verify Claude-only fallback with logged warning
  • Trigger a NAUGHTY verdict — verify fix cycle commits and re-runs fresh reviewers
  • Verify max 3 iterations cap and escalation behavior
  • Verify push only occurs after NICE verdict

🤖 Generated with Claude Code


Summary by cubic

Adds /santa-loop, an adversarial dual‑review loop that blocks pushes until two independent reviewers both PASS. Reviews run in parallel (Claude Opus + an external model via codex with gpt-5.4 or gemini with gemini-2.5-pro, with Claude fallback) and auto‑fix up to three rounds.

  • New Features

    • Parallel, isolated reviews using a shared objective rubric; cascade codex → gemini → Claude fallback; NAUGHTY rounds commit fixes as "fix: address santa-loop review findings (round N)"; NICE pushes and prints a final agreement report; docs added with Purpose/Usage/Output.
  • Bug Fixes

    • Clarified PASS/FAIL → NICE/NAUGHTY gate and Agent-based parallelism; safer external CLI execution (--sandbox read-only, mktemp); corrected model IDs and push command (git push -u origin HEAD); fixed gemini prompt piping; added YAML frontmatter and escalation format.

Written for commit 43f3234. Summary will update on new commits.

Summary by CodeRabbit

  • Documentation
    • Added the "Santa Loop" adversarial dual-review workflow: parallel independent reviews with objective PASS/FAIL criteria, a verdict gate that outputs NICE or NAUGHTY, structured per-criterion reporting, and a final agreement breakdown.
    • Describes iterative fix cycles (up to 3 rounds) with focused fixes and escalation to manual review if unresolved, plus push/report steps when both reviewers pass.

Adds /santa-loop, a convergence loop command built on the santa-method
skill. Two independent reviewers (Claude Opus + external model) must
both return NICE before code ships. Supports Codex CLI (GPT-5.4),
Gemini CLI (3.1 Pro), or Claude-only fallback. Fixes are committed
per round and the loop repeats until convergence or escalation.

coderabbitai Bot commented Mar 31, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2abd0921-cf27-4d88-a64b-f03219c30f12

📥 Commits

Reviewing files that changed from the base of the PR and between dbcca89 and 43f3234.

📒 Files selected for processing (1)
  • commands/santa-loop.md
✅ Files skipped from review due to trivial changes (1)
  • commands/santa-loop.md

Walkthrough

Adds a new command specification, "Santa Loop," describing an adversarial dual-review convergence workflow: determine review scope, generate an objective PASS/FAIL rubric (with domain-specific criteria), run two independent reviewers in parallel, gate results to NICE/NAUGHTY, perform up to 3 fix iterations, then push or escalate with a final report.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Santa Loop Workflow Documentation (commands/santa-loop.md) | New command spec introducing the "Santa Loop" adversarial dual-review workflow: scope detection, objective PASS/FAIL rubric (correctness, security, error handling, completeness, consistency, regression checks), two parallel reviewers (with external CLI fallback), verdict gating to NICE/NAUGHTY, up to 3 iterative fix cycles, merge/dedupe of critical issues, final push/report and escalation path. |
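
As an illustration of how a reviewer's structured result could be reduced to a single PASS/FAIL, here is a small sketch. The JSON field names (`verdict`, `checks`) and the function `reviewer_passes` are assumptions made for this example, since the thread does not show the actual schema; the six rubric rows are the ones named in the review comments. Treating malformed output as a failure is also a choice of this sketch.

```python
import json

# Rubric rows as listed in the review comments on this PR.
RUBRIC = (
    "correctness",
    "security",
    "error handling",
    "completeness",
    "internal consistency",
    "no regressions",
)


def reviewer_passes(raw: str) -> bool:
    """True only if the report says PASS and every rubric row is PASS."""
    try:
        report = json.loads(raw)
    except json.JSONDecodeError:
        return False  # unparseable output fails closed (sketch assumption)
    if not isinstance(report, dict):
        return False
    checks = report.get("checks")  # hypothetical field: {criterion: result}
    if not isinstance(checks, dict):
        return False
    results = {str(name).lower(): result for name, result in checks.items()}
    every_row_passed = all(results.get(row) == "PASS" for row in RUBRIC)
    return report.get("verdict") == "PASS" and every_row_passed
```

Requiring both the overall verdict and every per-criterion row guards against a reviewer that reports PASS overall while flagging one criterion as failed.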

Sequence Diagram(s)

sequenceDiagram
  participant Dev as Developer (scope)
  participant Santa as Santa Loop
  participant R1 as Reviewer A (Claude Opus)
  participant R2 as Reviewer B (External CLI or Claude fallback)
  participant Git as Git/Local HEAD
  participant Origin as Remote Origin

  Dev->>Santa: provide $ARGUMENTS or run `git diff --name-only HEAD`
  Santa->>Santa: generate rubric + domain criteria
  Santa->>R1: start review (parallel)
  Santa->>R2: start review (parallel)
  R1-->>Santa: JSON verdict + per-criterion results + issues
  R2-->>Santa: JSON verdict + per-criterion results + issues
  Santa->>Santa: aggregate verdict (NICE if both PASS, else NAUGHTY)
  alt NAUGHTY and iterations < 3
    Santa->>Dev: apply flagged fixes (single commit per iteration)
    Dev->>Git: commit iteration changes
    Santa->>R1: re-run fresh review
    Santa->>R2: re-run fresh review
  else BOTH PASS
    Santa->>Origin: push HEAD
    Santa-->>Dev: final report (per-reviewer verdicts, agreement, iterations)
  end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • affaan-m

Poem

🐰 In Santa's Loop I hop and peep,

Two reviewers guard the code I keep.
Rubrics firm, three tries to mend,
We patch, we test, and then we send.
Hooray — NICE when both agree, hip-hop, let’s leap!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat(commands): add santa-loop adversarial review command' directly and clearly describes the main change: adding a new command called santa-loop with an adversarial review workflow. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


greptile-apps Bot commented Mar 31, 2026

Contributor

Greptile Summary

This PR adds commands/santa-loop.md, a new slash command that wraps the existing skills/santa-method skill into a ready-to-run adversarial dual-review convergence loop. Two independent reviewers (Claude Opus + an external model via Codex CLI → Gemini CLI → Claude fallback) must both return PASS before code is pushed; NAUGHTY rounds apply fixes, commit them, and re-run fresh reviewers up to 3 times before escalating.

The command is well-structured and addresses several issues flagged in earlier review rounds (unique temp file via mktemp, correct git push -u origin HEAD, PASS/FAIL JSON verdict aligned with the santa-method skill). Two issues remain:

  • Invalid subagent_type: Both Reviewer A (line 67) and the Claude fallback path (line 104) specify subagent_type: "code-reviewer", which is not a valid Agent type in Claude Code. Valid types are general-purpose, Explore, Plan, and statusline-setup. Using an undefined type will cause the agent launch to fail, breaking the core review mechanism.
  • Gemini ARG_MAX risk: The Gemini invocation passes all file contents as a command-line argument via -p "$(cat "$PROMPT_FILE")". For review contexts with many or large files, this can exceed the OS ARG_MAX limit (~2 MB on Linux), causing an "Argument list too long" failure that silently aborts Reviewer B. The Codex path correctly uses stdin redirection and the Gemini path should follow suit if the CLI supports it.

Confidence Score: 4/5

Safe to merge after fixing the invalid subagent_type value — the core review agents won't launch with the current value.

Two P1 issues remain: the code-reviewer subagent_type will prevent reviewer agents from launching correctly, and the Gemini ARG_MAX issue can silently abort Reviewer B on large codebases. The first is a clear functional defect that should be fixed before the command is used in production workflows. All previously flagged issues have been resolved.

commands/santa-loop.md — lines 67, 99, and 104 need attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| commands/santa-loop.md | New command implementing adversarial dual-review loop; two actionable issues remain: invalid subagent_type: "code-reviewer" will break agent invocation, and the Gemini -p "$(cat ...)" invocation can fail with ARG_MAX errors on large codebases. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[/santa-loop starts/] --> B[Step 1: Identify changed files\ngit diff --name-only HEAD]
    B --> C[Step 2: Build rubric\nfor file types under review]
    C --> D[Step 3: Launch reviewers in parallel]
    D --> E[Reviewer A\nAgent - general-purpose, opus]
    D --> F{Reviewer B\nCLI detection}
    F -->|codex installed| G[codex exec --sandbox read-only\n-m gpt-5.4 < PROMPT_FILE]
    F -->|gemini installed| H[gemini -m gemini-2.5-pro < PROMPT_FILE]
    F -->|neither installed| I[Agent fallback\nopus, warns about model diversity]
    E --> J[Both return PASS/FAIL JSON]
    G --> J
    H --> J
    I --> J
    J --> K{Step 4: Verdict Gate}
    K -->|Both PASS| L[NICE]
    K -->|Either FAIL| M[NAUGHTY]
    M --> N{Round <= 3?}
    N -->|Yes| O[Step 5: Fix all flagged issues\ncommit: fix: address santa-loop review findings Round N]
    O --> D
    N -->|No| P[ESCALATION\nPresent remaining issues\nDo NOT push]
    L --> Q[Step 6: git push -u origin HEAD]
    Q --> R[Step 7: Print final report]
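
The Reviewer B detection branch of the flowchart can be sketched as below. `pick_reviewer_b` is a hypothetical helper, and the command lines are copied from the flowchart rather than verified against either CLI; an empty result stands for the Claude Agent fallback. The `which` parameter exists only so the lookup can be replaced in tests.

```python
import shutil
from typing import Callable, Optional


def pick_reviewer_b(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the argv for Reviewer B; [] means use the Claude fallback."""
    if which("codex"):
        # Prompt is expected on stdin, as in the flowchart.
        return ["codex", "exec", "--sandbox", "read-only", "-m", "gpt-5.4"]
    if which("gemini"):
        return ["gemini", "-m", "gemini-2.5-pro"]
    # Neither CLI found: the caller should warn that both reviewers
    # now share a model family.
    return []
```

Checking `codex` before `gemini` mirrors the cascade order given in the PR description.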

Reviews (2): Last reviewed commit: "fix: address all PR review findings for ..." | Re-trigger Greptile


@cubic-dev-ai cubic-dev-ai Bot left a comment

3 issues found across 1 file

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="commands/santa-loop.md">

<violation number="1" location="commands/santa-loop.md:51">
P1: Reviewer B is invoked in autonomous `full-auto` mode inside the repo, which can mutate state during a step intended to be a pure independent review.</violation>

<violation number="2" location="commands/santa-loop.md:51">
P1: Reviewer B uses a predictable fixed file path in `/tmp`, creating avoidable clobber/race and prompt leakage risk for review content.</violation>

<violation number="3" location="commands/santa-loop.md:87">
P1: Santa-loop allows direct push after NICE verdicts without requiring automated checks/CI to pass, violating the team’s review gate expectations.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

Comment thread commands/santa-loop.md Outdated
When both reviewers return NICE:

```bash
git push
```

@cubic-dev-ai cubic-dev-ai Bot Mar 31, 2026

P1: Santa-loop allows direct push after NICE verdicts without requiring automated checks/CI to pass, violating the team’s review gate expectations.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At commands/santa-loop.md, line 87:

<comment>Santa-loop allows direct push after NICE verdicts without requiring automated checks/CI to pass, violating the team’s review gate expectations.</comment>

<file context>
@@ -0,0 +1,122 @@
+When both reviewers return NICE:
+
+```bash
+git push
+```
+
</file context>

Contributor Author

Partially addressed in 43f3234 — changed to git push -u origin HEAD. Regarding CI gates: this command pushes to a feature branch, not directly to main. CI/review gates are enforced by the PR process, not the push itself. The santa-loop's dual-reviewer verification is an additional quality layer on top of the standard PR workflow, not a replacement for it.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (4)
commands/santa-loop.md (4)

17-30: Consider alignment with code-reviewer agent's built-in criteria.

The rubric defined here (lines 21-28) overlaps with the code-reviewer agent's existing evaluation criteria (from agents/code-reviewer.md). To avoid drift between this command's rubric and the agent's behavior:

  1. Reference the code-reviewer agent's criteria explicitly, OR
  2. Ensure this rubric is passed as an override/supplement to the agent's prompt

This ensures consistency if the code-reviewer agent's criteria are updated.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@commands/santa-loop.md` around lines 17 - 30, The rubric under "Step 2: Build
the Rubric" duplicates the code-reviewer agent's built-in criteria; update this
section to either explicitly reference the agent's criteria document
(agents/code-reviewer.md) or state that the rubric will be supplied as an
override/supplement to the agent prompt. Concretely, modify the "Step 2: Build
the Rubric" text and the rubric table to (a) add a single-line pointer: "Aligns
with code-reviewer agent criteria (see agents/code-reviewer.md)" OR (b) add a
metadata line like "rubric_override: true" and note that this rubric is an
override to be passed to the code-reviewer agent, and ensure the rubric rows
(Correctness, Security, Error handling, Completeness, Internal consistency, No
regressions) remain unchanged while adding that explicit reference/override.

80-80: Clarify escalation format when iterations are exhausted.

Line 80 states to "stop and present all remaining issues to the user" but doesn't specify the format or mechanism. The santa-method skill (from skills/santa-method/SKILL.md:199-200) uses a dedicated escalate_to_human(output, issues) function with logging.

Consider specifying:

  • Output format (prose summary, JSON, table?)
  • Output destination (stdout, file, editor?)
  • Required logging or tracking
💡 Suggested addition for escalation format
-**Maximum 3 iterations.** If still NAUGHTY after 3 rounds, stop and present all remaining issues to the user. Do NOT push.
+**Maximum 3 iterations.** If still NAUGHTY after 3 rounds, stop and present all remaining issues to the user. Do NOT push.
+
+**Escalation format:**
+```
+SANTA LOOP ESCALATION (exceeded maximum iterations)
+
+Remaining issues after 3 rounds:
+[formatted list of all critical issues from both reviewers]
+
+Manual review required before proceeding.
+```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@commands/santa-loop.md` at line 80, Clarify and implement a concrete
escalation when the santa-loop reaches its "Maximum 3 iterations" limit: ensure
the loop that enforces the cap (santa-loop / loop controller) calls the existing
escalate_to_human(output, issues) helper from the santa-method skill when
iterations are exhausted, and define the output as a short prose summary plus a
machine-readable JSON payload of remaining issues (issues array with id,
severity, reviewer, and short_description); send the prose to stdout and write
the JSON to a timestamped file for audit, and emit a structured log entry via
the same logging facility (include action="escalate", method="santa-loop",
iteration_count=3, file_path) so tracking and downstream tooling can pick it up.
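
A possible shape for that escalation step, as a sketch only: `escalate`, `out_dir`, and the file-name pattern are hypothetical, and this does not reproduce the `escalate_to_human(output, issues)` helper from the santa-method skill, whose body is not shown in this thread. The issue fields and the payload keys follow the ones named in the comment above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def escalate(issues: list[dict], out_dir: Path, rounds: int = 3) -> tuple[str, Path]:
    """Return the prose report and the path of the JSON audit file."""
    lines = [
        "SANTA LOOP ESCALATION (exceeded maximum iterations)",
        "",
        f"Remaining issues after {rounds} rounds:",
    ]
    lines += [
        f"- [{i['severity']}] {i['reviewer']}: {i['short_description']}"
        for i in issues
    ]
    lines += ["", "Manual review required before proceeding."]

    # Timestamped file name (hypothetical pattern) for the audit trail.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = out_dir / f"santa-loop-escalation-{stamp}.json"
    payload = {
        "action": "escalate",
        "method": "santa-loop",
        "iteration_count": rounds,
        "issues": issues,
    }
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return "\n".join(lines), path
```

The caller would print the returned text to stdout and log the returned path; no push is attempted on this branch.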

32-34: Consider specifying parallel execution mechanism.

Line 34 emphasizes parallel execution but doesn't specify the implementation. Depending on the execution context (shell script, agent orchestration, etc.), the mechanism differs.

Consider adding a note about how to achieve parallelism, e.g.:

  • Shell: Use background processes with & and wait
  • Agent system: Spawn concurrent tasks
  • Ensure both complete before proceeding to verdict gate
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@commands/santa-loop.md` around lines 32 - 34, The Step 3 description "Launch
two reviewers in parallel" is ambiguous—update the "Dual Independent Review"
section to specify how to run reviewers concurrently and ensure completion
before the verdict gate: explicitly mention example mechanisms such as using
background processes and wait in a shell script, spawning concurrent tasks or
threads in an agent/orchestration system, or using async/promises with join/wait
in code; state that both reviewer processes must be awaited/joined before
proceeding to the verdict gate (reference the Step 3 header "Dual Independent
Review", the phrase "Launch two reviewers in parallel", and the "verdict gate")
so implementers know to both start concurrently and block until both finish.

86-88: Consider safer push semantics.

The simple git push assumes the current branch tracks a remote and is up-to-date. For a workflow emphasizing quality gates, consider:

# Ensure we're up-to-date before pushing
git fetch origin
if git merge-base --is-ancestor origin/$(git rev-parse --abbrev-ref HEAD) HEAD; then
  git push
else
  echo "ERROR: Remote branch has diverged. Pull or rebase first."
  exit 1
fi

Or use --force-with-lease if rebasing is part of the fix cycle.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@commands/santa-loop.md` around lines 86 - 88, Replace the simple git push
with a safer push sequence: run git fetch origin, check divergence with git
merge-base --is-ancestor combined with git rev-parse --abbrev-ref HEAD to ensure
the remote branch is an ancestor of your HEAD, and only then run git push; if it
has diverged print an error and exit instead of pushing, or if your workflow
rebases use git push --force-with-lease to avoid accidentally clobbering remote
changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@commands/santa-loop.md`:
- Around line 1-3: The Markdown file "santa-loop.md" is missing required YAML
frontmatter; add a YAML frontmatter block at the top of the file (between ---
markers) containing at minimum a descriptive "description" field (and any other
required metadata per project guidelines) so the command file is recognized as a
command; ensure the frontmatter precedes the existing content ("# Santa
Loop...") and is valid YAML.
- Around line 38-42: The code-reviewer agent's native verdict
("Approve"/"Warning"/"Block") is incompatible with the Verdict Gate and
santa_verdict() which expect PASS/FAIL per criterion aggregated into
NICE/NAUGHTY; fix by either (A) overriding the code-reviewer prompt (when
launching the agent in the santa-loop) to require structured PASS/FAIL per
rubric criterion (or explicit NICE/NAUGHTY final label) so output matches
skills/santa-method expectations, or (B) adding a small translation layer after
the code-reviewer output that maps its native labels to the expected format
(e.g., Approve -> PASS/NICE, Warning -> FAIL/NAUGHTY, Block -> FAIL/NAUGHTY)
and/or converts per-criterion approvals into PASS/FAIL before calling
santa_verdict(); update the santa-loop launch logic where the code-reviewer
agent is created and the decision path through the Verdict Gate to use the
chosen approach.
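
Option (B), the translation layer, could look roughly like this. The label mapping is the one proposed in the comment above; `to_pass_fail` and `gate` are hypothetical names (the skill's own `santa_verdict()` is not shown in this thread), and mapping unknown labels to FAIL is an assumption of the sketch.

```python
# Mapping proposed in the review comment: only "Approve" counts as a pass.
NATIVE_TO_RUBRIC = {"Approve": "PASS", "Warning": "FAIL", "Block": "FAIL"}


def to_pass_fail(native_label: str) -> str:
    """Translate a code-reviewer label into PASS/FAIL."""
    # Unknown labels fail closed (an assumption of this sketch).
    return NATIVE_TO_RUBRIC.get(native_label.strip(), "FAIL")


def gate(verdict_a: str, verdict_b: str) -> str:
    """NICE only when both reviewers returned PASS; otherwise NAUGHTY."""
    return "NICE" if verdict_a == verdict_b == "PASS" else "NAUGHTY"
```

Option (A), overriding the prompt so the agent emits PASS/FAIL directly, avoids the lossy mapping of "Warning" to FAIL, which is presumably why the follow-up commit clarified the verdict protocol instead.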

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2c4681ce-f56a-4b42-9745-408fd9de8407

📥 Commits

Reviewing files that changed from the base of the PR and between f7f91d9 and dbcca89.

📒 Files selected for processing (1)
  • commands/santa-loop.md

- Add YAML frontmatter with description (coderabbit)
- Add Purpose, Usage, Output sections per CONTRIBUTING.md template (coderabbit)
- Fix literal <prompt> placeholder in Gemini CLI invocation (greptile P1)
- Use mktemp for unique temp file instead of fixed /tmp path (greptile P1, cubic P1)
- Use --sandbox read-only instead of --full-auto to prevent repo mutation (cubic P1)
- Use git push -u origin HEAD instead of bare git push (greptile P2, cubic P1)
- Clarify verdict protocol: reviewers return PASS/FAIL, gate maps to NICE/NAUGHTY (greptile P2, coderabbit)
- Specify parallel execution mechanism via Agent tool (coderabbit nitpick)
- Add escalation format for max-iterations case (coderabbit nitpick)
- Fix model IDs: gpt-5.4 for Codex, gemini-2.5-pro for Gemini
Comment thread commands/santa-loop.md

#### Reviewer A: Claude Agent (always runs)

Launch an Agent (subagent_type: `code-reviewer`, model: `opus`) with the full rubric + all files under review. The prompt must include:

P1 code-reviewer is not a valid subagent_type

The Agent tool only accepts a fixed set of subagent_type values: general-purpose, statusline-setup, Explore, and Plan. Specifying code-reviewer will cause the agent invocation to fail or silently fall back, meaning the review never actually runs — defeating the whole purpose of the command.

The same issue appears at line 104 for the Claude fallback path.

The correct type is general-purpose, with the reviewer role communicated entirely through the prompt text:

Suggested change
-Launch an Agent (subagent_type: `code-reviewer`, model: `opus`) with the full rubric + all files under review. The prompt must include:
+Launch an Agent (subagent_type: `general-purpose`, model: `opus`) with the full rubric + all files under review. The prompt must include:

And at line 104:

Launch a second Claude Agent (subagent_type: `general-purpose`, model: `opus`). Log a warning...

Comment thread commands/santa-loop.md
Comment on lines +99 to +100
gemini -p "$(cat "$PROMPT_FILE")" -m gemini-2.5-pro
rm -f "$PROMPT_FILE"

P1 Gemini invocation can exceed ARG_MAX for large review contexts

-p "$(cat "$PROMPT_FILE")" passes all file contents as a command-line argument string. On Linux, ARG_MAX is typically ~2 MB. A review covering multiple files (e.g. a PR with 10+ code files) can easily exceed this limit, causing the shell to error with Argument list too long and silently aborting Reviewer B.

The Codex path correctly uses stdin redirection (< "$PROMPT_FILE") to avoid this. If the Gemini CLI supports stdin (e.g. gemini -m gemini-2.5-pro < "$PROMPT_FILE"), that form should be used instead:

Suggested change
-gemini -p "$(cat "$PROMPT_FILE")" -m gemini-2.5-pro
-rm -f "$PROMPT_FILE"
+gemini -m gemini-2.5-pro < "$PROMPT_FILE"
+rm -f "$PROMPT_FILE"

If the Gemini CLI does not support stdin piping, this limitation should be documented explicitly so users know large review contexts may require a different approach.
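
To illustrate why stdin sidesteps the argument-length limit, the sketch below feeds a multi-megabyte prompt to a child process through a pipe instead of through argv. `run_with_stdin` is a hypothetical helper, and the child here is a stand-in Python one-liner rather than the Gemini CLI, whose stdin support this thread leaves unconfirmed.

```python
import subprocess
import sys


def run_with_stdin(argv: list[str], prompt: str, timeout: float = 600.0) -> str:
    """Run argv, send prompt on stdin, return stdout; raise if the CLI fails."""
    done = subprocess.run(
        argv,
        input=prompt,          # travels through a pipe, not the argument list
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,            # a non-zero exit raises instead of passing silently
    )
    return done.stdout


if __name__ == "__main__":
    # Stand-in child process that only counts the characters it receives.
    counter = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]
    print(run_with_stdin(counter, "x" * 3_000_000).strip())
```

Using `check=True` also addresses the "silent abort" concern: a failed reviewer surfaces as `subprocess.CalledProcessError` rather than as an empty result.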

@cubic-dev-ai cubic-dev-ai Bot left a comment

2 issues found across 1 file (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="commands/santa-loop.md">

<violation number="1" location="commands/santa-loop.md:83">
P2: Temp prompt file cleanup is not failure-safe; if the CLI invocation is interrupted, the file with full source contents can remain in /tmp. Add an EXIT trap to ensure cleanup even on failure.</violation>

<violation number="2" location="commands/santa-loop.md:99">
P2: Avoid passing the full review payload through `-p "$(cat ...)"`; large prompts can exceed OS argument-length limits and cause this reviewer step to fail. Use stdin or a prompt-file mechanism so large PR contexts remain supported.</violation>
</file>

Comment thread commands/santa-loop.md

Build the reviewer prompt (identical rubric + instructions as Reviewer A) and write it to a unique temp file:
```bash
PROMPT_FILE=$(mktemp /tmp/santa-reviewer-b-XXXXXX.txt)
```

@cubic-dev-ai cubic-dev-ai Bot Mar 31, 2026

P2: Temp prompt file cleanup is not failure-safe; if the CLI invocation is interrupted, the file with full source contents can remain in /tmp. Add an EXIT trap to ensure cleanup even on failure.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At commands/santa-loop.md, line 83:

<comment>Temp prompt file cleanup is not failure-safe; if the CLI invocation is interrupted, the file with full source contents can remain in /tmp. Add an EXIT trap to ensure cleanup even on failure.</comment>

<file context>
@@ -31,40 +45,70 @@ Add domain-specific criteria based on file types (e.g., type safety for TS, memo
+Build the reviewer prompt (identical rubric + instructions as Reviewer A) and write it to a unique temp file:
 ```bash
-codex exec -p "full-auto" --model gpt-5.4 -C "$(pwd)" - < /tmp/santa-reviewer-b-prompt.txt
+PROMPT_FILE=$(mktemp /tmp/santa-reviewer-b-XXXXXX.txt)
+cat > "$PROMPT_FILE" << 'EOF'
+... full rubric + file contents + reviewer instructions ...
</file context>
Suggested change
-PROMPT_FILE=$(mktemp /tmp/santa-reviewer-b-XXXXXX.txt)
+PROMPT_FILE=$(mktemp /tmp/santa-reviewer-b-XXXXXX.txt)
+trap 'rm -f "$PROMPT_FILE"' EXIT
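
The same failure-safe cleanup idea, expressed with `try`/`finally` instead of a shell `trap`, is sketched below. This swaps in a Python analogue for illustration; `with_prompt_file` and its `use` callback are hypothetical names, and the file prefix simply echoes the one in the snippet under review.

```python
import os
import tempfile
from typing import Callable


def with_prompt_file(prompt: str, use: Callable[[str], str]) -> str:
    """Write prompt to a unique temp file, call use(path), always delete it."""
    # mkstemp creates the file with owner-only permissions and a unique name.
    fd, path = tempfile.mkstemp(prefix="santa-reviewer-b-", suffix=".txt")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(prompt)
        return use(path)
    finally:
        # Runs on normal return and on exceptions (including Ctrl-C),
        # but not if the process is killed outright, the same limit
        # a shell EXIT trap has.
        os.remove(path)
```

Here `use` would be whatever invokes the external reviewer with the file; the prompt is removed even when that call raises.
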

Comment thread commands/santa-loop.md

**Gemini CLI** (if installed and codex is not)
```bash
gemini -p "$(cat "$PROMPT_FILE")" -m gemini-2.5-pro
```

@cubic-dev-ai cubic-dev-ai Bot Mar 31, 2026

P2: Avoid passing the full review payload through -p "$(cat ...)"; large prompts can exceed OS argument-length limits and cause this reviewer step to fail. Use stdin or a prompt-file mechanism so large PR contexts remain supported.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At commands/santa-loop.md, line 99:

<comment>Avoid passing the full review payload through `-p "$(cat ...)"`; large prompts can exceed OS argument-length limits and cause this reviewer step to fail. Use stdin or a prompt-file mechanism so large PR contexts remain supported.</comment>

<file context>
@@ -31,40 +45,70 @@ Add domain-specific criteria based on file types (e.g., type safety for TS, memo
-If neither external CLI is available, launch a second Claude Agent (subagent_type: `code-reviewer`, model: `opus`). Log a note that both reviewers share the same model family — true model diversity was not achieved but context isolation is still enforced.
+**Gemini CLI** (if installed and codex is not)
+```bash
+gemini -p "$(cat "$PROMPT_FILE")" -m gemini-2.5-pro
+rm -f "$PROMPT_FILE"
+```
</file context>

@affaan-m affaan-m merged commit 99a44f6 into affaan-m:main Mar 31, 2026
4 checks passed
peiking88 pushed a commit to peiking88/everything-claude-code that referenced this pull request Apr 4, 2026
)

* feat(commands): add santa-loop adversarial review command

Adds /santa-loop, a convergence loop command built on the santa-method
skill. Two independent reviewers (Claude Opus + external model) must
both return NICE before code ships. Supports Codex CLI (GPT-5.4),
Gemini CLI (3.1 Pro), or Claude-only fallback. Fixes are committed
per round and the loop repeats until convergence or escalation.

* fix: address all PR review findings for santa-loop command

- Add YAML frontmatter with description (coderabbit)
- Add Purpose, Usage, Output sections per CONTRIBUTING.md template (coderabbit)
- Fix literal <prompt> placeholder in Gemini CLI invocation (greptile P1)
- Use mktemp for unique temp file instead of fixed /tmp path (greptile P1, cubic P1)
- Use --sandbox read-only instead of --full-auto to prevent repo mutation (cubic P1)
- Use git push -u origin HEAD instead of bare git push (greptile P2, cubic P1)
- Clarify verdict protocol: reviewers return PASS/FAIL, gate maps to NICE/NAUGHTY (greptile P2, coderabbit)
- Specify parallel execution mechanism via Agent tool (coderabbit nitpick)
- Add escalation format for max-iterations case (coderabbit nitpick)
- Fix model IDs: gpt-5.4 for Codex, gemini-2.5-pro for Gemini
FrancescoRosciano pushed a commit to FRosciano-Mambo/everything-claude-code that referenced this pull request Jun 1, 2026