Skip to content

feat(skills): add autonomous PR comparator for issues with competing fixes#3052

Merged
cv merged 6 commits into
mainfrom
feat/skill-pr-comparator
May 5, 2026
Merged

feat(skills): add autonomous PR comparator for issues with competing fixes#3052
cv merged 6 commits into
mainfrom
feat/skill-pr-comparator

Conversation

@cjagwani

@cjagwani cjagwani commented May 5, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds nemoclaw-maintainer-pr-comparator: an autonomous skill that picks the merge winner among competing PRs targeting the same issue. Runs Tier 0 plumbing gates → Tier 1 correctness checks → Tier 2 quality checks → Tier 3 deterministic ranking. Degraded mode handles the case where no PR passes gates.

Why

When two contributors fix the same issue, deciding which to merge is judgment-heavy. This skill makes that judgment deterministic, explainable, and reusable across repos.

What's clever (catches what CI can't)

  • Reads each new test against the pre-fix code to verify the test would have failed (catches smoke-test fakes)
  • Parses issue body and comments for acceptance criteria (catches "and don't break Y while you're at it")
  • Refactor-vs-behavior scan flags hidden behavior changes inside what looks like a rename
  • Mocking-purity check catches tests that mock the unit under test
  • Public-surface preservation flags content changes (not moves) in flags/help/errors
  • Workaround-vs-root-cause check flags symptom-suppressing try/catch fixes without follow-up
  • Behavior-coverage matrix recommends "merge A, cherry-pick B's tests for criterion X" when no PR dominates
  • Degraded mode picks closest-to-ready when neither passes gates instead of giving up

Validation

Backtested against 5 historical NemoClaw cases (#2681, #2947, #893, #2636, refactor chain #2087/#2489/#2495). Initial v1: 4/5. After patches (PR-state-OPEN gate, supersession detection, workaround-vs-root-cause flag): 5/5.

Skill structure (per Claude best-practices)

Slim SKILL.md (orchestration), tier checks in one-level-deep reference files, output template extracted, five utility scripts for the deterministic work, repo-specific assumptions in repo-policy.md for cross-repo reuse.

.agents/skills/nemoclaw-maintainer-pr-comparator/
├── SKILL.md                        # orchestration only
├── repo-policy.md                  # configurable per-repo defaults
├── tiebreakers.md                  # Tier 3 + degraded mode
├── checks/
│   ├── tier-0-gates.md
│   ├── tier-1-correctness.md
│   └── tier-2-quality.md
├── templates/verdict.md
├── validation/backtest.md
└── scripts/
    ├── find-candidates.sh
    ├── collect-gates.sh
    ├── check-coderabbit-threads.sh   # GraphQL on reviewThreads.isResolved
    ├── parse-supersession.sh
    └── render-verdict.py

Test plan

  • markdownlint clean
  • shellcheck clean
  • python compiles
  • Backtest 5/5 on historical cases
  • Try the skill on a live duplicate-PR situation when one arises

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added a user-invocable PR comparison skill to evaluate competing PRs, run multi-tier gates, score/rank candidates, and output a recommended merge verdict with evidence and trace.
    • Added CLI helpers for candidate discovery, gate collection, supersession detection, thread checks, and verdict rendering.
  • Documentation

    • New repo policy, tiered-check rubrics (Tier 0–3), verdict template, and backtest/validation guides for ongoing tuning.

cjagwani added 2 commits May 5, 2026 08:47
…cisions

Adds nemoclaw-maintainer-pr-comparator: an autonomous skill that picks
the best PR among multiple candidates targeting the same issue. Runs a
3-tier check pipeline (plumbing gates, correctness, code quality) plus
deterministic comparative scoring with happy-path and degraded modes.

Validated against 5 historical NemoClaw cases (#2681, #2947, #893,
#2636, refactor chain #2087/#2489/#2495); spec patches added for
PR-state-OPEN gate, supersession detection, and workaround-vs-root-cause
flag based on validation findings.
…r-comparator

Slim SKILL.md (orchestration only); tier checks moved to one-level-deep
reference files; output template, repo policy, and validation guidance
extracted; five utility scripts added for the deterministic work that
previously required Claude to inspect raw gh JSON.

Substance changes per Claude best-practices review:
- Description rewritten in third person, scope no longer overpromises
- CodeRabbit thread check uses GraphQL on reviewThreads.isResolved
  (REST /comments lacks resolution state)
- Candidate discovery follows a single default order with explicit
  stop conditions instead of "plus optionally" expansions
- Author-merge-ratio tiebreaker dropped (process-noisy)
- "Earlier PR" demoted to final deterministic fallback
- Repo-specific assumptions pulled into repo-policy.md
@copy-pr-bot

copy-pr-bot Bot commented May 5, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented May 5, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

A new nemoclaw-maintainer-pr-comparator skill is introduced to evaluate and rank competing open PRs targeting the same issue. The skill implements a tiered model: mandatory Tier 0 plumbing gates, LLM-based Tier 1 correctness checks and Tier 2 quality checks with weighted scoring (Tier 1 = 2.0×, Tier 2 = 1.0×), and Tier 3 tiebreakers with a degraded-mode fallback. It includes scripts for candidate discovery, gate collection, supersession parsing, verdict rendering, repository policy configuration, and backtest validation.

Changes

PR Comparator Skill System

Layer / File(s) Summary
Skill Definition & Workflow
.agents/skills/nemoclaw-maintainer-pr-comparator/SKILL.md
Adds skill frontmatter and a nine-step workflow from acceptance-criteria extraction through candidate discovery, supersession detection, Tier 0 gating, Tier 1/2 LLM checks, Tier 3 ranking, and verdict emission. Lists prerequisites, reference files, executed scripts, and deferred v2 capabilities.
Tier 0 Plumbing Gates (data collection & spec)
.agents/skills/nemoclaw-maintainer-pr-comparator/checks/tier-0-gates.md, .../scripts/collect-gates.sh, .../scripts/check-coderabbit-threads.sh
Documents five mandatory gates (open state, latest-head CI green, mergeable/clean, branch-protection-approved, automated-reviewer threads resolved). Implements collect-gates.sh to emit gate booleans and failure tags and check-coderabbit-threads.sh to query GraphQL for bot thread resolution and produce JSON gate output.
Tier 1 Correctness (LLM judgment spec)
.agents/skills/nemoclaw-maintainer-pr-comparator/checks/tier-1-correctness.md
Adds six correctness criteria (pre-fix failing tests, comment-as-spec mapping, negative/edge-case tests, coverage shape per-branch, refactor-vs-behavior scan, mocking purity) with pass/yellow/fail rules and required file/line evidence capture.
Tier 2 Quality (LLM judgment spec)
.agents/skills/nemoclaw-maintainer-pr-comparator/checks/tier-2-quality.md
Adds four quality checks (changes description coverage, migration completeness, public-surface preservation with docs/Notes requirements, workaround/suppression detection) with pass/yellow/fail semantics and scoring weight.
Repository Policy Configuration
.agents/skills/nemoclaw-maintainer-pr-comparator/repo-policy.md
New repo-level YAML-configurable defaults (CODEOWNERS enforcement via branch protection, DCO check name, automated reviewer bot list and thread gating, docs directory for Tier 2 checks, coverage ratchet flags, candidate discovery tuning parameters, excluded bot authors).
Candidate Discovery & Supersession Parsing
.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/find-candidates.sh, .../scripts/parse-supersession.sh
find-candidates.sh discovers up to 10 candidate PRs via staged expansion: explicit issue refs in bodies → filename-in-files search → title-token Jaccard similarity. parse-supersession.sh extracts superseder→superseded edges from PR body patterns and emits JSON edges restricted to provided candidates.
Verdict Rendering & Template
.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/render-verdict.py, .../templates/verdict.md
Adds render-verdict.py (Tier weights constants, helpers to map statuses to scores/emojis, functions to render scorecard, optional behavior-coverage matrix, and evidence sections) and a verdict.md template specifying required evidence format, acceptance-criteria checklist, per-PR scorecard, and degraded-mode messaging.
Tier 3 Ranking & Tiebreakers
.agents/skills/nemoclaw-maintainer-pr-comparator/tiebreakers.md
Adds ranking logic: happy mode (weighted Tier 1–2 scoring, behavior-coverage matrix, ordered tiebreakers — supersession, smaller diff, negative test coverage, recency, deterministic earliest) and degraded mode (rank by fewer substantive then trivial Tier 0 failures, then weighted score; produce salvage steps and reports).
Validation / Backtest Guide
.agents/skills/nemoclaw-maintainer-pr-comparator/validation/backtest.md
Adds backtest harness guidance for historical cases, test-loop procedure, classification of outcomes, target error rates, remediation steps, and documented failure modes with mitigations.

Sequence Diagram

sequenceDiagram
    actor User
    participant Skill as PR Comparator<br/>Skill
    participant GitHub as GitHub API
    participant Scripts as Collection<br/>Scripts
    participant LLM as LLM<br/>Evaluator
    participant Renderer as Verdict<br/>Renderer

    User->>Skill: invoke(issue_number)
    Skill->>GitHub: fetch issue (acceptance criteria + comments)
    GitHub-->>Skill: issue body + comments
    Skill->>Scripts: find-candidates.sh (issue_number)
    Scripts->>GitHub: search PRs by body/files/title
    GitHub-->>Scripts: candidate PR numbers
    Scripts-->>Skill: [PR#...]
    Skill->>Scripts: parse-supersession.sh (PR#...)
    Scripts->>GitHub: fetch PR bodies
    GitHub-->>Scripts: PR body text
    Scripts-->>Skill: supersession edges
    Skill->>Scripts: collect-gates.sh (PR#n)
    Scripts->>GitHub: gh pr view (state, CI, mergeable, reviewDecision)
    GitHub-->>Scripts: gate data
    Scripts-->>Skill: gate results per PR
    Skill->>Scripts: check-coderabbit-threads.sh (PR#n)
    Scripts->>GitHub: GraphQL query (reviewThreads)
    GitHub-->>Scripts: thread resolution state
    Scripts-->>Skill: thread gate status
    Skill->>LLM: evaluate Tier 1 (diffs, tests, issue criteria)
    LLM-->>Skill: correctness scores + evidence
    Skill->>LLM: evaluate Tier 2 (description, docs, migrations)
    LLM-->>Skill: quality scores + evidence
    Skill->>Skill: Tier 3: rank via tiebreakers or degraded mode
    Skill->>Renderer: render-verdict.py (comparison JSON)
    Renderer-->>Skill: Markdown verdict
    Skill-->>User: verdict + scorecard + reasoning + evidence
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 Hops through PR meadows with whiskers held high,
Five gates, two tiers, and a tiebreaker tie,
Scripts sniff the trails, LLMs lend their sight,
A verdict in Markdown—reasoned, scored, and bright!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title accurately summarizes the main change: introducing an autonomous PR comparator skill for issues with competing fixes. The title directly reflects the primary functionality added across all the files in the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/skill-pr-comparator

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (3)
.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/parse-supersession.sh (1)

24-26: 💤 Low value

Missing validation for --repo argument value.

If --repo is passed as the last argument without a value (e.g., parse-supersession.sh 123 --repo), the script will fail with an unbound variable error on $2 due to set -u, or shift 2 will fail if there's only one argument left. Adding validation improves error messaging.

♻️ Proposed defensive check
     --repo)
+      if [ -z "${2:-}" ]; then
+        echo "Error: --repo requires OWNER/REPO argument" >&2
+        exit 64
+      fi
       repo_args=(--repo "$2")
       shift 2
       ;;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
@.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/parse-supersession.sh
around lines 24 - 26, The handler for the --repo option in parse-supersession.sh
uses "$2" and shift 2 without validating that a value exists; update the --repo
case to check that a non-empty argument is present (e.g., verify $# -ge 2 or
that "${2:-}" is non-empty) before assigning repo_args=(--repo "$2") and
performing shift 2, and if the check fails print a clear error/usage message and
exit with non-zero status so the script doesn't hit an unbound-variable or
broken shift.
.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/check-coderabbit-threads.sh (1)

98-105: 💤 Low value

Consider JSON-escaping $bot_login in heredoc output.

If $bot_login contains quotes or backslashes (unlikely but possible with custom bot names), the JSON output would be malformed. Using jq for the final output would ensure valid JSON.

♻️ Safer JSON construction with jq
-cat <<JSON
-{
-  "pr": $pr,
-  "bot_login": "$bot_login",
-  "gate_coderabbit_threads_resolved": $gate_pass,
-  "details": $counts
-}
-JSON
+jq -n \
+  --argjson pr "$pr" \
+  --arg bot "$bot_login" \
+  --argjson gate "$gate_pass" \
+  --argjson details "$counts" \
+  '{pr: $pr, bot_login: $bot, gate_coderabbit_threads_resolved: $gate, details: $details}'
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
@.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/check-coderabbit-threads.sh
around lines 98 - 105, The heredoc that emits JSON uses the unescaped variable
$bot_login which can break JSON if it contains quotes/backslashes; update the
script (check-coderabbit-threads.sh) to build the JSON using a safe tool like jq
or to explicitly JSON-escape $bot_login before emitting (e.g., construct the
object with jq --arg pr "$pr" --arg bot_login "$bot_login" --argjson
gate_coderabbit_threads_resolved "$gate_pass" --argjson details "$counts" '.pr =
($pr|tonumber?) | .bot_login = $bot_login | .gate_coderabbit_threads_resolved =
$gate_coderabbit_threads_resolved | .details = $details' or equivalent) so the
output is always valid JSON and references the same variables ($pr, $bot_login,
$gate_pass, $counts).
.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/render-verdict.py (1)

181-186: ⚡ Quick win

Add defensive handling for missing required keys.

spec["issue"] will raise KeyError if the key is missing, producing a cryptic traceback. Consider validating required keys upfront or using .get() with appropriate defaults/error messages.

♻️ Proposed validation
+    required_keys = ["issue", "prs"]
+    for key in required_keys:
+        if key not in spec:
+            print(f"Missing required key in spec: {key}", file=sys.stderr)
+            return 64
+
     issue = spec["issue"]
     criteria = spec.get("criteria", [])
     prs = spec.get("prs", [])
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/render-verdict.py
around lines 181 - 186, spec["issue"] can raise a KeyError if the input dict
lacks required keys; update the code that reads spec (the block assigning issue,
criteria, prs, winner, mode, tiebreaker) to validate required keys up front
(e.g., check for "issue" and any other required fields) and either use
spec.get(...) with sensible defaults for optional fields (criteria, prs, winner,
mode, tiebreaker) or raise a clear ValueError/TypeError with a descriptive
message when a required key is missing so callers get an actionable error
instead of a cryptic traceback.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/collect-gates.sh:
- Around line 38-39: The jq expressions computing ci_failure_count and
ci_pending_count fail when .statusCheckRollup is null; update both queries to
null-coalesce .statusCheckRollup to an empty array (e.g. use .statusCheckRollup
// []) so the select/iteration always runs on an array; modify the expressions
that set ci_failure_count and ci_pending_count to use (.statusCheckRollup // [])
in place of .statusCheckRollup.

In @.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/find-candidates.sh:
- Around line 36-38: The gh CLI search invocation assigned to the candidates
variable uses an invalid flag (--in:body); move the body qualifier into the
query string so gh search prs receives the full query (e.g., include "in:body
#${issue_number}" with repo_args), retry the search without suppressing stderr
while preserving piping to sort/head, and ensure the command still outputs JSON
numbers for the jq filter; update the command that builds candidates (using gh
search prs, repo_args, issue_number, MAX_CANDIDATES) accordingly so it actually
finds PRs containing the issue number in the body.

In @.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/render-verdict.py:
- Around line 113-118: tier_2_keys currently contains "workaround_vs_root_cause"
but there is no corresponding check in checks/tier-2-quality.md; fix by either
removing "workaround_vs_root_cause" from the tier_2_keys list in
render-verdict.py (so the code and docs align) or add a formal definition and
scoring for the workaround_vs_root_cause check in checks/tier-2-quality.md (and
update any max-score calculations/templates accordingly) — locate the symbol
tier_2_keys in
.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/render-verdict.py and
make the change consistently (removal from the list OR a documented check entry
and score in tier-2-quality.md).
- Around line 205-209: The current prints use the variable winner directly and
produce confusing output when winner is None; update the code that handles mode
and winner (variables mode and winner) to validate winner first and substitute a
clear fallback (e.g., "no PR selected" or "no winner") before formatting
messages, so in the "happy" branch you print "VERDICT: MERGE PR #<id>" only when
winner is not None and otherwise print "VERDICT: no PR selected", and in the
degraded branch print "Neither mergeable yet" and "No PR is closer" (or similar)
when winner is None; ensure all f-strings using winner are guarded or replaced
with the fallback string.

In @.agents/skills/nemoclaw-maintainer-pr-comparator/templates/verdict.md:
- Line 34: The example in the verdict template shows an incorrect max score
(21.0); update the "Weighted score" row in
.agents/skills/nemoclaw-maintainer-pr-comparator/templates/verdict.md to reflect
the true maximum based on actual check counts and weights (Tier 1 = 2.0× per
check, Tier 2 = 1.0× per check). Reconcile counts with render-verdict.py (which
currently includes the Tier‑2 check workaround_vs_root_cause) and compute the
correct max (e.g., 6 Tier‑1 checks ×2 + 4 Tier‑2 checks ×1 = 16.0, or adjust if
checks differ), then replace both the numerator and denominator values so the
example matches the code.

In @.agents/skills/nemoclaw-maintainer-pr-comparator/validation/backtest.md:
- Around line 22-24: The gh search prs command uses search qualifiers inside the
query string, not as flags; replace the incorrect snippet gh search prs --repo
OWNER/REPO --merged --in:body "supersedes" --limit 30 by moving the qualifier
into the quoted query, e.g. gh search prs --repo OWNER/REPO --merged "in:body
supersedes" --limit 30 (or include repo/is:merged qualifiers inside the query as
needed) so the in:body qualifier is part of the query string.

---

Nitpick comments:
In
@.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/check-coderabbit-threads.sh:
- Around line 98-105: The heredoc that emits JSON uses the unescaped variable
$bot_login which can break JSON if it contains quotes/backslashes; update the
script (check-coderabbit-threads.sh) to build the JSON using a safe tool like jq
or to explicitly JSON-escape $bot_login before emitting (e.g., construct the
object with jq --arg pr "$pr" --arg bot_login "$bot_login" --argjson
gate_coderabbit_threads_resolved "$gate_pass" --argjson details "$counts" '.pr =
($pr|tonumber?) | .bot_login = $bot_login | .gate_coderabbit_threads_resolved =
$gate_coderabbit_threads_resolved | .details = $details' or equivalent) so the
output is always valid JSON and references the same variables ($pr, $bot_login,
$gate_pass, $counts).

In
@.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/parse-supersession.sh:
- Around line 24-26: The handler for the --repo option in parse-supersession.sh
uses "$2" and shift 2 without validating that a value exists; update the --repo
case to check that a non-empty argument is present (e.g., verify $# -ge 2 or
that "${2:-}" is non-empty) before assigning repo_args=(--repo "$2") and
performing shift 2, and if the check fails print a clear error/usage message and
exit with non-zero status so the script doesn't hit an unbound-variable or
broken shift.

In @.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/render-verdict.py:
- Around line 181-186: spec["issue"] can raise a KeyError if the input dict
lacks required keys; update the code that reads spec (the block assigning issue,
criteria, prs, winner, mode, tiebreaker) to validate required keys up front
(e.g., check for "issue" and any other required fields) and either use
spec.get(...) with sensible defaults for optional fields (criteria, prs, winner,
mode, tiebreaker) or raise a clear ValueError/TypeError with a descriptive
message when a required key is missing so callers get an actionable error
instead of a cryptic traceback.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7f7e4796-b974-4736-bfdb-2899699a803c

📥 Commits

Reviewing files that changed from the base of the PR and between c398ad9 and 65cbafd.

📒 Files selected for processing (13)
  • .agents/skills/nemoclaw-maintainer-pr-comparator/SKILL.md
  • .agents/skills/nemoclaw-maintainer-pr-comparator/checks/tier-0-gates.md
  • .agents/skills/nemoclaw-maintainer-pr-comparator/checks/tier-1-correctness.md
  • .agents/skills/nemoclaw-maintainer-pr-comparator/checks/tier-2-quality.md
  • .agents/skills/nemoclaw-maintainer-pr-comparator/repo-policy.md
  • .agents/skills/nemoclaw-maintainer-pr-comparator/scripts/check-coderabbit-threads.sh
  • .agents/skills/nemoclaw-maintainer-pr-comparator/scripts/collect-gates.sh
  • .agents/skills/nemoclaw-maintainer-pr-comparator/scripts/find-candidates.sh
  • .agents/skills/nemoclaw-maintainer-pr-comparator/scripts/parse-supersession.sh
  • .agents/skills/nemoclaw-maintainer-pr-comparator/scripts/render-verdict.py
  • .agents/skills/nemoclaw-maintainer-pr-comparator/templates/verdict.md
  • .agents/skills/nemoclaw-maintainer-pr-comparator/tiebreakers.md
  • .agents/skills/nemoclaw-maintainer-pr-comparator/validation/backtest.md

Comment thread .agents/skills/nemoclaw-maintainer-pr-comparator/scripts/collect-gates.sh Outdated
Comment thread .agents/skills/nemoclaw-maintainer-pr-comparator/scripts/render-verdict.py Outdated
Comment thread .agents/skills/nemoclaw-maintainer-pr-comparator/templates/verdict.md Outdated
@cjagwani cjagwani self-assigned this May 5, 2026
cjagwani added 2 commits May 5, 2026 12:13
- collect-gates.sh: null-coalesce statusCheckRollup so jq doesn't fail
  when a PR has no status checks configured (CR major)
- find-candidates.sh + backtest.md: `in:body` is a search qualifier,
  not a flag — move it inside the query string (CR critical)
- tier-2-quality.md: add 2.4 workaround-vs-root-cause check that was
  in the locked v1 spec but dropped during the refactor; render-verdict.py
  was already counting it (CR major — code/docs alignment)
- render-verdict.py: guard winner=None in both happy and degraded modes
  to avoid 'MERGE PR #None' output (CR minor)
- templates/verdict.md: correct example weighted-score denominator
  to 16.0 (4 Tier-2 checks ×1 + 6 Tier-1 ×2) instead of stale 21.0 (CR minor)
@cjagwani cjagwani added v0.0.35 and removed v0.0.35 labels May 5, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/render-verdict.py:
- Around line 129-134: The current total uses all keys from
pr.get("tier_1"/"tier_2") which lets extra/hidden keys change rankings; change
the summation to iterate over the canonical key lists (tier_1_keys and
tier_2_keys) and call score_for(pr.get("tier_1", {}).get(key), TIER_1_WEIGHT)
and similarly for tier_2, so only the fixed keys affect total; keep max_total
as-is (len(tier_1_keys)*TIER_1_WEIGHT + len(tier_2_keys)*TIER_2_WEIGHT) and
ignore any extra keys in pr when computing total.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 48d51831-226d-4535-b7f4-9565c77dd699

📥 Commits

Reviewing files that changed from the base of the PR and between 65cbafd and 0e5936c.

📒 Files selected for processing (6)
  • .agents/skills/nemoclaw-maintainer-pr-comparator/checks/tier-2-quality.md
  • .agents/skills/nemoclaw-maintainer-pr-comparator/scripts/collect-gates.sh
  • .agents/skills/nemoclaw-maintainer-pr-comparator/scripts/find-candidates.sh
  • .agents/skills/nemoclaw-maintainer-pr-comparator/scripts/render-verdict.py
  • .agents/skills/nemoclaw-maintainer-pr-comparator/templates/verdict.md
  • .agents/skills/nemoclaw-maintainer-pr-comparator/validation/backtest.md
✅ Files skipped from review due to trivial changes (2)
  • .agents/skills/nemoclaw-maintainer-pr-comparator/validation/backtest.md
  • .agents/skills/nemoclaw-maintainer-pr-comparator/templates/verdict.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • .agents/skills/nemoclaw-maintainer-pr-comparator/scripts/collect-gates.sh
  • .agents/skills/nemoclaw-maintainer-pr-comparator/scripts/find-candidates.sh

Comment on lines +129 to +134
total = 0.0
for status in pr.get("tier_1", {}).values():
total += score_for(status, TIER_1_WEIGHT)
for status in pr.get("tier_2", {}).values():
total += score_for(status, TIER_2_WEIGHT)
max_total = len(tier_1_keys) * TIER_1_WEIGHT + len(tier_2_keys) * TIER_2_WEIGHT

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Score calculation can include hidden checks and skew ranking.

total sums all values present in tier_1/tier_2, while the table and max_total are based on fixed key lists. Extra keys in input can silently change ranking without appearing in the rendered scorecard.

🔧 Proposed fix
     score_row = ["**Weighted score**"]
     for pr in prs:
         total = 0.0
-        for status in pr.get("tier_1", {}).values():
-            total += score_for(status, TIER_1_WEIGHT)
-        for status in pr.get("tier_2", {}).values():
-            total += score_for(status, TIER_2_WEIGHT)
+        tier_1 = pr.get("tier_1", {})
+        tier_2 = pr.get("tier_2", {})
+        for key in tier_1_keys:
+            total += score_for(tier_1.get(key, "fail"), TIER_1_WEIGHT)
+        for key in tier_2_keys:
+            total += score_for(tier_2.get(key, "fail"), TIER_2_WEIGHT)
         max_total = len(tier_1_keys) * TIER_1_WEIGHT + len(tier_2_keys) * TIER_2_WEIGHT
         score_row.append(f"{total:.1f} / {max_total:.1f}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
total = 0.0
for status in pr.get("tier_1", {}).values():
total += score_for(status, TIER_1_WEIGHT)
for status in pr.get("tier_2", {}).values():
total += score_for(status, TIER_2_WEIGHT)
max_total = len(tier_1_keys) * TIER_1_WEIGHT + len(tier_2_keys) * TIER_2_WEIGHT
total = 0.0
tier_1 = pr.get("tier_1", {})
tier_2 = pr.get("tier_2", {})
for key in tier_1_keys:
total += score_for(tier_1.get(key, "fail"), TIER_1_WEIGHT)
for key in tier_2_keys:
total += score_for(tier_2.get(key, "fail"), TIER_2_WEIGHT)
max_total = len(tier_1_keys) * TIER_1_WEIGHT + len(tier_2_keys) * TIER_2_WEIGHT
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/render-verdict.py
around lines 129 - 134, The current total uses all keys from
pr.get("tier_1"/"tier_2") which lets extra/hidden keys change rankings; change
the summation to iterate over the canonical key lists (tier_1_keys and
tier_2_keys) and call score_for(pr.get("tier_1", {}).get(key), TIER_1_WEIGHT)
and similarly for tier_2, so only the fixed keys affect total; keep max_total
as-is (len(tier_1_keys)*TIER_1_WEIGHT + len(tier_2_keys)*TIER_2_WEIGHT) and
ignore any extra keys in pr when computing total.

@cv cv enabled auto-merge (squash) May 5, 2026 22:22
@cv cv merged commit f40c25e into main May 5, 2026
9 checks passed
@wscurran wscurran added the feature PR adds or expands user-visible functionality label Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature PR adds or expands user-visible functionality

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants