feat(skills): add autonomous PR comparator for issues with competing fixes by cjagwani · Pull Request #3052 · NVIDIA/NemoClaw

cjagwani · 2026-05-05T18:40:53Z

Summary

Adds nemoclaw-maintainer-pr-comparator: an autonomous skill that picks the merge winner among competing PRs targeting the same issue. Runs Tier 0 plumbing gates → Tier 1 correctness checks → Tier 2 quality checks → Tier 3 deterministic ranking. Degraded mode handles the case where no PR passes gates.

Why

When two contributors fix the same issue, deciding which to merge is judgment-heavy. This skill makes that judgment deterministic, explainable, and reusable across repos.

What's clever (catches what CI can't)

Reads each new test against the pre-fix code to verify the test would have failed (catches smoke-test fakes)
Parses issue body and comments for acceptance criteria (catches "and don't break Y while you're at it")
Refactor-vs-behavior scan flags hidden behavior changes inside what looks like a rename
Mocking-purity check catches tests that mock the unit under test
Public-surface preservation flags content changes (not moves) in flags/help/errors
Workaround-vs-root-cause check flags symptom-suppressing try/catch fixes without follow-up
Behavior-coverage matrix recommends "merge A, cherry-pick B's tests for criterion X" when no PR dominates
Degraded mode picks closest-to-ready when neither passes gates instead of giving up
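
The behavior-coverage recommendation in the list above can be pictured with a minimal sketch. The input shape (a mapping from PR number to the set of acceptance criteria its tests cover) and the function name `recommend` are assumptions made for illustration; the skill's real data format is not shown in this PR description.

```python
from typing import Dict, List, Set, Tuple


def recommend(coverage: Dict[int, Set[str]]) -> Tuple[int, List[Tuple[int, str]]]:
    """Pick a base PR and list (donor_pr, criterion) cherry-picks.

    `coverage` maps a PR number to the acceptance criteria it covers.
    This shape is hypothetical, not the skill's actual input.
    """
    if not coverage:
        raise ValueError("no candidate PRs")
    # Base PR: covers the most criteria; the lowest PR number breaks ties.
    base = min(coverage, key=lambda pr: (-len(coverage[pr]), pr))
    all_criteria = set().union(*coverage.values())
    picks: List[Tuple[int, str]] = []
    for criterion in sorted(all_criteria - coverage[base]):
        # The lowest-numbered PR covering the missing criterion donates its tests.
        donor = min(pr for pr, covered in coverage.items() if criterion in covered)
        picks.append((donor, criterion))
    return base, picks
```

An empty cherry-pick list means the base PR dominates; a non-empty list corresponds to the "merge A, cherry-pick B's tests for criterion X" outcome.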

Validation

Backtested against 5 historical NemoClaw cases (#2681, #2947, #893, #2636, refactor chain #2087/#2489/#2495). Initial v1: 4/5. After patches (PR-state-OPEN gate, supersession detection, workaround-vs-root-cause flag): 5/5.

Skill structure (per Claude best-practices)

Slim SKILL.md (orchestration), tier checks in one-level-deep reference files, output template extracted, five utility scripts for the deterministic work, repo-specific assumptions in repo-policy.md for cross-repo reuse.

.agents/skills/nemoclaw-maintainer-pr-comparator/
├── SKILL.md                        # orchestration only
├── repo-policy.md                  # configurable per-repo defaults
├── tiebreakers.md                  # Tier 3 + degraded mode
├── checks/
│   ├── tier-0-gates.md
│   ├── tier-1-correctness.md
│   └── tier-2-quality.md
├── templates/verdict.md
├── validation/backtest.md
└── scripts/
    ├── find-candidates.sh
    ├── collect-gates.sh
    ├── check-coderabbit-threads.sh   # GraphQL on reviewThreads.isResolved
    ├── parse-supersession.sh
    └── render-verdict.py

Test plan

markdownlint clean
shellcheck clean
python compiles
Backtest 5/5 on historical cases
Try the skill on a live duplicate-PR situation when one arises

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added a user-invocable PR comparison skill to evaluate competing PRs, run multi-tier gates, score/rank candidates, and output a recommended merge verdict with evidence and trace.
- Added CLI helpers for candidate discovery, gate collection, supersession detection, thread checks, and verdict rendering.
Documentation
- New repo policy, tiered-check rubrics (Tier 0–3), verdict template, and backtest/validation guides for ongoing tuning.

…cisions Adds nemoclaw-maintainer-pr-comparator: an autonomous skill that picks the best PR among multiple candidates targeting the same issue. Runs a 3-tier check pipeline (plumbing gates, correctness, code quality) plus deterministic comparative scoring with happy-path and degraded modes. Validated against 5 historical NemoClaw cases (#2681, #2947, #893, #2636, refactor chain #2087/#2489/#2495); spec patches added for PR-state-OPEN gate, supersession detection, and workaround-vs-root-cause flag based on validation findings.

…r-comparator Slim SKILL.md (orchestration only); tier checks moved to one-level-deep reference files; output template, repo policy, and validation guidance extracted; five utility scripts added for the deterministic work that previously required Claude to inspect raw gh JSON.

Substance changes per Claude best-practices review:
- Description rewritten in third person, scope no longer overpromises
- CodeRabbit thread check uses GraphQL on reviewThreads.isResolved (REST /comments lacks resolution state)
- Candidate discovery follows a single default order with explicit stop conditions instead of "plus optionally" expansions
- Author-merge-ratio tiebreaker dropped (process-noisy)
- "Earlier PR" demoted to final deterministic fallback
- Repo-specific assumptions pulled into repo-policy.md

copy-pr-bot · 2026-05-05T18:40:57Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-05-05T18:41:14Z

Walkthrough

A new nemoclaw-maintainer-pr-comparator skill is introduced to evaluate and rank competing open PRs targeting the same issue. The skill implements a tiered model: mandatory Tier 0 plumbing gates, LLM-based Tier 1 correctness checks and Tier 2 quality checks with weighted scoring (Tier 1 = 2.0×, Tier 2 = 1.0×), and Tier 3 tiebreakers with a degraded-mode fallback. It includes scripts for candidate discovery, gate collection, supersession parsing, verdict rendering, repository policy configuration, and backtest validation.
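
As a rough illustration of the weighted scoring described above, the sketch below combines per-check statuses using the stated weights (Tier 1 = 2.0×, Tier 2 = 1.0×). The numeric value assigned to each status (pass = 1.0, yellow = 0.5, fail = 0.0) is an assumption; the walkthrough names the statuses but not their values.

```python
TIER_1_WEIGHT = 2.0  # weight stated in the walkthrough
TIER_2_WEIGHT = 1.0  # weight stated in the walkthrough

# Assumed status values; the source only names pass/yellow/fail.
STATUS_VALUE = {"pass": 1.0, "yellow": 0.5, "fail": 0.0}


def weighted_score(tier_1, tier_2):
    """Return (score, max_score) for lists of Tier 1 and Tier 2 statuses."""
    score = sum(STATUS_VALUE[s] * TIER_1_WEIGHT for s in tier_1)
    score += sum(STATUS_VALUE[s] * TIER_2_WEIGHT for s in tier_2)
    max_score = len(tier_1) * TIER_1_WEIGHT + len(tier_2) * TIER_2_WEIGHT
    return score, max_score
```

With six Tier 1 checks and four Tier 2 checks, the maximum works out to 6 × 2.0 + 4 × 1.0 = 16.0.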

Changes

PR Comparator Skill System

Layer / File(s)	Summary
Skill Definition & Workflow `.agents/skills/nemoclaw-maintainer-pr-comparator/SKILL.md`	Adds skill frontmatter and a nine-step workflow from acceptance-criteria extraction through candidate discovery, supersession detection, Tier 0 gating, Tier 1/2 LLM checks, Tier 3 ranking, and verdict emission. Lists prerequisites, reference files, executed scripts, and deferred v2 capabilities.
Tier 0 Plumbing Gates (data collection & spec) `.agents/skills/nemoclaw-maintainer-pr-comparator/checks/tier-0-gates.md`, `.../scripts/collect-gates.sh`, `.../scripts/check-coderabbit-threads.sh`	Documents five mandatory gates (open state, latest-head CI green, mergeable/clean, branch-protection-approved, automated-reviewer threads resolved). Implements `collect-gates.sh` to emit gate booleans and failure tags and `check-coderabbit-threads.sh` to query GraphQL for bot thread resolution and produce JSON gate output.
Tier 1 Correctness (LLM judgment spec) `.agents/skills/nemoclaw-maintainer-pr-comparator/checks/tier-1-correctness.md`	Adds six correctness criteria (pre-fix failing tests, comment-as-spec mapping, negative/edge-case tests, coverage shape per-branch, refactor-vs-behavior scan, mocking purity) with pass/yellow/fail rules and required file/line evidence capture.
Tier 2 Quality (LLM judgment spec) `.agents/skills/nemoclaw-maintainer-pr-comparator/checks/tier-2-quality.md`	Adds four quality checks (changes description coverage, migration completeness, public-surface preservation with docs/Notes requirements, workaround/suppression detection) with pass/yellow/fail semantics and scoring weight.
Repository Policy Configuration `.agents/skills/nemoclaw-maintainer-pr-comparator/repo-policy.md`	New repo-level YAML-configurable defaults (CODEOWNERS enforcement via branch protection, DCO check name, automated reviewer bot list and thread gating, docs directory for Tier 2 checks, coverage ratchet flags, candidate discovery tuning parameters, excluded bot authors).
Candidate Discovery & Supersession Parsing `.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/find-candidates.sh`, `.../scripts/parse-supersession.sh`	`find-candidates.sh` discovers up to 10 candidate PRs via staged expansion: explicit issue refs in bodies → filename-in-files search → title-token Jaccard similarity. `parse-supersession.sh` extracts superseder→superseded edges from PR body patterns and emits JSON edges restricted to provided candidates.
Verdict Rendering & Template `.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/render-verdict.py`, `.../templates/verdict.md`	Adds `render-verdict.py` (Tier weights constants, helpers to map statuses to scores/emojis, functions to render scorecard, optional behavior-coverage matrix, and evidence sections) and a `verdict.md` template specifying required evidence format, acceptance-criteria checklist, per-PR scorecard, and degraded-mode messaging.
Tier 3 Ranking & Tiebreakers `.agents/skills/nemoclaw-maintainer-pr-comparator/tiebreakers.md`	Adds ranking logic: happy mode (weighted Tier 1–2 scoring, behavior-coverage matrix, ordered tiebreakers — supersession, smaller diff, negative test coverage, recency, deterministic earliest) and degraded mode (rank by fewer substantive then trivial Tier 0 failures, then weighted score; produce salvage steps and reports).
Validation / Backtest Guide `.agents/skills/nemoclaw-maintainer-pr-comparator/validation/backtest.md`	Adds backtest harness guidance for historical cases, test-loop procedure, classification of outcomes, target error rates, remediation steps, and documented failure modes with mitigations.
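
The last discovery stage in the table, title-token Jaccard similarity, can be sketched in a few lines. The actual `find-candidates.sh` is a shell script whose tokenization is not shown here, so the tokenizer below (lowercased alphanumeric runs) is an assumption.

```python
import re


def title_tokens(title: str) -> set:
    """Lowercased alphanumeric tokens of a PR title (assumed tokenization)."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| over the two titles' token sets."""
    ta, tb = title_tokens(a), title_tokens(b)
    if not ta and not tb:
        return 0.0  # two empty titles are treated as dissimilar
    return len(ta & tb) / len(ta | tb)
```

A discovery step built on this would keep PRs whose similarity to a known candidate exceeds some threshold; the threshold value would be a tuning parameter of the kind repo-policy.md is described as holding.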

Sequence Diagram

sequenceDiagram
    actor User
    participant Skill as PR Comparator<br/>Skill
    participant GitHub as GitHub API
    participant Scripts as Collection<br/>Scripts
    participant LLM as LLM<br/>Evaluator
    participant Renderer as Verdict<br/>Renderer

    User->>Skill: invoke(issue_number)
    Skill->>GitHub: fetch issue (acceptance criteria + comments)
    GitHub-->>Skill: issue body + comments
    Skill->>Scripts: find-candidates.sh (issue_number)
    Scripts->>GitHub: search PRs by body/files/title
    GitHub-->>Scripts: candidate PR numbers
    Scripts-->>Skill: [PR#...]
    Skill->>Scripts: parse-supersession.sh (PR#...)
    Scripts->>GitHub: fetch PR bodies
    GitHub-->>Scripts: PR body text
    Scripts-->>Skill: supersession edges
    Skill->>Scripts: collect-gates.sh (PR#n)
    Scripts->>GitHub: gh pr view (state, CI, mergeable, reviewDecision)
    GitHub-->>Scripts: gate data
    Scripts-->>Skill: gate results per PR
    Skill->>Scripts: check-coderabbit-threads.sh (PR#n)
    Scripts->>GitHub: GraphQL query (reviewThreads)
    GitHub-->>Scripts: thread resolution state
    Scripts-->>Skill: thread gate status
    Skill->>LLM: evaluate Tier 1 (diffs, tests, issue criteria)
    LLM-->>Skill: correctness scores + evidence
    Skill->>LLM: evaluate Tier 2 (description, docs, migrations)
    LLM-->>Skill: quality scores + evidence
    Skill->>Skill: Tier 3: rank via tiebreakers or degraded mode
    Skill->>Renderer: render-verdict.py (comparison JSON)
    Renderer-->>Skill: Markdown verdict
    Skill-->>User: verdict + scorecard + reasoning + evidence
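
The degraded-mode branch of the Tier 3 step can be expressed as a sort key. The ordering below follows the summary of tiebreakers.md (fewer substantive Tier 0 failures, then fewer trivial ones, then higher weighted score); the `Candidate` fields and the final PR-number fallback are assumptions added to keep the ordering deterministic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    number: int                # PR number
    substantive_failures: int  # Tier 0 failures needing real work
    trivial_failures: int      # Tier 0 failures that are quick to fix
    weighted_score: float      # Tier 1/Tier 2 weighted score


def rank_degraded(candidates: List[Candidate]) -> List[Candidate]:
    """Order candidates closest-to-ready first."""
    return sorted(
        candidates,
        key=lambda c: (
            c.substantive_failures,
            c.trivial_failures,
            -c.weighted_score,  # higher score ranks earlier
            c.number,           # assumed final fallback: earlier PR
        ),
    )
```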

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 Hops through PR meadows with whiskers held high,
Five gates, two tiers, and a tiebreaker tie,
Scripts sniff the trails, LLMs lend their sight,
A verdict in Markdown—reasoned, scored, and bright! ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The pull request title accurately summarizes the main change: introducing an autonomous PR comparator skill for issues with competing fixes. The title directly reflects the primary functionality added across all the files in the changeset.
Docstring Coverage	✅ Passed	Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

Actionable comments posted: 6

🧹 Nitpick comments (3)

.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/parse-supersession.sh (1)

24-26: 💤 Low value

Missing validation for --repo argument value.

If --repo is passed as the last argument without a value (e.g., parse-supersession.sh 123 --repo), the script will fail with an unbound variable error on $2 due to set -u, or shift 2 will fail if there's only one argument left. Adding validation improves error messaging.

♻️ Proposed defensive check

     --repo)
+      if [ -z "${2:-}" ]; then
+        echo "Error: --repo requires OWNER/REPO argument" >&2
+        exit 64
+      fi
       repo_args=(--repo "$2")
       shift 2
       ;;

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
@.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/parse-supersession.sh
around lines 24 - 26, The handler for the --repo option in parse-supersession.sh
uses "$2" and shift 2 without validating that a value exists; update the --repo
case to check that a non-empty argument is present (e.g., verify $# -ge 2 or
that "${2:-}" is non-empty) before assigning repo_args=(--repo "$2") and
performing shift 2, and if the check fails print a clear error/usage message and
exit with non-zero status so the script doesn't hit an unbound-variable or
broken shift.

.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/check-coderabbit-threads.sh (1)

98-105: 💤 Low value

Consider JSON-escaping $bot_login in heredoc output.

If $bot_login contains quotes or backslashes (unlikely but possible with custom bot names), the JSON output would be malformed. Using jq for the final output would ensure valid JSON.

♻️ Safer JSON construction with jq

-cat <<JSON
-{
-  "pr": $pr,
-  "bot_login": "$bot_login",
-  "gate_coderabbit_threads_resolved": $gate_pass,
-  "details": $counts
-}
-JSON
+jq -n \
+  --argjson pr "$pr" \
+  --arg bot "$bot_login" \
+  --argjson gate "$gate_pass" \
+  --argjson details "$counts" \
+  '{pr: $pr, bot_login: $bot, gate_coderabbit_threads_resolved: $gate, details: $details}'
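
The same failure mode exists in any language that builds JSON by string interpolation. The Python sketch below contrasts the two approaches; the function names and the sample login are made up for illustration.

```python
import json


def unsafe_payload(pr: int, bot_login: str) -> str:
    """Interpolates the login directly, mirroring the heredoc approach."""
    return f'{{"pr": {pr}, "bot_login": "{bot_login}"}}'


def safe_payload(pr: int, bot_login: str) -> str:
    """Lets the serializer escape the login, mirroring `jq --arg`."""
    return json.dumps({"pr": pr, "bot_login": bot_login})


def is_valid_json(text: str) -> bool:
    """True if `text` parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```

A login containing a double quote breaks the interpolated version but round-trips through the serialized one.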

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
@.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/check-coderabbit-threads.sh
around lines 98 - 105, The heredoc that emits JSON uses the unescaped variable
$bot_login which can break JSON if it contains quotes/backslashes; update the
script (check-coderabbit-threads.sh) to build the JSON using a safe tool like jq
or to explicitly JSON-escape $bot_login before emitting (e.g., construct the
object with jq --arg pr "$pr" --arg bot_login "$bot_login" --argjson
gate_coderabbit_threads_resolved "$gate_pass" --argjson details "$counts" '.pr =
($pr|tonumber?) | .bot_login = $bot_login | .gate_coderabbit_threads_resolved =
$gate_coderabbit_threads_resolved | .details = $details' or equivalent) so the
output is always valid JSON and references the same variables ($pr, $bot_login,
$gate_pass, $counts).

.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/render-verdict.py (1)

181-186: ⚡ Quick win

Add defensive handling for missing required keys.

spec["issue"] will raise KeyError if the key is missing, producing a cryptic traceback. Consider validating required keys upfront or using .get() with appropriate defaults/error messages.

♻️ Proposed validation

+    required_keys = ["issue", "prs"]
+    for key in required_keys:
+        if key not in spec:
+            print(f"Missing required key in spec: {key}", file=sys.stderr)
+            return 64
+
     issue = spec["issue"]
     criteria = spec.get("criteria", [])
     prs = spec.get("prs", [])
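
A self-contained variant of this validation, sketched under the assumption that `issue` and `prs` are the only required keys (as in the proposal), might collect every missing key before reporting rather than stopping at the first. The function names are illustrative and not taken from render-verdict.py.

```python
import sys

REQUIRED_KEYS = ("issue", "prs")  # assumed required set, per the proposal


def missing_keys(spec: dict) -> list:
    """Return the required keys absent from `spec`, in declaration order."""
    return [key for key in REQUIRED_KEYS if key not in spec]


def check_spec(spec: dict) -> int:
    """Return 0 if the spec is usable, 64 (EX_USAGE) otherwise."""
    missing = missing_keys(spec)
    if missing:
        print(f"Missing required key(s) in spec: {', '.join(missing)}", file=sys.stderr)
        return 64
    return 0
```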

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/render-verdict.py
around lines 181 - 186, spec["issue"] can raise a KeyError if the input dict
lacks required keys; update the code that reads spec (the block assigning issue,
criteria, prs, winner, mode, tiebreaker) to validate required keys up front
(e.g., check for "issue" and any other required fields) and either use
spec.get(...) with sensible defaults for optional fields (criteria, prs, winner,
mode, tiebreaker) or raise a clear ValueError/TypeError with a descriptive
message when a required key is missing so callers get an actionable error
instead of a cryptic traceback.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/collect-gates.sh:
- Around line 38-39: The jq expressions computing ci_failure_count and
ci_pending_count fail when .statusCheckRollup is null; update both queries to
null-coalesce .statusCheckRollup to an empty array (e.g. use .statusCheckRollup
// []) so the select/iteration always runs on an array; modify the expressions
that set ci_failure_count and ci_pending_count to use (.statusCheckRollup // [])
in place of .statusCheckRollup.

In @.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/find-candidates.sh:
- Around line 36-38: The gh CLI search invocation assigned to the candidates
variable uses an invalid flag (--in:body); move the body qualifier into the
query string so gh search prs receives the full query (e.g., include "in:body
#${issue_number}" with repo_args), retry the search without suppressing stderr
while preserving piping to sort/head, and ensure the command still outputs JSON
numbers for the jq filter; update the command that builds candidates (using gh
search prs, repo_args, issue_number, MAX_CANDIDATES) accordingly so it actually
finds PRs containing the issue number in the body.

In @.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/render-verdict.py:
- Around line 113-118: tier_2_keys currently contains "workaround_vs_root_cause"
but there is no corresponding check in checks/tier-2-quality.md; fix by either
removing "workaround_vs_root_cause" from the tier_2_keys list in
render-verdict.py (so the code and docs align) or add a formal definition and
scoring for the workaround_vs_root_cause check in checks/tier-2-quality.md (and
update any max-score calculations/templates accordingly) — locate the symbol
tier_2_keys in
.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/render-verdict.py and
make the change consistently (removal from the list OR a documented check entry
and score in tier-2-quality.md).
- Around line 205-209: The current prints use the variable winner directly and
produce confusing output when winner is None; update the code that handles mode
and winner (variables mode and winner) to validate winner first and substitute a
clear fallback (e.g., "no PR selected" or "no winner") before formatting
messages, so in the "happy" branch you print "VERDICT: MERGE PR #<id>" only when
winner is not None and otherwise print "VERDICT: no PR selected", and in the
degraded branch print "Neither mergeable yet" and "No PR is closer" (or similar)
when winner is None; ensure all f-strings using winner are guarded or replaced
with the fallback string.

In @.agents/skills/nemoclaw-maintainer-pr-comparator/templates/verdict.md:
- Line 34: The example in the verdict template shows an incorrect max score
(21.0); update the "Weighted score" row in
.agents/skills/nemoclaw-maintainer-pr-comparator/templates/verdict.md to reflect
the true maximum based on actual check counts and weights (Tier 1 = 2.0× per
check, Tier 2 = 1.0× per check). Reconcile counts with render-verdict.py (which
currently includes the Tier‑2 check workaround_vs_root_cause) and compute the
correct max (e.g., 6 Tier‑1 checks ×2 + 4 Tier‑2 checks ×1 = 16.0, or adjust if
checks differ), then replace both the numerator and denominator values so the
example matches the code.

In @.agents/skills/nemoclaw-maintainer-pr-comparator/validation/backtest.md:
- Around line 22-24: The gh search prs command uses search qualifiers inside the
query string, not as flags; replace the incorrect snippet gh search prs --repo
OWNER/REPO --merged --in:body "supersedes" --limit 30 by moving the qualifier
into the quoted query, e.g. gh search prs --repo OWNER/REPO --merged "in:body
supersedes" --limit 30 (or include repo/is:merged qualifiers inside the query as
needed) so the in:body qualifier is part of the query string.
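
The null-coalescing fix suggested for collect-gates.sh has a direct Python analogue, shown here as a sketch. The field names `conclusion` and `status` and their values are assumptions about the shape of the `statusCheckRollup` payload, not something confirmed by this review.

```python
import json


def ci_counts(pr_view_json: str) -> dict:
    """Count failed and pending checks, tolerating a null rollup."""
    data = json.loads(pr_view_json)
    # `or []` plays the role of jq's `// []`: a missing or null rollup
    # becomes an empty list, so the iterations below always see a list.
    checks = data.get("statusCheckRollup") or []
    # Assumed field names and values; adjust to the real payload.
    failures = sum(1 for c in checks if c.get("conclusion") == "FAILURE")
    pending = sum(
        1 for c in checks if c.get("status") in ("PENDING", "IN_PROGRESS", "QUEUED")
    )
    return {"ci_failure_count": failures, "ci_pending_count": pending}
```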

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7f7e4796-b974-4736-bfdb-2899699a803c

📥 Commits

Reviewing files that changed from the base of the PR and between c398ad9 and 65cbafd.

📒 Files selected for processing (13)

.agents/skills/nemoclaw-maintainer-pr-comparator/SKILL.md
.agents/skills/nemoclaw-maintainer-pr-comparator/checks/tier-0-gates.md
.agents/skills/nemoclaw-maintainer-pr-comparator/checks/tier-1-correctness.md
.agents/skills/nemoclaw-maintainer-pr-comparator/checks/tier-2-quality.md
.agents/skills/nemoclaw-maintainer-pr-comparator/repo-policy.md
.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/check-coderabbit-threads.sh
.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/collect-gates.sh
.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/find-candidates.sh
.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/parse-supersession.sh
.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/render-verdict.py
.agents/skills/nemoclaw-maintainer-pr-comparator/templates/verdict.md
.agents/skills/nemoclaw-maintainer-pr-comparator/tiebreakers.md
.agents/skills/nemoclaw-maintainer-pr-comparator/validation/backtest.md

- collect-gates.sh: null-coalesce statusCheckRollup so jq doesn't fail when a PR has no status checks configured (CR major)
- find-candidates.sh + backtest.md: `in:body` is a search qualifier, not a flag; move it inside the query string (CR critical)
- tier-2-quality.md: add 2.4 workaround-vs-root-cause check that was in the locked v1 spec but dropped during the refactor; render-verdict.py was already counting it (CR major: code/docs alignment)
- render-verdict.py: guard winner=None in both happy and degraded modes to avoid 'MERGE PR #None' output (CR minor)
- templates/verdict.md: correct example weighted-score denominator to 16.0 (4 Tier-2 checks ×1 + 6 Tier-1 ×2) instead of stale 21.0 (CR minor)
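
The `winner=None` guard from this commit can be sketched as a small helper. The message strings follow the wording suggested in the review comment; the function name and the exact degraded-mode phrasing with a winner are illustrative assumptions, not the committed code.

```python
from typing import Optional


def verdict_line(mode: str, winner: Optional[int]) -> str:
    """Format the verdict header without ever printing 'PR #None'."""
    if mode == "happy":
        if winner is None:
            return "VERDICT: no PR selected"
        return f"VERDICT: MERGE PR #{winner}"
    # Any other mode is treated as degraded in this sketch.
    closer = "No PR is closer" if winner is None else f"PR #{winner} is closer"
    return f"VERDICT: Neither mergeable yet. {closer}"
```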

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/render-verdict.py:
- Around line 129-134: The current total uses all keys from
pr.get("tier_1"/"tier_2") which lets extra/hidden keys change rankings; change
the summation to iterate over the canonical key lists (tier_1_keys and
tier_2_keys) and call score_for(pr.get("tier_1", {}).get(key), TIER_1_WEIGHT)
and similarly for tier_2, so only the fixed keys affect total; keep max_total
as-is (len(tier_1_keys)*TIER_1_WEIGHT + len(tier_2_keys)*TIER_2_WEIGHT) and
ignore any extra keys in pr when computing total.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 48d51831-226d-4535-b7f4-9565c77dd699

📥 Commits

Reviewing files that changed from the base of the PR and between 65cbafd and 0e5936c.

📒 Files selected for processing (6)

.agents/skills/nemoclaw-maintainer-pr-comparator/checks/tier-2-quality.md
.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/collect-gates.sh
.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/find-candidates.sh
.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/render-verdict.py
.agents/skills/nemoclaw-maintainer-pr-comparator/templates/verdict.md
.agents/skills/nemoclaw-maintainer-pr-comparator/validation/backtest.md

✅ Files skipped from review due to trivial changes (2)

.agents/skills/nemoclaw-maintainer-pr-comparator/validation/backtest.md
.agents/skills/nemoclaw-maintainer-pr-comparator/templates/verdict.md

🚧 Files skipped from review as they are similar to previous changes (2)

.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/collect-gates.sh
.agents/skills/nemoclaw-maintainer-pr-comparator/scripts/find-candidates.sh

coderabbitai · 2026-05-05T19:17:53Z

+        total = 0.0
+        for status in pr.get("tier_1", {}).values():
+            total += score_for(status, TIER_1_WEIGHT)
+        for status in pr.get("tier_2", {}).values():
+            total += score_for(status, TIER_2_WEIGHT)
+        max_total = len(tier_1_keys) * TIER_1_WEIGHT + len(tier_2_keys) * TIER_2_WEIGHT


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Score calculation can include hidden checks and skew ranking.

total sums all values present in tier_1/tier_2, while the table and max_total are based on fixed key lists. Extra keys in input can silently change ranking without appearing in the rendered scorecard.

🔧 Proposed fix

     score_row = ["**Weighted score**"]
     for pr in prs:
         total = 0.0
-        for status in pr.get("tier_1", {}).values():
-            total += score_for(status, TIER_1_WEIGHT)
-        for status in pr.get("tier_2", {}).values():
-            total += score_for(status, TIER_2_WEIGHT)
+        tier_1 = pr.get("tier_1", {})
+        tier_2 = pr.get("tier_2", {})
+        for key in tier_1_keys:
+            total += score_for(tier_1.get(key, "fail"), TIER_1_WEIGHT)
+        for key in tier_2_keys:
+            total += score_for(tier_2.get(key, "fail"), TIER_2_WEIGHT)
         max_total = len(tier_1_keys) * TIER_1_WEIGHT + len(tier_2_keys) * TIER_2_WEIGHT
         score_row.append(f"{total:.1f} / {max_total:.1f}")
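
To see why the distinction matters, the sketch below runs both summation strategies on an input that carries an extra key. The key lists and the status-to-value mapping are placeholders chosen for illustration; the real lists live in render-verdict.py.

```python
TIER_1_WEIGHT = 2.0
TIER_2_WEIGHT = 1.0
TIER_1_KEYS = ["check_a", "check_b"]  # placeholder key names
TIER_2_KEYS = ["check_c"]             # placeholder key name
STATUS_VALUE = {"pass": 1.0, "yellow": 0.5, "fail": 0.0}  # assumed mapping


def score_for(status, weight):
    """Weighted value of one status; unknown statuses score zero."""
    return STATUS_VALUE.get(status, 0.0) * weight


def total_over_values(pr):
    """Original behavior: every value present in the input counts."""
    total = sum(score_for(s, TIER_1_WEIGHT) for s in pr.get("tier_1", {}).values())
    total += sum(score_for(s, TIER_2_WEIGHT) for s in pr.get("tier_2", {}).values())
    return total


def total_over_keys(pr):
    """Proposed behavior: only canonical keys count; missing keys fail."""
    tier_1, tier_2 = pr.get("tier_1", {}), pr.get("tier_2", {})
    total = sum(score_for(tier_1.get(k, "fail"), TIER_1_WEIGHT) for k in TIER_1_KEYS)
    total += sum(score_for(tier_2.get(k, "fail"), TIER_2_WEIGHT) for k in TIER_2_KEYS)
    return total


MAX_TOTAL = len(TIER_1_KEYS) * TIER_1_WEIGHT + len(TIER_2_KEYS) * TIER_2_WEIGHT
```

With an extra `hidden` Tier 1 key set to `pass`, the value-based total can exceed `MAX_TOTAL`, while the key-based total stays within it.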

cjagwani added 2 commits May 5, 2026 08:47

coderabbitai Bot reviewed May 5, 2026

cjagwani self-assigned this May 5, 2026

cjagwani added the v0.0.35 label May 5, 2026

cjagwani added 2 commits May 5, 2026 12:13

Merge branch 'main' into feat/skill-pr-comparator

e0ecdb7

cjagwani added v0.0.35 and removed v0.0.35 labels May 5, 2026

coderabbitai Bot reviewed May 5, 2026

Merge branch 'main' into feat/skill-pr-comparator

43f329a

cv approved these changes May 5, 2026

Merge branch 'main' into feat/skill-pr-comparator

a36a2dc

cv enabled auto-merge (squash) May 5, 2026 22:22

cv merged commit f40c25e into main May 5, 2026
9 checks passed

cjagwani mentioned this pull request May 5, 2026

feat(skills): add cross-issue regression sweep #3065

Merged

wscurran added the "feature" label (PR adds or expands user-visible functionality) Jun 8, 2026

cv merged 6 commits into main from feat/skill-pr-comparator
