
fix(ci): preserve Barnacle proof labels #83735

Merged
Takhoffman merged 2 commits into main from codex/changelog-83247 on May 18, 2026

Conversation

@Takhoffman

Contributor

Summary

  • Preserve proof: sufficient when proof: override is present.
  • Keep proof: sufficient during unrelated label churn so Barnacle does not revoke ClawSweeper/manual sufficiency on status-label updates.
  • Add focused Barnacle regressions for override coexistence and unrelated label events.
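
The two preservation rules in the summary can be read as a single removal predicate. The following is a minimal sketch, not the code in scripts/github/barnacle-auto-response.mjs: the function name `shouldRemoveSufficientProof`, its input shape, and the `proofPassed` flag are assumptions for illustration. The label names and the `edited`/`synchronize` distinction come from this PR's discussion.

```javascript
// Label names as they appear in this PR; everything else here is hypothetical.
const SUFFICIENT_LABEL = "proof: sufficient";
const OVERRIDE_LABEL = "proof: override";

// Assumption: only these pull_request actions can change what the proof check sees.
const PROOF_RELEVANT_ACTIONS = new Set(["edited", "synchronize"]);

/**
 * Decide whether Barnacle should remove the sufficient-proof label.
 * @param {{ action: string, labels: string[], proofPassed: boolean }} input
 * @returns {boolean}
 */
function shouldRemoveSufficientProof({ action, labels, proofPassed }) {
  const current = new Set(labels);
  if (!current.has(SUFFICIENT_LABEL)) return false; // nothing to remove
  if (proofPassed) return false; // evaluation still passes
  if (current.has(OVERRIDE_LABEL)) return false; // override coexists with sufficiency
  if (!PROOF_RELEVANT_ACTIONS.has(action)) return false; // unrelated label churn
  return true; // body or head changed and proof no longer passes
}
```

Under these assumptions, a `labeled` event on a PR that already carries proof: sufficient never revokes it, while a `synchronize` event with a failing evaluation and no override still does.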

Verification

  • node scripts/run-vitest.mjs test/scripts/barnacle-auto-response.test.ts

@openclaw-barnacle (Bot) added the scripts (Repository scripts), size: S, and maintainer (Maintainer-authored PR) labels on May 18, 2026
@clawsweeper

clawsweeper Bot commented May 18, 2026

Contributor

Codex review: needs changes before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.
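
The permission split described in the list above can be sketched as a small command classifier. This is only an illustration: the command strings are taken from the workflow notes, but the function name `classifyClawSweeperCommand`, the `actor` shape, and the returned fields are assumptions and do not describe how ClawSweeper actually parses comments.

```javascript
// Command names come from the workflow notes; the grouping is an assumption.
const FRESH_REVIEW_COMMANDS = new Set(["re-review", "re-run"]);
const MAINTAINER_COMMANDS = new Set([
  "review", "autofix", "automerge", "fix ci", "address review", "explain", "stop",
]);
// Commands that may mutate the branch or merge, as opposed to review-only ones.
const REPAIR_COMMANDS = new Set(["autofix", "automerge", "fix ci", "address review"]);

/**
 * Classify the first "@clawsweeper <command>" line in a comment.
 * @param {string} commentBody
 * @param {{ isAuthor?: boolean, hasWriteAccess?: boolean, isMaintainer?: boolean }} actor
 */
function classifyClawSweeperCommand(commentBody, actor) {
  const none = { command: null, allowed: false, startsRepair: false };
  const match = /^@clawsweeper[ \t]+(.+?)[ \t]*$/im.exec(commentBody);
  if (!match) return none;

  // Normalize case and inner whitespace so "Fix  CI" matches "fix ci".
  const command = match[1].toLowerCase().replace(/[ \t]+/g, " ");

  if (FRESH_REVIEW_COMMANDS.has(command)) {
    const allowed = Boolean(actor.isAuthor || actor.hasWriteAccess || actor.isMaintainer);
    return { command, allowed, startsRepair: false }; // fresh review only
  }
  if (MAINTAINER_COMMANDS.has(command)) {
    return {
      command,
      allowed: Boolean(actor.isMaintainer),
      startsRepair: REPAIR_COMMANDS.has(command),
    };
  }
  return none; // unknown command: ignore
}
```

In this sketch a PR author's `@clawsweeper re-review` is allowed but never starts a repair, while `@clawsweeper automerge` is honored only for maintainers.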

Summary
The PR changes Barnacle proof-label removal logic and adds regressions so proof: sufficient survives proof: override and unrelated label events.

Reproducibility: yes. Source inspection shows current main removes sufficient proof for non-passed evaluations, and the PR's new unrelated-label case can preserve sufficiency while the existing classification path still adds proof: needs-real-behavior.

PR rating
Overall: 🦐 gold shrimp
Proof: 🐚 platinum hermit
Patch quality: 🦐 gold shrimp
Summary: The proof gate is overridden and the patch is focused, but the current implementation still has a blocking proof-label consistency bug.

Rank-up moves:

  • Fix the contradictory proof: sufficient plus proof: needs-real-behavior label path and add the missing addLabels assertions.
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

PR egg
✨ Hatched: 🥚 common Frosted Lint Imp

        /\     /\            
      _/  \___/  \_          
     /  ( o   o )  \         
    |      \_/      |        
    |   /\  ===  /\ |        
     \_/  \_____/  \_/       
        _/|_| |_|\_          
       /__| | | |__\         
          ' ' ' '            
         /_/     \_\         
       .-----------.         
      '-------------'        

Rarity: 🥚 common.
Trait: sniffs out flaky tests.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • How to hatch it: reach status: 👀 ready for maintainer look or status: 🚀 automerge armed; that usually means sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.
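
The seeding rule above (creature fixed by repository and PR number, head SHA limited to visual details) can be illustrated with a deterministic hash. This is a hypothetical sketch: the hash choice, the `pickHatch` function, the placeholder creature names, and the `tint` detail are all invented for illustration. Only "Frosted Lint Imp" appears in this review.

```javascript
// Hypothetical pools; only the first creature name appears in this review.
const CREATURES = ["Frosted Lint Imp", "Placeholder Creature B", "Placeholder Creature C"];
const TINTS = ["plain", "warm", "cool"];

// FNV-1a style 32-bit hash over UTF-16 code units (good enough for a sketch).
function hashText(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value unsigned
  }
  return hash;
}

/**
 * Pick a creature from the repo and PR number only, so it is stable across
 * pushes; the head SHA only influences a cosmetic detail.
 */
function pickHatch(repo, prNumber, headSha) {
  const identitySeed = hashText(`${repo}#${prNumber}`);
  const detailSeed = hashText(headSha);
  return {
    creature: CREATURES[identitySeed % CREATURES.length],
    tint: TINTS[detailSeed % TINTS.length],
  };
}
```

Calling `pickHatch` twice with the same repository and PR number but different head SHAs returns the same `creature`, and only `tint` may differ.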

Real behavior proof
Override: A maintainer applied proof: override for this PR.

Risk before merge
Why this matters:

  • Merging as-is can leave PRs with both proof: sufficient and proof: needs-real-behavior after unrelated label churn, which makes the proof gate state contradictory.
  • The diff touches Barnacle label automation, so mistakes can affect ClawSweeper review and automerge gates even when normal unit CI passes.

Maintainer options:

  1. Fix contradictory proof labels before merge (recommended)
    Update Barnacle so unrelated label events with existing sufficient proof neither remove sufficiency nor add negative proof candidate labels, and assert that in the regression tests.
  2. Accept the label-state risk
    Maintainers can intentionally land the narrower preservation fix now, owning the chance that Barnacle still shows contradictory proof labels until a follow-up.
Copy recommended automerge instruction
@clawsweeper automerge

Special instructions:
Update Barnacle proof-label handling so when `proof: sufficient` is already present and the event is not `edited` or `synchronize`, Barnacle neither removes sufficiency nor adds negative proof candidate labels; add focused tests that assert `calls.addLabels` remains empty for unrelated label churn and override-preservation cases.
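
The requested assertion (`calls.addLabels` stays empty) can be sketched with a recording stub. The following is an assumption-laden illustration rather than the real test in test/scripts/barnacle-auto-response.test.ts: `createRecordingClient` and `syncProofLabels` are hypothetical stand-ins, and the handler models only the cases named in the instruction.

```javascript
// Hypothetical stub: records label mutations instead of calling GitHub.
function createRecordingClient() {
  const calls = { addLabels: [], removeLabels: [] };
  return {
    calls,
    addLabels(labels) {
      if (labels.length > 0) calls.addLabels.push([...labels]);
    },
    removeLabel(label) {
      calls.removeLabels.push(label);
    },
  };
}

// Hypothetical stand-in for the handler under test.
function syncProofLabels(client, { action, labels, proofPassed }) {
  const hasSufficient = labels.includes("proof: sufficient");
  const hasOverride = labels.includes("proof: override");
  const proofRelevant = action === "edited" || action === "synchronize";

  if (proofPassed) return;
  // Preservation path: keep sufficiency and add no negative proof label.
  if (hasSufficient && (hasOverride || !proofRelevant)) return;

  if (hasSufficient) client.removeLabel("proof: sufficient");
  if (!labels.includes("proof: needs-real-behavior")) {
    client.addLabels(["proof: needs-real-behavior"]);
  }
}

// Usage: unrelated label churn must leave both call lists empty.
const churn = createRecordingClient();
syncProofLabels(churn, { action: "labeled", labels: ["proof: sufficient"], proofPassed: false });
console.log(churn.calls.addLabels.length, churn.calls.removeLabels.length); // 0 0
```

A real regression would make the same two checks against the actual handler's recorded calls, once for unrelated label churn and once for the override-preservation case.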

Next step before merge
The remaining blocker is a narrow Barnacle label-state repair that an automated worker can attempt on this automerge-opted PR branch.

Security
Cleared: The diff only changes Barnacle label predicates and tests; it adds no dependency, workflow permission, secret handling, or new code-execution surface.

Review findings

  • [P2] Suppress negative proof labels when preserving sufficiency — scripts/github/barnacle-auto-response.mjs:804-805
Review details

Best possible solution:

Preserve proof: sufficient on override and unrelated label events while also suppressing contradictory negative proof labels, with focused tests covering both removal and addition behavior.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection shows current main removes sufficient proof for non-passed evaluations, and the PR's new unrelated-label case can preserve sufficiency while the existing classification path still adds proof: needs-real-behavior.

Is this the best way to solve the issue?

No, not quite. The predicate change is the right narrow area, but the complete fix also needs to prevent negative proof labels from being added during the same preserved-sufficiency label churn path.

Label justifications:

  • P2: This is a normal-priority CI automation bug with limited runtime blast radius but direct effect on proof labels and review gates.
  • merge-risk: 🚨 automation: The diff changes Barnacle proof-label synchronization, where a bad merge can confuse automated proof and automerge state.

Full review comments:

  • [P2] Suppress negative proof labels when preserving sufficiency — scripts/github/barnacle-auto-response.mjs:804-805
    This early return keeps proof: sufficient on unrelated label events, but the function still continues into classifyPullRequestCandidateLabels and addMissingLabels. For a PR with proof: sufficient but no body proof, evaluateRealBehaviorProof still classifies proof: needs-real-behavior, so Barnacle can add the missing-proof label beside the sufficient label. Please make the preservation path suppress contradictory proof candidate labels too, and assert calls.addLabels stays empty in the new label-churn regression.
    Confidence: 0.87
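
One way to read this finding is as a filter applied to the candidate labels before they are added. The sketch below is hypothetical and is not the repository's implementation: `filterProofCandidates` and its parameters are invented for illustration, and the set of negative proof labels is assumed to contain only the one label named in this review.

```javascript
// Assumption: the only negative proof label is the one named in this review.
const NEGATIVE_PROOF_CANDIDATES = new Set(["proof: needs-real-behavior"]);

/**
 * Drop contradictory proof candidates when sufficiency is being preserved.
 * @param {{ action: string, currentLabels: string[], candidateLabels: string[] }} input
 * @returns {string[]} labels that are still safe to add
 */
function filterProofCandidates({ action, currentLabels, candidateLabels }) {
  const hasSufficient = currentLabels.includes("proof: sufficient");
  const hasOverride = currentLabels.includes("proof: override");
  const proofRelevant = action === "edited" || action === "synchronize";

  // Same condition as the preservation path: keep sufficiency on override
  // or when the event cannot have changed the proof.
  const preserving = hasSufficient && (hasOverride || !proofRelevant);
  if (!preserving) return [...candidateLabels];

  return candidateLabels.filter((label) => !NEGATIVE_PROOF_CANDIDATES.has(label));
}
```

With this shape, non-proof candidates such as size labels still flow through on label churn, and only the contradictory proof label is suppressed.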

Overall correctness: patch is incorrect
Overall confidence: 0.87

Acceptance criteria:

  • node scripts/run-vitest.mjs test/scripts/barnacle-auto-response.test.ts
  • git diff --check

What I checked:

Likely related people:

  • Takhoffman: Authored the recent exact-head proof verdict and trusted-marker commits that own much of the proof-label behavior touched here. (role: recent area contributor; confidence: high; commits: e4fba78d81fe, 06a39015f21c; files: scripts/github/barnacle-auto-response.mjs, scripts/github/real-behavior-proof-policy.mjs, test/scripts/barnacle-auto-response.test.ts)
  • Yao: Blame points the original Barnacle stale-proof removal predicate and matching tests to the grafted introduction commit for this automation surface. (role: introduced behavior; confidence: medium; commits: 6a5a1353c7f0; files: scripts/github/barnacle-auto-response.mjs, test/scripts/barnacle-auto-response.test.ts)
  • Dallin Romney: Worked on the real-behavior-proof gate shortly before this PR, making them adjacent to the proof evaluation contract even though not central to this predicate. (role: adjacent contributor; confidence: medium; commits: cf194419c315; files: scripts/github/real-behavior-proof-policy.mjs, test/scripts/real-behavior-proof-policy.test.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 1fb09069c342.

@Takhoffman

Contributor Author

@clawsweeper automerge

@clawsweeper (Bot) added these labels on May 18, 2026:

  • rating: 🧂 unranked krab (Not merge-ready due to missing proof or serious correctness/safety concerns.)
  • status: 🛠️ actively grinding (The PR author has acted after the latest ClawSweeper review and work remains.)
  • clawsweeper:automerge (Maintainer opted this PR into bounded ClawSweeper-reviewed automerge)

@clawsweeper

clawsweeper Bot commented May 18, 2026

Contributor

ClawSweeper 🐠 automerge status

This pass ended as a no-op: no narrow repair surfaced, so ClawSweeper left the branch untouched.

Executor outcome: source PR #83735 is paused by clawsweeper:human-review; refusing to mutate the PR branch.
Worker summary: Make this PR merge-ready for ClawSweeper automerge. Rebase onto latest main, address PR comments and review findings, fix CI/check failures, add required changelog if needed, and validate before returning.

Worker actions:

  • build_fix_artifact on this PR: planned - Maintainer opted this PR into ClawSweeper automerge/autofix repair; run the direct Codex edit loop after live hydration instead of a separate read-only planning pass.

ClawSweeper left the PR as-is: no push, no rebase, no replacement PR, no merge, and no fresh ClawSweeper pass.

fish notes: model gpt-5.5, reasoning high.

Automerge progress:

  • 2026-05-18 19:20:47 UTC review queued d86adea24c08 (queued)

@clawsweeper (Bot) added the clawsweeper:human-review (Needs maintainer review before ClawSweeper can continue) label on May 18, 2026
@clawsweeper

clawsweeper Bot commented May 18, 2026

Contributor

🦞✅
ClawSweeper is pausing this repair loop for human review.

Source: clawsweeper[bot]
Reason: Protected maintainer labeling plus proof-label automation risk make this a maintainer validation item rather than a ClawSweeper repair job.
Cleared: The merge result only narrows Barnacle proof-label removal and keeps trusted ClawSweeper marker authentication; it adds no new dependency, permission, secret, or code-execution surface. (sha=d86adea24c08b474e5de02380d5f9bb1f53ebe5a)

I added clawsweeper:human-review and left the final call with a maintainer.

@Takhoffman added the proof: override (Maintainer override for the external PR real behavior proof gate.) label on May 18, 2026
@Takhoffman

Contributor Author

@clawsweeper automerge

@clawsweeper (Bot) added rating: 🦐 gold shrimp (Decent PR readiness signal, but merge confidence is limited.) and removed rating: 🧂 unranked krab (Not merge-ready due to missing proof or serious correctness/safety concerns.) on May 18, 2026
@Takhoffman Takhoffman merged commit c92ebd6 into main May 18, 2026
139 of 149 checks passed
@Takhoffman Takhoffman deleted the codex/changelog-83247 branch May 18, 2026 19:37
@clawsweeper (Bot) updated labels on May 18, 2026:

  • added status: 🚀 automerge armed (This PR is in ClawSweeper's automerge lane.)
  • added P2 (Normal backlog priority with limited blast radius.)
  • added merge-risk: 🚨 automation (🚨 May affect CI, automerge, proof capture, label sync, or maintainer automation.)
  • removed status: 🛠️ actively grinding (The PR author has acted after the latest ClawSweeper review and work remains.)

SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026
* fix(ci): preserve sufficient proof override

* fix(ci): keep sufficient proof on label churn
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
* fix(ci): preserve sufficient proof override

* fix(ci): keep sufficient proof on label churn
galiniliev pushed a commit to galiniliev/openclaw that referenced this pull request May 25, 2026
* fix(ci): preserve sufficient proof override

* fix(ci): keep sufficient proof on label churn
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026
* fix(ci): preserve sufficient proof override

* fix(ci): keep sufficient proof on label churn
jameslcowan pushed a commit to jameslcowan/openclaw that referenced this pull request Jun 2, 2026
* fix(ci): preserve sufficient proof override

* fix(ci): keep sufficient proof on label churn
SYU8384 pushed a commit to SYU8384/openclaw that referenced this pull request Jun 3, 2026
* fix(ci): preserve sufficient proof override

* fix(ci): keep sufficient proof on label churn
sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026
* fix(ci): preserve sufficient proof override

* fix(ci): keep sufficient proof on label churn

Labels

  • clawsweeper:automerge (Maintainer opted this PR into bounded ClawSweeper-reviewed automerge)
  • clawsweeper:human-review (Needs maintainer review before ClawSweeper can continue)
  • maintainer (Maintainer-authored PR)
  • merge-risk: 🚨 automation (🚨 May affect CI, automerge, proof capture, label sync, or maintainer automation.)
  • P2 (Normal backlog priority with limited blast radius.)
  • proof: override (Maintainer override for the external PR real behavior proof gate.)
  • rating: 🦐 gold shrimp (Decent PR readiness signal, but merge confidence is limited.)
  • scripts (Repository scripts)
  • size: S
  • status: 🚀 automerge armed (This PR is in ClawSweeper's automerge lane.)
