fix(ci): honor exact-head proof verdicts by Takhoffman · Pull Request #83688 · openclaw/openclaw

Takhoffman · 2026-05-18T16:33:02Z

Summary

Describe the problem and fix in 2-5 bullets:

Problem: Barnacle can remove proof: sufficient after ClawSweeper accepts real-behavior proof because Barnacle only trusts PR-body proof or proof: override.
Why it matters: ClawSweeper automerge can get stuck when the label is stripped and the required Real behavior proof check does not consume ClawSweeper's exact-head verdict.
What changed: The shared proof policy now recognizes ClawSweeper clawsweeper-verdict:pass markers only when they match the PR number and current head SHA; Barnacle preserves proof: sufficient for that exact-head verdict, and the required proof check accepts it.
What did NOT change (scope boundary): This does not weaken body-based proof validation, maintainer override behavior, or stale proof removal when the ClawSweeper verdict is missing or for a different head.

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Closes #
Related fix(models): respect auth order in picker label #83581
This PR fixes a bug or regression

Real behavior proof (required for external PRs)

External contributors must show after-fix evidence from a real OpenClaw setup. Unit tests, mocks, lint, typechecks, snapshots, and CI are supplemental only. Screenshots are encouraged even for CLI, console, text, or log changes; terminal screenshots and copied live output count. Be mindful of private information like IP addresses, API keys, phone numbers, non-public endpoints, or other private details when providing evidence.

Behavior or issue addressed: Barnacle proof-label synchronization and the required proof check now honor an exact-head ClawSweeper pass verdict.
Real environment tested: Local OpenClaw checkout on macOS with Node/pnpm repo test wrappers.
Exact steps or command run after this patch: pnpm test test/scripts/real-behavior-proof-policy.test.ts test/scripts/barnacle-auto-response.test.ts; pnpm exec oxfmt --check --threads=1 .github/workflows/real-behavior-proof.yml scripts/github/real-behavior-proof-policy.mjs scripts/github/real-behavior-proof-check.mjs scripts/github/barnacle-auto-response.mjs test/scripts/real-behavior-proof-policy.test.ts test/scripts/barnacle-auto-response.test.ts; git diff --check origin/main...HEAD.
Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output): Focused tests passed: 2 Vitest shards, 2 test files, 50 tests. Format check and diff whitespace check passed.
Observed result after fix: Tests prove exact-head ClawSweeper verdict markers are accepted only for the matching PR/head and Barnacle preserves proof: sufficient on synchronize when the marker is valid.
What was not tested: A live GitHub Actions run against a PR comment marker before opening this PR.
Before evidence (optional but encouraged): PR fix(models): respect auth order in picker label #83581 timeline showed Barnacle removing proof: sufficient after ClawSweeper applied it, while Real behavior proof remained failing.

Root Cause (if applicable)

For bug fixes or regressions, explain why this happened, not just what changed. Otherwise write N/A. If the cause is unclear, write Unknown.

Root cause: Barnacle and the Real behavior proof workflow used the structured PR body proof policy but did not read ClawSweeper's durable exact-head verdict marker.
Missing detection / guardrail: No test covered the cross-bot contract where ClawSweeper accepts proof in a marker and Barnacle later reevaluates labels on synchronize.
Contributing context (if known): Barnacle intentionally removes stale proof: sufficient on PR updates, but it had no way to distinguish stale labels from an exact-head ClawSweeper pass.

Regression Test Plan (if applicable)

For bug fixes or regressions, name the smallest reliable test coverage that should catch this. Otherwise write N/A.

Coverage level that should have caught this:
- Unit test
- Seam / integration test
- End-to-end test
- Existing coverage already sufficient
Target test or file: test/scripts/real-behavior-proof-policy.test.ts; test/scripts/barnacle-auto-response.test.ts
Scenario the test should lock in: Exact-head ClawSweeper pass markers count as proof only for the matching PR/head, and Barnacle preserves proof: sufficient on synchronize when that marker exists.
Why this is the smallest reliable guardrail: The bug was in local proof-policy and Barnacle label synchronization logic, so unit/tooling tests cover the contract without needing a live PR.
Existing test that already covers this (if any): None before this PR.
If no new test is added, why not: N/A

User-visible / Behavior Changes

Automerge/proof automation can now proceed from an exact-head ClawSweeper proof verdict instead of racing over proof: sufficient label churn.

Diagram (if applicable)

Before:
ClawSweeper pass marker -> proof: sufficient -> Barnacle synchronize check -> label removed
PR body lacks structured proof -> Real behavior proof check fails

After:
ClawSweeper pass marker for current head -> Barnacle preserves proof: sufficient
ClawSweeper pass marker for current head -> Real behavior proof check passes

Security Impact (required)

New permissions/capabilities? (Yes/No) Yes
Secrets/tokens handling changed? (Yes/No) No
New/changed network calls? (Yes/No) Yes
Command/tool execution surface changed? (Yes/No) No
Data access scope changed? (Yes/No) Yes
If any Yes, explain risk + mitigation: The proof workflow reads PR issue comments using the existing app token so it can find ClawSweeper's marker. The marker is accepted only when the PR number and exact 40-character head SHA match, limiting stale or copied-marker risk.

Repro + Verification

Environment

OS: macOS
Runtime/container: Local repo Node/pnpm test wrappers
Model/provider: N/A
Integration/channel (if any): GitHub Actions/Barnacle/ClawSweeper automation
Relevant config (redacted): N/A

Steps

Add a ClawSweeper pass verdict marker for a matching PR number and head SHA.
Run the proof policy and Barnacle auto-response tests.
Confirm mismatched head SHAs are rejected and matching markers preserve proof: sufficient.

Expected

Matching exact-head ClawSweeper pass markers count as sufficient proof.
Mismatched markers do not count.
Barnacle preserves proof: sufficient only when the exact-head marker exists.

Actual

Focused tests passed with those assertions.

Evidence

Attach at least one:

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

Verified scenarios: Focused proof-policy tests and Barnacle label-sync tests passed locally.
Edge cases checked: Marker SHA mismatch does not count as proof.
What you did not verify: Live GitHub Actions execution before opening this PR.

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

Backward compatible? (Yes/No) Yes
Config/env changes? (Yes/No) No
Migration needed? (Yes/No) No
If yes, exact upgrade steps: N/A

Risks and Mitigations

Risk: A copied ClawSweeper marker could otherwise be mistaken for proof.
- Mitigation: The parser requires both the PR number and exact current head SHA to match.

clawsweeper · 2026-05-18T16:34:09Z

Codex review: needs real behavior proof before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The PR adds exact-head ClawSweeper proof-verdict marker handling to the real-behavior-proof check, Barnacle proof-label synchronization, and related tests.

Reproducibility: yes. by source inspection and related PR timeline. Current main does not read ClawSweeper verdict comments in the required proof check, and the related PR at #83581 shows an exact-head pass marker while the Real behavior proof check remained failing.

PR rating
Overall: 🧂 unranked krab
Proof: 🧂 unranked krab
Patch quality: 🧂 unranked krab
Summary: Not merge-ready because proof is mock-only and the patch weakens the proof-gate trust boundary unless verdict comments are authenticated.

Rank-up moves:

Accept only ClawSweeper app-authored verdict comments and add a forged-comment regression test.
Use a token with issue-comment read permission, or retry with GITHUB_TOKEN, when fetching proof comments.
Add live workflow output or redacted terminal/log proof showing the exact-head marker path after the fix.

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

PR egg
🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?

The egg game starts only after the PR passes the real-behavior proof check.
Before that, no creature, rarity, or ASCII portrait is rolled. The treat waits for real proof.
This is still just collectible flavor: proof affects review readiness, not creature quality.

Real behavior proof
Needs real behavior proof before merge: Mock-only: the PR body reports focused tests, format, and diff checks, but no live Actions output or redacted terminal/log proof for the new comment-marker path; the contributor should add real proof and update the PR body or ask for @clawsweeper re-review.

Risk before merge
Why this matters: - As written, an external PR commenter could satisfy the required proof check by posting a hidden marker with the current PR number and head SHA.

The exact-head path may fail in Actions because the script prefers an app token requested only with permission-members: read before using the job token that has issues: read.
The PR body provides tests and format checks only, not live workflow output or redacted runtime evidence that the new GitHub comment-marker path works.

Maintainer options:

Authenticate verdict markers before merge (recommended)
Require the accepted marker comment to come from the ClawSweeper app or bot identity, add a forged-contributor-comment rejection test, and fix the issue-comment token path.
Pause if comment markers are not the durable contract
If maintainers do not want issue comments to be a trusted proof source, pause or close this approach and choose another proof handoff mechanism.

Next step before merge
Protected maintainer handling and missing real behavior proof are required, and the patch needs security-sensitive proof-marker trust fixes before merge.

Security
Needs attention: Needs attention: the new proof-verdict parser trusts unauthenticated issue-comment bodies as ClawSweeper pass markers.

Review findings

[P1] Authenticate ClawSweeper verdict comments — scripts/github/real-behavior-proof-policy.mjs:252-259
[P2] Use an issue-readable token for comment fetching — scripts/github/real-behavior-proof-check.mjs:54

Review details

Best possible solution:

Keep the exact-head verdict idea, but only trust markers from the ClawSweeper GitHub App or bot identity, use a token that can read issue comments with a safe fallback, and add forged-comment plus token-path regression coverage before merge.

Do we have a high-confidence way to reproduce the issue?

Yes by source inspection and related PR timeline. Current main does not read ClawSweeper verdict comments in the required proof check, and the related PR at #83581 shows an exact-head pass marker while the Real behavior proof check remained failing.

Is this the best way to solve the issue?

No. The direction is maintainable, but this implementation must authenticate the marker source and use a token with issue-comment read permission before it is safe to merge.

Label justifications:

P1: The PR changes required proof/automerge gating and currently contains a proof-gate trust-boundary bug that should block merge.
merge-risk: 🚨 security-boundary: Merging this PR could let untrusted PR commenters spoof a ClawSweeper verdict marker and bypass required proof.
merge-risk: 🚨 automation: Merging this PR could keep proof automation failing because the new comment-read path may use a token without issue-comment permission.

Full review comments:

[P1] Authenticate ClawSweeper verdict comments — scripts/github/real-behavior-proof-policy.mjs:252-259
hasClawSweeperExactHeadProof scans every issue comment body and accepts a matching PR number plus head SHA, but it never verifies that the comment was posted by the ClawSweeper GitHub App or bot. An external contributor can post that hidden marker themselves and make the required Real behavior proof check pass without accepted proof, so the marker should be accepted only from a trusted ClawSweeper comment and covered with a forged-comment regression test.
Confidence: 0.96
[P2] Use an issue-readable token for comment fetching — scripts/github/real-behavior-proof-check.mjs:54
The new comment fetch prefers appToken whenever it exists, but the workflow creates that app token with only permission-members: read; the added job-level issues: read applies to GITHUB_TOKEN, which this code will skip. In the intended exact-head fallback path, a members-only app token can make the check throw on the comments API instead of reading the marker, so request permission-issues: read for the app token or retry with GITHUB_TOKEN on permission failure.
Confidence: 0.88

Overall correctness: patch is incorrect
Overall confidence: 0.91

Security concerns:

[high] Contributor comments can forge proof verdicts — scripts/github/real-behavior-proof-policy.mjs:252
The parser accepts any comment containing a matching clawsweeper-verdict:pass marker, so a PR commenter can bypass the real-behavior-proof gate by posting the hidden marker with the current head SHA.
Confidence: 0.96

What I checked:

Current-main proof check does not read verdict comments: On current main, the required proof check exits after PR-body proof or override evaluation and has no issue-comment marker path, matching the reported missing contract. (scripts/github/real-behavior-proof-check.mjs:48, 38f11a084475)
Current-main Barnacle removes sufficient proof on synchronize: Current Barnacle logic removes proof: sufficient on edited/synchronize events even when the PR body proof passed; blame ties this behavior to the existing automation implementation. (scripts/github/barnacle-auto-response.mjs:787, 67f8683ca30a)
PR parser trusts all comment bodies: The PR head scans every comment body for clawsweeper-verdict:pass and accepts a matching PR number plus SHA without checking the comment author or GitHub App identity. (scripts/github/real-behavior-proof-policy.mjs:252, 1ad980ae7421)
PR check prefers app token for comment reads: The PR head uses appToken || process.env.GITHUB_TOKEN before fetching issue comments, while the workflow's app token step requests only permission-members: read. (scripts/github/real-behavior-proof-check.mjs:54, 1ad980ae7421)
Workflow permission context: The PR adds job-level issues: read, but the app-token creation steps still request only permission-members: read, so the script's token preference matters. (.github/workflows/real-behavior-proof.yml:21, 1ad980ae7421)
GitHub App token contract: The actions/create-github-app-token documentation says permission-<permission> inputs explicitly grant selected permissions; this supports treating the members-only app token as insufficient for issue-comment reads unless issues permission is also requested.

Likely related people:

steipete: Git blame attributes the current real-behavior-proof policy, workflow skeleton, and Barnacle proof-label removal logic to Peter Steinberger's automation commit. (role: introduced current proof automation surface; confidence: high; commits: 67f8683ca30a; files: scripts/github/real-behavior-proof-policy.mjs, scripts/github/real-behavior-proof-check.mjs, scripts/github/barnacle-auto-response.mjs)
Dallin Romney: Recent history shows Dallin Romney added the maintainer-team app-token path in the same real-behavior-proof workflow, which is adjacent to this PR's comment-fetch token behavior. (role: recent area contributor; confidence: medium; commits: cf194419c315; files: .github/workflows/real-behavior-proof.yml)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 38f11a084475.

fix(ci): honor exact-head proof verdicts

1ad980a

openclaw-barnacle Bot added scripts Repository scripts size: S maintainer Maintainer-authored PR labels May 18, 2026

Takhoffman merged commit e4fba78 into main May 18, 2026
106 of 111 checks passed

clawsweeper Bot added the rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. label May 18, 2026

Takhoffman deleted the fix/exact-head-proof-verdicts branch May 18, 2026 16:39

Takhoffman mentioned this pull request May 18, 2026

fix(ci): authenticate proof verdict markers #83692

Merged

25 tasks

github-actions Bot mentioned this pull request May 18, 2026

📡 Upstream Digest — 2026-05-18 17:53 UTC curtismercier/openclaw-mods#891

Open

clawsweeper Bot mentioned this pull request May 23, 2026

fix(github): preserve sufficient proof against negative relabel #85567

Merged

SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026

fix(ci): honor exact-head proof verdicts (openclaw#83688)

30c09af

SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026

fix(ci): honor exact-head proof verdicts (openclaw#83688)

84d262b

SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 24, 2026

fix(ci): honor exact-head proof verdicts (openclaw#83688)

db72d89

github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026

fix(ci): honor exact-head proof verdicts (openclaw#83688)

4d197b1

galiniliev pushed a commit to galiniliev/openclaw that referenced this pull request May 25, 2026

fix(ci): honor exact-head proof verdicts (openclaw#83688)

cb072f0

SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026

fix(ci): honor exact-head proof verdicts (openclaw#83688)

d11878e

SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026

fix(ci): honor exact-head proof verdicts (openclaw#83688)

3b77dc2

SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request May 26, 2026

fix(ci): honor exact-head proof verdicts (openclaw#83688)

53bb3f9

jameslcowan pushed a commit to jameslcowan/openclaw that referenced this pull request Jun 2, 2026

fix(ci): honor exact-head proof verdicts (openclaw#83688)

5be2b72

SYU8384 pushed a commit to SYU8384/openclaw that referenced this pull request Jun 3, 2026

fix(ci): honor exact-head proof verdicts (openclaw#83688)

764878b

sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026

fix(ci): honor exact-head proof verdicts (openclaw#83688)

49b774b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(ci): honor exact-head proof verdicts#83688

fix(ci): honor exact-head proof verdicts#83688
Takhoffman merged 1 commit into
mainfrom
fix/exact-head-proof-verdicts

Takhoffman commented May 18, 2026

Uh oh!

clawsweeper Bot commented May 18, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Takhoffman commented May 18, 2026

Summary

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Real behavior proof (required for external PRs)

Root Cause (if applicable)

Regression Test Plan (if applicable)

User-visible / Behavior Changes

Diagram (if applicable)

Security Impact (required)

Repro + Verification

Environment

Steps

Expected

Actual

Evidence

Human Verification (required)

Review Conversations

Compatibility / Migration

Risks and Mitigations

Uh oh!

clawsweeper Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

clawsweeper Bot commented May 18, 2026 •

edited

Loading