fix(ci): honor exact-head proof verdicts#83688
Conversation
|
Codex review: needs real behavior proof before merge. Workflow note: Future ClawSweeper reviews update this same comment in place. How this review workflow works
Summary Reproducibility: yes. by source inspection and related PR timeline. Current main does not read ClawSweeper verdict comments in the required proof check, and the related PR at #83581 shows an exact-head pass marker while the Real behavior proof check remained failing. PR rating Rank-up moves:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. PR egg Where did the egg go?
Real behavior proof Risk before merge
Maintainer options:
Next step before merge Security Review findings
Review detailsBest possible solution: Keep the exact-head verdict idea, but only trust markers from the ClawSweeper GitHub App or bot identity, use a token that can read issue comments with a safe fallback, and add forged-comment plus token-path regression coverage before merge. Do we have a high-confidence way to reproduce the issue? Yes by source inspection and related PR timeline. Current main does not read ClawSweeper verdict comments in the required proof check, and the related PR at #83581 shows an exact-head pass marker while the Real behavior proof check remained failing. Is this the best way to solve the issue? No. The direction is maintainable, but this implementation must authenticate the marker source and use a token with issue-comment read permission before it is safe to merge. Label justifications:
Full review comments:
Overall correctness: patch is incorrect Security concerns:
What I checked:
Likely related people:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 38f11a084475. |
Summary
Describe the problem and fix in 2-5 bullets:
proof: sufficientafter ClawSweeper accepts real-behavior proof because Barnacle only trusts PR-body proof orproof: override.Real behavior proofcheck does not consume ClawSweeper's exact-head verdict.clawsweeper-verdict:passmarkers only when they match the PR number and current head SHA; Barnacle preservesproof: sufficientfor that exact-head verdict, and the required proof check accepts it.Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Real behavior proof (required for external PRs)
External contributors must show after-fix evidence from a real OpenClaw setup. Unit tests, mocks, lint, typechecks, snapshots, and CI are supplemental only. Screenshots are encouraged even for CLI, console, text, or log changes; terminal screenshots and copied live output count. Be mindful of private information like IP addresses, API keys, phone numbers, non-public endpoints, or other private details when providing evidence.
pnpm test test/scripts/real-behavior-proof-policy.test.ts test/scripts/barnacle-auto-response.test.ts;pnpm exec oxfmt --check --threads=1 .github/workflows/real-behavior-proof.yml scripts/github/real-behavior-proof-policy.mjs scripts/github/real-behavior-proof-check.mjs scripts/github/barnacle-auto-response.mjs test/scripts/real-behavior-proof-policy.test.ts test/scripts/barnacle-auto-response.test.ts;git diff --check origin/main...HEAD.proof: sufficienton synchronize when the marker is valid.proof: sufficientafter ClawSweeper applied it, whileReal behavior proofremained failing.Root Cause (if applicable)
For bug fixes or regressions, explain why this happened, not just what changed. Otherwise write
N/A. If the cause is unclear, writeUnknown.Real behavior proofworkflow used the structured PR body proof policy but did not read ClawSweeper's durable exact-head verdict marker.synchronize.proof: sufficienton PR updates, but it had no way to distinguish stale labels from an exact-head ClawSweeper pass.Regression Test Plan (if applicable)
For bug fixes or regressions, name the smallest reliable test coverage that should catch this. Otherwise write
N/A.test/scripts/real-behavior-proof-policy.test.ts;test/scripts/barnacle-auto-response.test.tsproof: sufficienton synchronize when that marker exists.User-visible / Behavior Changes
Automerge/proof automation can now proceed from an exact-head ClawSweeper proof verdict instead of racing over
proof: sufficientlabel churn.Diagram (if applicable)
Security Impact (required)
Yes/No) YesYes/No) NoYes/No) YesYes/No) NoYes/No) YesYes, explain risk + mitigation: The proof workflow reads PR issue comments using the existing app token so it can find ClawSweeper's marker. The marker is accepted only when the PR number and exact 40-character head SHA match, limiting stale or copied-marker risk.Repro + Verification
Environment
Steps
proof: sufficient.Expected
proof: sufficientonly when the exact-head marker exists.Actual
Evidence
Attach at least one:
Human Verification (required)
What you personally verified (not just CI), and how:
Review Conversations
If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.
Compatibility / Migration
Yes/No) YesYes/No) NoYes/No) NoRisks and Mitigations