fix(ci): prevent zombie sticky comments on CCR timeout or race#860
Conversation
Two failure modes left "In Progress" Claude Code Review comments stuck on PRs (e.g. PR bsv-blockchain#845 had 3 stale stubs): 1. No workflow concurrency — overlapping CCR runs each created their own sticky comment because each LLM independently ran the "find existing" search and missed the other's just-created comment. 2. No timeout cleanup — when the Claude job hit a timeout or runtime error after creating the sticky but before marking Complete, the stub stayed forever. Fix: - Add a workflow-level concurrency group keyed on PR number (with head_branch fallback for fork PRs that get an empty pull_requests array on workflow_run). cancel-in-progress: true ensures only one CCR run per PR is alive at a time; the replacement run finds the existing sticky and updates it in-place. - Add a post-step that runs on failure() and PATCHes any remaining "In Progress" Claude sticky comments to an "Aborted" state, with a link to the failed workflow run. Concurrency-cancellations are intentionally excluded — the replacement run handles those.
|
🤖 Claude Code Review Status: Complete Current Review: No blocking issues found. The PR correctly addresses the zombie comment problem described in #845. Minor observations (already acknowledged in PR description):
Code quality: The implementation is sound. The Verification: The PR description notes |
|
Benchmark Comparison ReportBaseline: Current: Summary
All benchmark results (sec/op)
Threshold: >10% with p < 0.05 | Generated: 2026-05-13 13:43 UTC |



Problem
On PR #845, three stale "🤖 Claude Code Review — In Progress" comments accumulated over multiple CCR runs. Each had only the early checkboxes ticked; none reached "Complete". Two root causes in
.github/workflows/claude-code-review.yml:Fix
A. Workflow-level concurrency group
Ensures only one CCR run per PR is alive at a time. The replacement run finds the existing sticky and updates it in-place, so concurrency-cancellations don't leave zombies.
head_branchfallback covers fork PRs whereworkflow_run.pull_requestsis empty.C. Post-step finalizer on failure
A
failure()-gated step that queries for any "🤖 Claude Code Review" comments containing "In Progress" and PATCHes them to an "Aborted" body with a link to the failed run URL. Only true failures (timeout, API error, runner death) trigger it — concurrency-cancellations are deliberately excluded because the queued replacement run handles those.Why not also B (pre-step ID resolution) and D (cleanup-at-start)?
A + C addresses the two observed failure modes. B and D were considered but skipped for now:
Risks
head_branchfallback collisions on fork PRs. Two fork PRs from different forks but with the same branch name would share a concurrency group. Worst case: one run cancels an unrelated PR's review, which then retriggers on the next push. Acceptable for a first iteration; the alternative (collapsing fork PRs into the same group) is no worse than the status quo.failure()only — cancellation does not trigger the finalizer, so this race doesn't occur in practice.gh api -f body=multi-line handling.ghconverts-f key=valueto JSON for PATCH; multi-line shell variables embed real newlines that gh serialises as\n. Confirmed working in similar workflows; if a future GH CLI release changes this, the worst case is an ugly-looking aborted comment, not a workflow failure (the|| trueguards against errors).Verification
actionlint .github/workflows/claude-code-review.yml— clean.YAML.load_file).Not in scope
.github/workflows/sonar-pr-analyze.yamldoes not post sticky comments, so it doesn't share this bug. No change needed.