fix(release-daemon): verify PR is actually merged on GitHub before writing "landed"#61
Merged
Merged
Conversation
…iting "landed" The daemon used to classify a sub-agent result based ONLY on exit code: exit 0 = "landed", non-zero/timed_out = "blocked". But the /land-and-deploy sub-agent (Kimi / Claude / Codex) gracefully declines to merge in real scenarios — failing CI, no PR found, pre-merge gate fails — and still exits 0 (subprocess ran cleanly, verdict lives in prose). Observed in production on 2026-05-19 (mitosis-control-plane release-queue): Queue says GitHub reality PR garrytan#120 landed OPEN, e2e failing PR garrytan#122 landed MERGED ✓ PR garrytan#124 landed MERGED ✓ PR garrytan#129 landed OPEN, e2e failing 50% false-positive rate. The queue records are stuck on "landed" so discoverQueuedRecords (which filters status === "queued") never re-fans out for them — silently abandoned. Fix: after the land sub-agent returns exit 0, ask GitHub for the authoritative PR state. New helper `verifyPrMerged(prNumber, repoIdentity, cwd)` runs `gh pr view <pr> --repo <owner/name> --json state -q .state` and returns { merged: true } only on MERGED. Anything else (OPEN, CLOSED, gh exited non-zero, repoIdentity unparseable) returns { merged: false, reason: "..." } and the daemon writes "blocked" with a useful lastError. The repoIdentity is parsed via a strict ^github.com/owner/repo$ regex so a planted record can't sneak shell-specials through to gh — anything that doesn't match the regex gets rejected before any subprocess runs. Network/auth failures (gh exited non-zero) are reported as not-merged rather than thrown — better to block the record and retry than crash the daemon and leak the lock. The remediation for any blocked record is the existing retryReleaseQueueRecord CLI. Injection seam: opts.verifyMerged?: typeof verifyPrMerged so tests can control the response. Defaults to the real verifyPrMerged. Scope: pre-existing bug, not a regression from PR #57. PR garrytan#120 was marked landed at 12:30Z on 2026-05-19, hours before PR #57's work began. PR #57's deploy verification surfaced it by running through a real PR (garrytan#129) with failing e2e. Tests: 60/60 release-daemon.test.ts (5 new regressions covering the production bug pattern, the happy path, network failures, identity parsing, and the positive parse case). Full orchestrator suite: 1525 pass / 16 fail, all 16 pre-existing on origin/main (confirmed via git stash). Remediation for stranded records on production right now (PR#120, PR#129 on mitosis-control-plane): once this lands, the daemon needs manual revival of those two records. The retryReleaseQueueRecord CLI only handles blocked→queued, not landed→queued. Suggest a one-shot audit script or extending the retry CLI to support landed records whose GitHub state disagrees. Out of scope for this PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
6 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Symptom
The release-daemon writes
status: "landed"to its queue records for PRs that are stillOPENon GitHub. Observed in production on 2026-05-19 (mitosis-control-plane release-queue):50% false-positive rate. The queue records are stuck on
landedsodiscoverQueuedRecords(which filters bystatus === "queued") never re-fans out for them — silently abandoned, lock released, no remote indication of failure.Root cause
processReleaseQueueRecordat release-daemon.ts:945-951 classified the sub-agent result based ONLY on exit code:But the
/land-and-deploysub-agent (Kimi / Claude / Codex) gracefully declines to merge in real scenarios — failing CI, no PR found, pre-merge readiness gate fails — and still exits 0 (the subprocess itself ran cleanly; the verdict lives in prose inside the output file).PR#129's sub-agent transcript (
~/.gstack/build-state/release-daemon-pr-129/land-and-deploy-output.md):Subprocess log:
exit: 0,timed_out: false. Daemon writes"landed".Fix
After the land sub-agent returns exit 0, ask GitHub for the authoritative state.
New helper
verifyPrMerged(prNumber, repoIdentity, cwd)runsgh pr view <pr> --repo <owner/name> --json state -q .state. Returns{ merged: true }only when state isMERGED; anything else →{ merged: false, reason: "..." }and the daemon writes"blocked"with a usefullastError.Security:
repoIdentityis parsed via a strict^github\.com/([^/]+/[^/]+)$regex before being passed to--repo. A planted record can't sneak shell-specials through togh.Robustness: network/auth failures (
ghexits non-zero) are reported asmerged: falserather than thrown — better to block the record and retry than crash the daemon and leak the lock.Injection seam:
opts.verifyMerged?: typeof verifyPrMergedso tests can stub the response. Defaults to the realverifyPrMerged.Scope
Pre-existing bug, NOT a regression from PR #57. PR garrytan#120 was marked
landedat 12:30:23Z on 2026-05-19, hours before any work on PR #57 began. The bug has been latent in the daemon since/land-and-deployfirst started being invoked as a sub-agent. PR #57's deploy verification surfaced it by running through a real PR (garrytan#129) whose CI was failing.Tests
release-daemon.test.ts: 60/60 passing (5 new regression tests):Full orchestrator suite: 1525 pass / 16 pre-existing failures (all 16 confirmed pre-existing on origin/main via
git stash --include-untracked --keep-index+ re-run — these areskill-md.test.tschecks against templates and one timeout inintegration.test.ts; unrelated to this PR).Production remediation for stranded records
Once this lands, the daemon still needs manual revival of PR#120 and PR#129 on mitosis-control-plane (their queue records say
landedbut they're reallyOPEN). The existingretryReleaseQueueRecordCLI only handlesblocked → queued, notlanded → queued.Two options for follow-up (out of scope for this PR):
landedrecord, cross-check against GitHub, revive disagreeing ones toqueuedretryCLI to accept--from-landedfor records whose GitHub state disagreesTest plan
~/.gstack/build-state/release-queue/to find alllandedrecords that disagree with GitHub state. Revive them via direct disk edit (setstatus: "queued", clearlastUpdatedAt).🤖 Generated with Claude Code