fix(agents): release session lock on manual abort by steipete · Pull Request #88623 · openclaw/openclaw

steipete · 2026-05-31T12:59:15Z

Summary

Release the embedded attempt session lock on manual aborts through the same best-effort abort cleanup path used by timeout aborts.
Keep timeout abandonment marking scoped to timeout aborts, so manual aborts only release the retained lock.
Add focused regression coverage for manual abort lock release and release failures.

Verification

node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/attempt-abort.test.ts src/agents/embedded-agent-runner/run/attempt.session-lock.test.ts src/agents/embedded-agent-runner/run/attempt.subscription-cleanup.test.ts test/scripts/plugin-prerelease-test-plan.test.ts
pnpm check:test-types
pnpm exec oxfmt --check --threads=1 src/agents/embedded-agent-runner/run/attempt.ts src/agents/embedded-agent-runner/run/attempt-abort.ts src/agents/embedded-agent-runner/run/attempt-abort.test.ts
git diff --check
.agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main

Real behavior proof

Behavior addressed: manual embedded-attempt aborts release the session lock retained while the attempt owns the session, preventing later turns from getting stuck behind the abandoned lock.
Real environment tested: local source checkout on Node 24-era toolchain, plus GitHub PR CI on the branch head.
Exact steps or command run after this patch: focused Vitest for abort/session-lock cleanup, pnpm check:test-types, oxfmt check, git diff --check, and branch autoreview.
Evidence after fix: attempt-abort.test.ts proves manual abort invokes releaseHeldLockForAbort() and logs release failures without throwing; existing session-lock/subscription cleanup tests still pass.
Observed result after fix: manual abort cleanup path now schedules the lock release independently from timeout-only abandonment marking.
What was not tested: live multi-agent queue reproduction against a real stuck local session lock.

clawsweeper · 2026-05-31T13:01:26Z

Codex review: needs real behavior proof before merge. Reviewed May 31, 2026, 9:52 AM ET / 13:52 UTC.

Summary
The branch extracts abort session-lock release into a helper, calls it for both manual and timeout embedded attempt aborts, and adds focused helper tests.

PR surface: Source +20, Tests +44. Total +64 across 3 files.

Reproducibility: yes. from source inspection: current main's abortRun releases the retained lock only in the timeout branch, while manual abort paths call it without timeout. I did not run a live stuck-lock reproduction during this read-only review.

Review metrics: none identified.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🧂 unranked krab
Patch quality: 🐚 platinum hermit
Result: blocked until real behavior proof from a real setup is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

[P2] Add redacted real behavior proof showing manual stop followed by immediate retry without waiting for stale lock timeout.
[P1] Ask maintainers to choose this PR or Release session lock on user abort #88622 as the canonical fix before landing.

Proof guidance:

[P1] Needs real behavior proof before merge: The PR body provides focused tests and CI-style checks, and explicitly says no live multi-agent queue reproduction was run; add redacted terminal/log/live output or a short manual-stop-then-retry recording before merge. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

[P1] The branch still has mock-only proof: the PR body lists focused tests/CI and explicitly says no live multi-agent queue reproduction was run.
[P1] Release session lock on user abort #88622 is an open parallel fix for the same linked behavior with stronger run-level proof, so maintainers need one canonical landing path.
[P2] Abort lock-release ordering is session-state sensitive; the final patch must keep timeout abandoned-run marking timeout-only while releasing the retained lock for every abort.

Maintainer options:

Choose the canonical proof-backed patch (recommended)
Land either this PR after adding real abort-and-retry proof or the sibling proof-positive PR, not both, while preserving timeout-only abandonment marking.
Accept helper-level proof intentionally
Maintainers may land this helper-based patch with explicit acceptance that the real stuck-lock user path was not demonstrated on this branch.
Close the duplicate after one lands
If the sibling patch lands first, close this PR as superseded rather than carrying a second abort-lock implementation.

Next step before merge

[P1] Protected maintainer labeling, missing real behavior proof, and the open sibling PR make this a maintainer canonical-landing choice rather than an automated repair lane.

Security
Cleared: The diff only changes embedded-runner abort cleanup code and tests; it does not touch dependencies, workflows, secrets, permissions, or package execution surfaces.

Review details

Best possible solution:

Land one canonical abort/session-lock fix that releases the retained lock on every abort, keeps timeout abandonment timeout-only, and includes real abort-and-retry proof.

Do we have a high-confidence way to reproduce the issue?

Yes from source inspection: current main's abortRun releases the retained lock only in the timeout branch, while manual abort paths call it without timeout. I did not run a live stuck-lock reproduction during this read-only review.

Is this the best way to solve the issue?

Partially: moving lock release outside the timeout-only branch is the right fix shape, and the helper matches the scoped test guidance. This branch should still gain real abort-and-retry proof or defer to the sibling run-level patch before merge.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 7dea2837565e.

Label changes

Label changes:

add rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🐚 platinum hermit.
remove rating: 🦪 silver shellfish: Current PR rating is rating: 🧂 unranked krab, so this older rating label is no longer current.

Label justifications:

P1: The linked bug can leave agent turns blocked behind a leaked session lock after manual abort, which is an urgent broken agent workflow.
merge-risk: 🚨 session-state: The patch changes abort-time retained session-lock release ordering, which can affect whether later turns see stale, released, or mis-owned session state.
rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🐚 platinum hermit.
status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body provides focused tests and CI-style checks, and explicitly says no live multi-agent queue reproduction was run; add redacted terminal/log/live output or a short manual-stop-then-retry recording before merge. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Evidence reviewed

PR surface:

Source +20, Tests +44. Total +64 across 3 files.

View PR surface stats

Area	Files	Added	Removed	Net
Source	2	25	5	+20
Tests	1	44	0	+44
Docs	0	0	0	0
Config	0	0	0	0
Generated	0	0	0	0
Other	0	0	0	0
Total	3	69	5	+64

What I checked:

Current main bug surface: On current main, abortRun only calls releaseHeldLockForAbort() inside the isTimeout && queueHandleForAbandonment branch, while manual abort callers reach abortRun() or abortRun(false, reason). (src/agents/embedded-agent-runner/run/attempt.ts:2987, 7dea2837565e)
PR implementation path: The PR head moves the release helper call outside the timeout-only abandonment branch and passes abortKind based on timeout vs manual abort. (src/agents/embedded-agent-runner/run/attempt.ts:2989, 56fa5420d627)
PR proof is helper-level: The added test proves the helper invokes a mocked releaseHeldLockForAbort() and logs mocked failures, but it does not exercise the runner cleanup/next-turn lock behavior with a real session lock. (src/agents/embedded-agent-runner/run/attempt-abort.test.ts:5, 56fa5420d627)
Parallel canonical candidate: The related open PR at Release session lock on user abort #88622 changes the same abort path and includes a run-level regression test proving retained release before cleanup reacquire. (src/agents/embedded-agent-runner/run/attempt.user-abort-lock.test.ts:21, eeff9c42fed7)
Current-main merge shape: A three-way merge against current main applies this PR without textual conflict and changes only the helper, helper test, and abort call site. (56fa5420d627)
History provenance: Blame ties the current abort-run and lock-release implementation to the stale-session runner lifecycle commit, with later current-main touches in the same file from subagent queue work. (src/agents/embedded-agent-runner/run/attempt.ts:2964, e8c7c933f8c1)

Likely related people:

brokemac79: git blame attributes the current abortRun block and releaseHeldLockForAbort controller method to the stale-session runner lifecycle commit. (role: introduced behavior; confidence: high; commits: e8c7c933f8c1; files: src/agents/embedded-agent-runner/run/attempt.ts, src/agents/embedded-agent-runner/run/attempt.session-lock.ts)
Peter Steinberger: Recent current-main commits touched attempt.ts and adjacent subagent queue behavior, making him a likely routing candidate for the embedded-runner/session-state surface. (role: recent area contributor; confidence: medium; commits: f24a13879095, 1e54e908e2e4; files: src/agents/embedded-agent-runner/run/attempt.ts, src/agents/subagent-delivery-state.ts, src/agents/subagent-registry.ts)

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

…unsettled retained writes (#6) A turn timeout aborts the run, but releaseHeldLockForAbort delegated to the graceful fence release, whose waitForRetainedLockIdle is unbounded. A retained transcript write pinned behind a hung provider stream (one that ignores abort, e.g. a stalled streamGenerateContent) therefore held the session .jsonl lock until the maxHoldMs watchdog (5-30 min), and every turn in between failed with SessionWriteLockTimeoutError. releaseHeldLockForAbort now races the graceful path against the abort settle bound (OPENCLAW_EMBEDDED_ABORT_SETTLE_TIMEOUT_MS). When the bound expires the underlying file lock is force-released and the controller poisoned via takeoverDetected, so the aborted run cannot perform torn writes after losing ownership. Retained writes that settle within the bound keep the existing graceful fence semantics. Prod incident: tulgey#225 (membrane 2026-06-01..03). Completes the lock-leak family of openclaw#87278 / openclaw#88623 / openclaw#89811, which release on settled aborts but not on aborts a hung provider call never lets settle. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

openclaw-barnacle Bot added agents Agent runtime and tooling size: S maintainer Maintainer-authored PR labels May 31, 2026

clawsweeper Bot mentioned this pull request May 31, 2026

Release session lock on user abort #88622

Closed

steipete force-pushed the fix/manual-abort-session-lock branch from bc9d9eb to d88376e Compare May 31, 2026 13:29

clawsweeper Bot removed the merge-risk: 🚨 auth-provider 🚨 May break OAuth, tokens, provider routing, model choice, or credentials. label May 31, 2026

fix(agents): release session lock on manual abort

56fa542

steipete force-pushed the fix/manual-abort-session-lock branch from 70acd6d to 56fa542 Compare May 31, 2026 13:46

clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. labels May 31, 2026

steipete merged commit 95890fe into main May 31, 2026
158 of 160 checks passed

steipete deleted the fix/manual-abort-session-lock branch May 31, 2026 13:53

steipete mentioned this pull request May 31, 2026

Session file lock leak when user manually aborts agent (non-timeout abort never releases lock) #88600

Closed

matin mentioned this pull request Jun 3, 2026

fix(agents): bound abort-path session lock release; force-release on unsettled retained writes matin/openclaw#6

Merged

matin mentioned this pull request Jun 3, 2026

fix(agents): bound abort-path session lock release; force-release on unsettled retained writes #90065

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(agents): release session lock on manual abort#88623

fix(agents): release session lock on manual abort#88623
steipete merged 1 commit into
mainfrom
fix/manual-abort-session-lock

steipete commented May 31, 2026 •

edited

Loading

Uh oh!

clawsweeper Bot commented May 31, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

steipete commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Verification

Real behavior proof

Uh oh!

clawsweeper Bot commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

steipete commented May 31, 2026 •

edited

Loading

clawsweeper Bot commented May 31, 2026 •

edited

Loading