Release session lock on user abort#88622
Conversation
|
Codex review: needs maintainer review before merge. Reviewed May 31, 2026, 9:33 AM ET / 13:33 UTC. Summary PR surface: Source -2, Tests +76. Total +74 across 2 files. Reproducibility: yes. Current main shows the release call guarded by the timeout branch while manual aborts call Review metrics: none identified. Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Risk before merge
Maintainer options:
Next step before merge
Security Review detailsBest possible solution: Land one canonical narrow fix that releases the retained lock for all abort kinds while keeping abandoned-run tracking timeout-only. Do we have a high-confidence way to reproduce the issue? Yes. Current main shows the release call guarded by the timeout branch while manual aborts call Is this the best way to solve the issue? Yes. Moving the release call out of the timeout-only branch while keeping abandoned-run marking inside the timeout branch is the narrowest maintainable fix; maintainers just need to choose this branch or the sibling branch. AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 9518d1f27ccd. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source -2, Tests +76. Total +74 across 2 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
29cb280 to
49fc82c
Compare
|
Rebased this PR onto current
Verification after the rebase: node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/attempt.user-abort-lock.test.ts src/agents/embedded-agent-runner/run/attempt.session-lock.test.ts src/agents/embedded-agent-runner/runs.test.ts src/auto-reply/reply/reply-run-registry.test.ts src/auto-reply/reply/abort.test.ts
# 5 files passed, 102 tests passed
git diff --check origin/main...HEAD -- src/agents/embedded-agent-runner/run/attempt.ts src/agents/embedded-agent-runner/run/attempt.user-abort-lock.test.ts
# passedI also tried the targeted lint wrapper: node scripts/run-oxlint.mjs src/agents/embedded-agent-runner/run/attempt.ts src/agents/embedded-agent-runner/run/attempt.user-abort-lock.test.tsThat currently stops before linting the touched files on an unrelated plugin-SDK boundary DTS error in @clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
49fc82c to
eeff9c4
Compare
|
Rebased again onto current Current diff remains limited to:
Verification after this rebase: node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/attempt.user-abort-lock.test.ts src/agents/embedded-agent-runner/run/attempt.session-lock.test.ts src/agents/embedded-agent-runner/runs.test.ts src/auto-reply/reply/reply-run-registry.test.ts src/auto-reply/reply/abort.test.ts
# 5 files passed, 102 tests passed
pnpm check:architecture
# passed; Kysely guardrails OK
node scripts/run-oxlint.mjs src/agents/embedded-agent-runner/run/attempt.ts src/agents/embedded-agent-runner/run/attempt.user-abort-lock.test.ts
# passed
git diff --check origin/main...HEAD -- src/agents/embedded-agent-runner/run/attempt.ts src/agents/embedded-agent-runner/run/attempt.user-abort-lock.test.ts
# passedThis branch remains the narrow two-file session-lock fix with no device-auth/parser side change. @clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
Thanks @rohitjavvadi. This is now covered on I landed the same fix shape there: manual/user aborts release the retained embedded-session lock through the abort lock-release path, while timeout abandoned-run tracking stays timeout-only. I also kept the regression coverage focused around the shared abort cleanup helper and verified full PR CI on 56fa542. Closing this as superseded by the landed PR so we do not keep two copies of the same session-lock change open. |
Summary
Fixes #88600
Motivation / Root Cause
abortRun(false)aborted compaction and the active session, but the retained session write lock release lived inside the timeout-only branch. A manual abort could leave the original attempt's lock held until teardown, so cleanup or the next turn could block behind the same run's stale held lock.User-Visible Behavior
Manual stop/interrupt of an embedded run should release the session write lock promptly, allowing cleanup and follow-up work for the same session to proceed without waiting for stale lock timeout recovery.
Real Behavior Proof
Behavior addressed: manual/user abort of an embedded run now releases the retained session write lock before cleanup reacquires a lock for the same session.
Real environment tested: local macOS checkout using the real session-write-lock implementation and a temp persisted session file on disk.
Exact steps or command run after this patch:
Evidence after fix:
Observed result after fix: before the abort-release call, a cleanup-style lock acquire against the same persisted session file timed out on the live lock; after
releaseHeldLockForAbort, cleanup acquired and released the lock, and the lock file was gone. Focused Vitest passed 5 files / 102 tests; targeted lint passed;git diff --checkpassed.What was not tested: no live gateway/manual-stop smoke or channel delivery smoke was run.
Compatibility and Risks
No config, API, persistence format, provider, or channel contract changes. The risk is limited to abort/session-lock lifecycle ordering; the release helper is already used for timeout aborts and yield-abort cleanup, and timeout abandoned-run marking remains timeout-only.