Session file lock leak when user manually aborts agent (non-timeout abort never releases lock) #88600
Closed
Labels
- P1: High-priority user-facing bug, regression, or broken workflow.
- clawsweeper:fix-shape-clear: ClawSweeper found a clear likely implementation shape for this issue.
- clawsweeper:queueable-fix: ClawSweeper marked this issue as an existing queue_fix_pr work candidate.
- clawsweeper:source-repro: ClawSweeper found a high-confidence source-level issue reproduction.
- impact:message-loss: Channel message delivery can be lost, duplicated, or misrouted.
- impact:session-state: Session, memory, transcript, context, or agent state can drift or corrupt.
- issue-rating: 🦞 diamond lobster: Very strong issue quality with high-confidence source-level or clear reproduction.
Description
When a user manually stops an agent, the session file lock is never released, causing subsequent turn attempts to fail with `SessionWriteLockTimeoutError` for 60+ seconds.
Steps to Reproduce
Expected Behavior
Agent should recover and accept new messages immediately after abort.
Actual Behavior
Root Cause
File: `selection-C4e-Qn9W.js` (bundled), function `abortRun`: `releaseHeldLockForAbort()` is guarded by `if (isTimeout)`. When the user aborts manually, `isTimeout=false`, so the lock is never released.
Failure Chain
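The steps in this section can be modeled with a small in-memory sketch. This is not the bundled code: `SessionLock`, `nextTurnSucceedsAfterAbort`, and all function bodies are hypothetical stand-ins, and the 60s wait is collapsed into an immediate throw. Only the names `abortRun`, `releaseHeldLockForAbort`, `acquireLock`, `isTimeout`, and `SessionWriteLockTimeoutError` come from the report.

```javascript
// Hypothetical in-memory model; names mirror the report, bodies are assumptions.
class SessionWriteLockTimeoutError extends Error {}

class SessionLock {
  constructor() {
    this.held = false;
  }

  acquireLock() {
    // The report describes a ~60s wait before failing; the model fails at once.
    if (this.held) {
      throw new SessionWriteLockTimeoutError("session write lock still held");
    }
    this.held = true;
  }

  releaseHeldLockForAbort() {
    this.held = false;
  }
}

// Mirrors the guard described under Root Cause: release only on timeout aborts.
function abortRun(lock, isTimeout) {
  if (isTimeout) {
    lock.releaseHeldLockForAbort();
  }
}

// Runs one turn, aborts it, then reports whether the next turn can take the lock.
function nextTurnSucceedsAfterAbort(isTimeout) {
  const lock = new SessionLock();
  lock.acquireLock(); // the running turn holds the lock
  abortRun(lock, isTimeout); // timeout abort or manual abort
  try {
    lock.acquireLock(); // the next turn tries to take the lock
    return true;
  } catch (err) {
    if (err instanceof SessionWriteLockTimeoutError) return false;
    throw err;
  }
}

console.log(nextTurnSucceedsAfterAbort(true)); // prints true
console.log(nextTurnSucceedsAfterAbort(false)); // prints false
```

In this model the manual-abort path leaves `held` set, which is the state the steps here start from.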
1. `abortRun(isTimeout=false)` → `releaseHeldLockForAbort()` SKIPPED → lock remains held
2. `acquireForCleanup()` → `acquireCleanupLock()` → `takeHeldLockAfterRetainedIdle()` fails (lock in use)
3. `acquireLock()` → lock already held → waits 60s → `SessionWriteLockTimeoutError`
4. `cleanupEmbeddedAttemptResources()` never reached → lock NEVER released
5. … `maxHoldMs`, default 300s) or gateway restart
Relevant Logs
Suggested Fix
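Given the root cause above, one possible shape for a fix is to release the lock on every abort path instead of only on timeout. The sketch below is an illustration under assumptions, not the project's actual patch: `HeldLock`, `abortRunFixed`, the `releaseCount` counter, and the idempotent release are hypothetical, and only `releaseHeldLockForAbort` and `isTimeout` are taken from the report.

```javascript
// Hypothetical sketch of one possible fix; not the project's actual patch.
class HeldLock {
  constructor() {
    this.held = true; // starts held, as if a turn were in progress
    this.releaseCount = 0; // counts effective releases, for illustration only
  }

  releaseHeldLockForAbort() {
    // Idempotent, so a later cleanup step can call it again safely.
    if (!this.held) return;
    this.held = false;
    this.releaseCount += 1;
  }
}

function abortRunFixed(lock, isTimeout) {
  // isTimeout can still drive logging or error classification,
  // but it no longer decides whether the lock is released.
  lock.releaseHeldLockForAbort();
  return isTimeout ? "timeout" : "manual";
}

const lock = new HeldLock();
const reason = abortRunFixed(lock, false); // manual abort
abortRunFixed(lock, false); // a repeated abort is a no-op for the lock
console.log(reason, lock.held, lock.releaseCount); // prints: manual false 1
```

Making the release idempotent is a design choice in this sketch: it lets the abort path and any later cleanup both attempt the release without double-release errors.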
Environment