Bug: sessions_yield can leave parent session transcript lock held, causing subagent completion callback timeout #85953
Closed
Labels
- P1: High-priority user-facing bug, regression, or broken workflow.
- clawsweeper:needs-live-repro: ClawSweeper needs live local, crabbox, or manual validation to confirm this issue.
- impact:message-loss: Channel message delivery can be lost, duplicated, or misrouted.
- impact:session-state: Session, memory, transcript, context, or agent state can drift or corrupt.
- issue-rating: 🐚 platinum hermit: Good issue quality with a plausible reproduction path needing some confirmation.
Summary
When a parent run starts a subagent and then calls `sessions_yield`, the subagent completion callback may fail to resume or write back to the yielded parent session, because the parent session transcript lock remains held by the live gateway process.
The observed error is a `SessionWriteLockTimeoutError`, raised when the callback cannot acquire the parent session's `.jsonl.lock` within the acquire timeout.
As a result, subagent completion events and callback messages are missed by the parent session.
Environment
- `2026.5.22`
- `v24.15.0`
- `openclaw/openclaw`
- `/root/.openclaw/agents/main/sessions`

What happened
A parent session spawned a subagent and used `sessions_yield` to wait for completion. After the yield/abort path, the session lock file remained present. The lock payload showed that the owner was the still-running gateway PID, not a dead process:

    { "pid": <gateway-pid>, "createdAt": "2026-05-24T05:04:42.710Z", "maxHoldMs": 1020000, "starttime": <gateway-starttime> }

`ps` confirmed that the PID was the active gateway process.
The affected session transcript ended around a `sessions_yield` call followed by an assistant message with `stopReason: "aborted"`. After that, subagent completion/resume writes contended on the same `.jsonl.lock` and timed out after the default 60000 ms acquire timeout.
Expected behavior
After `sessions_yield` causes the parent run to yield/abort for subagent completion, the parent session transcript write lock should be released reliably so that the completion event can be written back to the yielded parent session.
Actual behavior
The gateway process can keep the parent session `.jsonl.lock` held in-process after the `sessions_yield` abort path. Since the owner PID is alive and recognized as OpenClaw, other writers do not reclaim the lock. They wait until `session.writeLock.acquireTimeoutMs` elapses and then fail with `SessionWriteLockTimeoutError`.
Suspected cause
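As background for this section, here is a minimal sketch of the reclaim decision that waiting writers appear to make, and of why it never fires in the observed case. Only the payload field names (`pid`, `createdAt`, `maxHoldMs`, `starttime`) and the two durations come from the report. The `canReclaim` function, the liveness callback, and the placeholder PID are assumptions made for illustration, not OpenClaw's actual implementation.

```typescript
// Hypothetical types and helpers; only the payload field names come from
// the observed lock file.
interface LockPayload {
  pid: number;
  createdAt: string; // ISO-8601 timestamp
  maxHoldMs: number;
  starttime?: number; // present in the observed payload, unused in this sketch
}

// Assumed reclaim rule: a waiter may take over the lock only if the owner
// process is gone or the lock has been held for longer than maxHoldMs.
function canReclaim(
  lock: LockPayload,
  nowMs: number,
  isOwnerAlive: (pid: number) => boolean,
): boolean {
  if (!isOwnerAlive(lock.pid)) return true;
  const heldMs = nowMs - Date.parse(lock.createdAt);
  return heldMs > lock.maxHoldMs;
}

const observed: LockPayload = {
  pid: 4242, // placeholder; the report redacts the real gateway PID
  createdAt: "2026-05-24T05:04:42.710Z",
  maxHoldMs: 1_020_000,
};
const acquireTimeoutMs = 60_000; // default acquire timeout from the report

const createdMs = Date.parse(observed.createdAt);
const waiterDeadlineMs = createdMs + acquireTimeoutMs;
const gatewayAlive = (_pid: number): boolean => true; // the gateway is still running

// A waiter that started when the lock was created gives up here.
const reclaimableAtDeadline = canReclaim(observed, waiterDeadlineMs, gatewayAlive);
// The hold limit only expires much later.
const reclaimableAfterMaxHold = canReclaim(
  observed,
  createdMs + observed.maxHoldMs + 1,
  gatewayAlive,
);
console.log({ reclaimableAtDeadline, reclaimableAfterMaxHold });
```

Under these assumptions the lock is not reclaimable when the 60000 ms acquire timeout expires, and only becomes reclaimable once the 1020000 ms hold limit has passed, so any completion callback that arrives in between times out.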
The lock lifecycle around the embedded attempt controller appears to lack a final unconditional release for all exit paths.
Relevant code areas observed in the built package:
- `createEmbeddedAttemptSessionLockController(...)`
- `releaseForPrompt()` / `reacquireAfterPrompt()`
- `yieldAborted` branch for `sessions_yield`
- `acquireForCleanup(...)` / `cleanupEmbeddedAttemptResources(...)`
- `session-write-lock` watchdog behavior

The current flow appears to allow a lock to be reacquired or held after the prompt abort/yield path, and then not released if the run exits through a particular aborted/yielded path. Because `maxHoldMs` may be extended (observed `1020000` ms), the watchdog does not release the lock before subagent completion callbacks hit the 60000 ms acquire timeout.
Suggested fix direction
Add a defensive final-release path to the embedded attempt session lock controller, for example:
- Add a `forceReleaseHeldLock()` method.
- Call it from a `finally` block so every abort/error/yield path releases the transcript lock.
- Ensure `sessions_yield` transitions to its waiting state only after the parent session write lock has been released.
- Use a shorter `maxHoldMs` for locks held around yield state, since subagent completion needs to write back much sooner than a long run timeout.

Workaround
Avoid `sessions_yield` for now. Let subagents write results to files, or retrieve child results via `sessions_history` or manual inspection, instead of relying on push-back completion into the parent session.
Related issues
This may overlap with or be related to:
- "subagent completion spawns a fresh run on the parent's route instead of resuming the yielded session"
- "Multi-agent orchestration is unstable... session-lock failures..."
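To make the suggested fix direction concrete, the following is a rough sketch of an unconditional final release. It uses an in-memory, synchronous stand-in rather than the real file-backed controller, which is presumably asynchronous. The names `releaseForPrompt`, `reacquireAfterPrompt`, and `forceReleaseHeldLock` are taken from this report. `SketchLockController`, `runAttempt`, and `AttemptOutcome` are hypothetical and exist only for illustration.

```typescript
// Illustrative sketch only: an in-memory stand-in for the session lock
// controller. It does not touch the filesystem or the real lock file.
class SketchLockController {
  private held = false;

  acquire(): void {
    if (this.held) throw new Error("lock already held");
    this.held = true;
  }

  releaseForPrompt(): void {
    this.held = false;
  }

  reacquireAfterPrompt(): void {
    this.acquire();
  }

  // Idempotent: safe to call whether or not the lock is currently held.
  forceReleaseHeldLock(): void {
    this.held = false;
  }

  isHeld(): boolean {
    return this.held;
  }
}

type AttemptOutcome = "completed" | "yielded";

// Runs one attempt. The finally block releases the lock on every exit path,
// including the yield/abort branch and thrown errors.
function runAttempt(
  lock: SketchLockController,
  prompt: () => AttemptOutcome,
): AttemptOutcome {
  lock.acquire();
  try {
    lock.releaseForPrompt();
    const outcome = prompt();
    lock.reacquireAfterPrompt();
    return outcome;
  } finally {
    lock.forceReleaseHeldLock();
  }
}

const lock = new SketchLockController();
const outcome = runAttempt(lock, () => "yielded");
console.log(outcome, lock.isHeld());
```

In this sketch the lock is free after a yielded attempt even though it was reacquired after the prompt, which is the property the completion callback depends on. Keeping `forceReleaseHeldLock()` idempotent means the `finally` block needs no knowledge of which branch the attempt exited through.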