Make session write lock configurable + narrow lock scope (avoid timeout=All models failed) #13744
Closed
Labels
- P2: Normal backlog priority with limited blast radius.
- clawsweeper:linked-pr-open: ClawSweeper found an open linked pull request for this issue.
- clawsweeper:needs-product-decision: ClawSweeper marked this issue as needing a product or behavior decision.
- clawsweeper:no-new-fix-pr: ClawSweeper does not recommend queueing a new automated fix PR for this issue.
- clawsweeper:source-repro: ClawSweeper found a high-confidence source-level issue reproduction.
- enhancement: New feature or request
- impact:message-loss: Channel message delivery can be lost, duplicated, or misrouted.
- impact:session-state: Session, memory, transcript, context, or agent state can drift or corrupt.
Problem
OpenClaw uses a file lock, `${sessionFile}.lock`, for session writes. In the current build (npm `openclaw@2026.2.9`), the lock defaults are:
- `timeoutMs = 10000` (10s)
- `staleMs = 30min`

However, call sites invoke it without overrides:
- `acquireSessionWriteLock({ sessionFile })`

so `timeoutMs`/`staleMs` are effectively hardcoded defaults and not configurable via `openclaw.json`.

Additionally, the lock appears to be held across a broad portion of the embedded run (including LLM/tool execution), not just during the actual transcript append/flush. Under concurrent inbound messages to the same session, this produces:
- `Error: session file locked (timeout 10000ms)`
- `All models failed` / failover, even though this is not a model/provider error.

Why it matters
If two channels (e.g., Telegram + Webchat) end up hitting the same session concurrently, the second request times out after 10s and fails the run, causing user-visible outages. This is concurrency/locking contention, not provider failure.
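To illustrate the distinction drawn above, here is a minimal sketch of how a lock timeout could be classified as retryable contention rather than a provider failure. `SessionLockTimeoutError`, `classifyRunError`, and the `RunFailure` shape are hypothetical names, not OpenClaw APIs; only the error message text comes from the observed behavior.

```typescript
// Hypothetical error class: the issue only shows the message text,
// not how OpenClaw represents this error internally.
class SessionLockTimeoutError extends Error {
  timeoutMs: number;
  constructor(timeoutMs: number) {
    super(`session file locked (timeout ${timeoutMs}ms)`);
    this.name = "SessionLockTimeoutError";
    this.timeoutMs = timeoutMs;
  }
}

// Hypothetical result shape used only for this illustration.
type RunFailure =
  | { kind: "lock-contention"; retryable: true; message: string }
  | { kind: "provider-failure"; retryable: false; message: string };

// Matches the message reported in this issue, for errors that were
// thrown as plain Error objects.
const LOCK_TIMEOUT_PATTERN = /^session file locked \(timeout \d+ms\)$/;

function classifyRunError(err: unknown): RunFailure {
  const message = err instanceof Error ? err.message : String(err);
  if (err instanceof SessionLockTimeoutError || LOCK_TIMEOUT_PATTERN.test(message)) {
    // Contention on the session file: ask the caller to retry
    // instead of triggering model failover.
    return {
      kind: "lock-contention",
      retryable: true,
      message: "Session is busy, please retry.",
    };
  }
  // Everything else is left to the existing failover path.
  return { kind: "provider-failure", retryable: false, message };
}
```

With a classification like this, only the `provider-failure` branch would need to reach the failover logic; the `lock-contention` branch could be surfaced to the channel as a retryable condition.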
Requested changes
1. Configurable lock timeouts
   - `sessionWriteLock.timeoutMs` and `sessionWriteLock.staleMs` via `openclaw.json` (global defaults), and/or via env.
2. Narrow lock scope
3. Better error handling
   - 409/429-style response with “please retry” instead of surfacing `All models failed`.

Evidence (from dist bundle)
In `dist/reply-*.js`:
- `acquireSessionWriteLock()` includes `timeoutMs ?? 1e4` and `staleMs ?? 1800*1e3`.
- Call sites: `acquireSessionWriteLock({ sessionFile })` (no overrides).

Workarounds we are using
- `.jsonl.lock` files.

Happy to provide exact line snippets from the bundle if you want, or test a PR.
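As a rough sketch of the first two requested changes, the following shows one way the lock options could be resolved from config and env, and how the lock could be held only around the transcript append. Everything here is an assumption for illustration: the env variable names, the `resolveSessionWriteLockOptions` and `appendWithNarrowLock` helpers, and the idea that acquiring the lock yields a release function. Only the `sessionWriteLock.timeoutMs`/`staleMs` keys and the default values (`1e4`, `1800*1e3`) come from this issue.

```typescript
interface SessionWriteLockOptions {
  timeoutMs: number;
  staleMs: number;
}

// Defaults as observed in the dist bundle.
const DEFAULT_LOCK_OPTIONS: SessionWriteLockOptions = {
  timeoutMs: 1e4, // 10s
  staleMs: 1800 * 1e3, // 30min
};

// Accepts numbers or numeric strings; anything else falls through.
function parsePositiveInt(value: unknown): number | undefined {
  const n = typeof value === "string" ? Number(value) : value;
  return typeof n === "number" && Number.isInteger(n) && n > 0 ? n : undefined;
}

// Precedence in this sketch: env, then openclaw.json, then defaults.
// The env variable names are hypothetical.
function resolveSessionWriteLockOptions(
  config: { sessionWriteLock?: { timeoutMs?: unknown; staleMs?: unknown } },
  env: Record<string, string | undefined> = {},
): SessionWriteLockOptions {
  const fromConfig = config.sessionWriteLock ?? {};
  return {
    timeoutMs:
      parsePositiveInt(env.OPENCLAW_SESSION_LOCK_TIMEOUT_MS) ??
      parsePositiveInt(fromConfig.timeoutMs) ??
      DEFAULT_LOCK_OPTIONS.timeoutMs,
    staleMs:
      parsePositiveInt(env.OPENCLAW_SESSION_LOCK_STALE_MS) ??
      parsePositiveInt(fromConfig.staleMs) ??
      DEFAULT_LOCK_OPTIONS.staleMs,
  };
}

// Assumed shape: acquiring the lock resolves to a release function.
type ReleaseFn = () => Promise<void>;
type AcquireFn = (
  opts: { sessionFile: string } & Partial<SessionWriteLockOptions>,
) => Promise<ReleaseFn>;

// Runs the slow work (LLM/tool execution) without the lock and
// takes the lock only for the append itself.
async function appendWithNarrowLock(
  acquire: AcquireFn,
  sessionFile: string,
  lockOptions: SessionWriteLockOptions,
  produceEntry: () => Promise<string>,
  append: (sessionFile: string, entry: string) => Promise<void>,
): Promise<void> {
  const entry = await produceEntry(); // no lock held here
  const release = await acquire({ sessionFile, ...lockOptions });
  try {
    await append(sessionFile, entry);
  } finally {
    await release(); // always release, even if the append throws
  }
}
```

The trade-off of narrowing the scope this way is that two concurrent runs on the same session can interleave their transcript entries, so whether this is acceptable depends on how much ordering the session format requires.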