[Bug]: Session file lock not released properly by watchdog #87483
Open
Labels
P2 - Normal backlog priority with limited blast radius.
bug - Something isn't working.
clawsweeper:needs-live-repro - ClawSweeper needs live local, crabbox, or manual validation to confirm this issue.
clawsweeper:needs-maintainer-review - ClawSweeper marked this issue as needing maintainer review before automation.
clawsweeper:needs-product-decision - ClawSweeper marked this issue as needing a product or behavior decision.
clawsweeper:no-new-fix-pr - ClawSweeper does not recommend queueing a new automated fix PR for this issue.
impact:session-state - Session, memory, transcript, context, or agent state can drift or corrupt.
issue-rating: 🐚 platinum hermit - Good issue quality with a plausible reproduction path needing some confirmation.
Bug type
Behavior bug (incorrect output/state without crash)
Beta release blocker
No
Summary
Session write lock files persist beyond maxHoldMs timeout; watchdog fails to reclaim stale locks, causing "session file locked" errors on subsequent requests.
Steps to reproduce
Expected behavior
Lock files should be automatically released when:
The watchdog should reclaim stale locks without manual intervention.
Actual behavior
Lock files persist for 8+ hours despite maxHoldMs being 300000ms (5 minutes)
Lock file shows maxHoldMs: 1020000 (17 minutes) instead of configured 300000ms
Watchdog does not reclaim stale locks automatically
User must manually delete lock files or restart gateway to resolve
Error message: "session file locked (timeout 60000ms): pid=16834 /path/to/session.jsonl.lock"
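To make the mismatch concrete, here is a minimal sketch of the hold-time arithmetic, using only the lock file fields quoted in the evidence section of this report (`pid`, `createdAt`, `maxHoldMs`). The function names are hypothetical and this is not OpenClaw's watchdog code; it only shows that the lock was overdue even by the 17-minute value it recorded for itself.

```python
from datetime import datetime, timedelta


def parse_iso_utc(stamp: str) -> datetime:
    """Parse an ISO-8601 timestamp with a trailing 'Z' as an aware UTC datetime."""
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def hold_deadline(lock: dict) -> datetime:
    """Instant after which the lock has exceeded the maxHoldMs it recorded."""
    return parse_iso_utc(lock["createdAt"]) + timedelta(milliseconds=lock["maxHoldMs"])


def is_overdue(lock: dict, now: datetime) -> bool:
    """True when the lock has been held longer than its own maxHoldMs."""
    return now > hold_deadline(lock)


# Lock file content copied from this report.
reported_lock = {
    "pid": 16834,
    "createdAt": "2026-05-28T01:12:54.261Z",
    "maxHoldMs": 1020000,
}

# Eight hours after creation, matching the "8+ hours" observation.
eight_hours_later = parse_iso_utc(reported_lock["createdAt"]) + timedelta(hours=8)

print(hold_deadline(reported_lock).isoformat())  # 2026-05-28T01:29:54.261000+00:00
print(is_overdue(reported_lock, eight_hours_later))  # True
```

With the configured 300000 ms the deadline would fall five minutes after creation, so the lock is past due under either value.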
OpenClaw version
2026.5.22 (a374c3a)
Operating system
macOS Darwin 25.5.0 (arm64)
Install method
npm global
Model
qwen/kimi-k2.5
Provider / routing chain
openclaw -> modelstudio/qwen3.5-plus
Additional provider/model setup details
No response
Logs, screenshots, and evidence
Lock file content:
{ "pid": 16834, "createdAt": "2026-05-28T01:12:54.261Z", "maxHoldMs": 1020000 }
Process status:
pid 16834 83.3% CPU openclaw gateway
(Process running for 170+ minutes, lock held for 8+ hours)
Configuration:
- session.writeLock.acquireTimeoutMs: 60000
- session.writeLock.staleMs: 1800000
- session.writeLock.maxHoldMs: 300000
Impact and severity
Affected: All OpenClaw users on 2026.5.22 with long-running gateway
Severity: Medium (requires manual intervention or workaround)
Frequency: Observed multiple times after overnight operation
Consequence: Agents fail to respond, user must manually delete lock files or restart gateway
Additional information
Workaround applied:
Extended timeouts via environment variables:
Created cleanup script via crontab to remove stale locks every 10 minutes
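For anyone who wants a similar stopgap, the following is a rough sketch of what such a cron-driven cleanup could look like. It is not the script used here: the `sweep` function, the directory argument, and the `*.lock` glob are assumptions for illustration, and the only lock fields it relies on are the `pid` and `createdAt` values shown in the evidence above. The liveness probe is POSIX-only; do not run it on Windows, where `os.kill` behaves differently.

```python
import json
import os
import time
from datetime import datetime
from pathlib import Path
from typing import List, Optional


def pid_alive(pid: int) -> bool:
    """POSIX-only probe: signal 0 checks that the process exists without signalling it."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def lock_age_ms(lock: dict, now_s: float) -> float:
    """Milliseconds elapsed since the lock's createdAt timestamp."""
    created = datetime.fromisoformat(lock["createdAt"].replace("Z", "+00:00"))
    return (now_s - created.timestamp()) * 1000.0


def sweep(session_dir: Path, stale_ms: int, now_s: Optional[float] = None) -> List[Path]:
    """Delete *.lock files older than stale_ms or whose owning pid no longer exists."""
    now_s = time.time() if now_s is None else now_s
    removed = []
    for path in session_dir.glob("*.lock"):
        try:
            lock = json.loads(path.read_text())
            stale = lock_age_ms(lock, now_s) > stale_ms or not pid_alive(int(lock["pid"]))
        except (OSError, ValueError, KeyError, TypeError, AttributeError):
            continue  # unreadable or malformed: leave it for manual inspection
        if stale:
            path.unlink(missing_ok=True)
            removed.append(path)
    return removed
```

A call such as `sweep(Path("/path/to/sessions"), stale_ms=1800000)` would mirror the configured `staleMs`; the directory is a placeholder. Note that this removes an old lock even when the owning process is still alive, which is the situation in this report, so it trades the "session file locked" error for a possible concurrent write if the gateway is genuinely still writing.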
Possible root causes: