Bug type
Regression (worked before, now fails)
Beta release blocker
No
Summary
OpenClaw 2026.5.18 can finish an Anthropic/Opus Discord run, enter post-run auto-compaction, and then leave the session JSONL write lock held after the compaction path times out.
After that, every new request in the same Discord channel session waits 60000ms for the session file lock and fails before the agent can reply:
SessionWriteLockTimeoutError: session file locked (timeout 60000ms)
The only observed recovery was a Gateway restart, which removed the live lock state and allowed the channel to accept requests again.
This appears related to existing session-lock/event-loop/compaction reliability reports, but this reproduction is narrower: a successful Opus run is followed by auto-compaction that holds the same session JSONL lock long enough to make all subsequent channel turns fail with no useful in-channel recovery.
Steps to reproduce
- Run OpenClaw Gateway as a user systemd service with Discord enabled.
- Use a Discord channel session with Anthropic Opus as the active model.
- Start a larger file-producing task so the session crosses the auto-compaction threshold.
- Let the assistant finish the requested work.
- Observe post-run auto-compaction start for the same session.
- Send another user request in the same Discord channel while the compaction path is stuck.
- Observe the new request wait for the existing JSONL lock and fail after 60000ms.
Observed reproduction:
- Discord channel: #mws
- Session key: agent:main:discord:channel:1506258704541159484
- Session file: /home/casper/.openclaw/agents/main/sessions/49e71c56-dcbc-40ab-be04-4a92fd2230be.jsonl
- Lock file: /home/casper/.openclaw/agents/main/sessions/49e71c56-dcbc-40ab-be04-4a92fd2230be.jsonl.lock
- Run id: 7d2170b5-3733-4439-8451-cad42efa577b
The final assistant answer for the original Opus run was written to the JSONL around 2026-05-19T13:33:13.804Z. The session lock was created about 4 ms later by the same Gateway process (pid 963591):
{
"pid": 963591,
"createdAt": "2026-05-19T13:33:13.808Z",
"starttime": 335126537
}
Later user requests in the same channel failed at 13:45, 13:46, and 13:54 UTC while waiting for the same lock.
Expected behavior
Post-run auto-compaction should not leave a live session write lock behind after timeout or abort.
Specifically:
- compaction releases the session JSONL lock on success, failure, timeout, or cancellation
- subsequent user turns in the same Discord channel are not blocked by stale in-process compaction state
- if compaction cannot complete, OpenClaw surfaces a recoverable channel/session error
- a Gateway restart should not be required to make the channel usable again
- the stale-lock check should consider both PID and process start time, and should have a cleanup path for locks left by failed compaction
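The release-on-every-exit-path expectation above can be sketched as follows. This is a minimal illustration, not OpenClaw's actual code: `SessionLock`, `withTimeout`, and `runLocked` are hypothetical names, and the in-memory lock only stands in for the real session JSONL lock.

```typescript
// Hypothetical in-memory stand-in for the session JSONL write lock.
class SessionLock {
  private holder: string | null = null;

  acquire(operation: string): void {
    if (this.holder !== null) {
      throw new Error(`session locked by ${this.holder}`);
    }
    this.holder = operation;
  }

  release(): void {
    this.holder = null;
  }

  get heldBy(): string | null {
    return this.holder;
  }
}

// Reject if `work` does not settle within `timeoutMs`.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Run `task` while holding the lock. The finally block runs on success,
// failure, and timeout, so the lock cannot outlive the call.
async function runLocked<T>(
  lock: SessionLock,
  operation: string,
  timeoutMs: number,
  task: (signal: AbortSignal) => Promise<T>,
): Promise<T> {
  lock.acquire(operation);
  const controller = new AbortController();
  try {
    return await withTimeout(task(controller.signal), timeoutMs);
  } finally {
    // Ask any still-running work to stop before the lock is handed on.
    controller.abort();
    lock.release();
  }
}
```

Releasing on timeout is only safe if the timed-out task honors the abort signal and stops writing to the session file. A task that ignores the signal could still write after another turn has taken the lock.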
Actual behavior
The original Opus run completed useful work and wrote its final assistant output to the session JSONL.
Immediately afterward, auto-compaction held the session write lock. The compaction path timed out, but the lock remained held by the live Gateway process. New Discord requests in the same channel then failed before an embedded agent could start or reply.
User-visible result:
- Discord typing/traffic stops
- no final or error answer reaches the channel for later requests
- each new request waits about 60 seconds and fails
- the channel remains unusable until Gateway restart
OpenClaw version
OpenClaw 2026.5.18
Operating system
Ubuntu
Install method
npm global
Model
claude-opus-4-7
Provider / routing chain
anthropic/claude-opus-4-7 -> OpenClaw embedded run -> Discord channel session -> post-run auto-compaction
Additional provider/model setup details
Anthropic was used through the normal OpenClaw embedded runner path.
The incident happened after changing Discord group visible replies to automatic delivery to work around a separate message tool argument issue. The write-lock failure is independent of that delivery setting: the failing path is session persistence/auto-compaction before any later assistant reply can be generated.
The same environment also has separate reports for:
- Codex app-server turns stalling after item/completed
- model-generated SendMessage arguments being rejected instead of normalized to message
Those are distinct symptoms. This report is specifically about the session JSONL lock left behind by post-run auto-compaction.
Logs, screenshots, and evidence
Original compaction timeout signal:
May 19 13:43:45 casper node[963591]:
2026-05-19T13:43:45.077+00:00 [agent/embedded]
embedded run timeout reached during compaction; extending deadline:
runId=7d2170b5-3733-4439-8451-cad42efa577b
sessionId=49e71c56-dcbc-40ab-be04-4a92fd2230be
extraMs=900000
May 19 13:44:14 casper node[963591]:
CommandLaneTaskTimeoutError: Command lane "main" task timed out after 930000ms
Subsequent requests failed waiting for the same session JSONL lock:
May 19 13:45:17 casper node[963591]:
2026-05-19T13:45:17.707+00:00 [diagnostic]
lane task error: lane=main durationMs=61155
error="SessionWriteLockTimeoutError: session file locked (timeout 60000ms):
pid=963591 /home/casper/.openclaw/agents/main/sessions/49e71c56-dcbc-40ab-be04-4a92fd2230be.jsonl.lock"
May 19 13:45:17 casper node[963591]:
2026-05-19T13:45:17.712+00:00 [diagnostic]
lane task error: lane=session:agent:main:discord:channel:1506258704541159484 durationMs=61165
error="SessionWriteLockTimeoutError: session file locked (timeout 60000ms):
pid=963591 /home/casper/.openclaw/agents/main/sessions/49e71c56-dcbc-40ab-be04-4a92fd2230be.jsonl.lock"
May 19 13:45:17 casper node[963591]:
Embedded agent failed before reply:
session file locked (timeout 60000ms):
pid=963591 /home/casper/.openclaw/agents/main/sessions/49e71c56-dcbc-40ab-be04-4a92fd2230be.jsonl.lock
The same pattern repeated:
May 19 13:46:27 ... SessionWriteLockTimeoutError ... 49e71c56-dcbc-40ab-be04-4a92fd2230be.jsonl.lock
May 19 13:54:04 ... SessionWriteLockTimeoutError ... 49e71c56-dcbc-40ab-be04-4a92fd2230be.jsonl.lock
Lock file observed before Gateway restart:
/home/casper/.openclaw/agents/main/sessions/49e71c56-dcbc-40ab-be04-4a92fd2230be.jsonl.lock
mtime: 2026-05-19 13:33:13.807249173 +0000
pid: 963591
createdAt: 2026-05-19T13:33:13.808Z
After a Gateway restart, the lock file was gone and the channel could accept new work again:
LOCK_GONE
Related public issues found:
- https://github.com/openclaw/openclaw/issues/43367 mentions session lock timeouts and detached background work in multi-agent orchestration.
- https://github.com/openclaw/openclaw/issues/75882 mentions gateway stalls, lane waits, file lock timeouts, and missed replies.
Neither is an exact match for this post-run auto-compaction lock leak in a single Discord channel session.
Impact and severity
Severity: High / work-blocking.
Impact:
- the affected Discord channel session becomes unusable
- every new request waits about 60 seconds and fails before reply
- users see no actionable recovery message in the channel
- completed work may exist on disk, but the user receives no reliable completion signal
- the only practical recovery observed is a Gateway restart
Additional information
Immediate workaround:
- Restart the Gateway cleanly.
- Verify the affected lock file is gone.
- Retry work in the channel only after the lock is cleared.
Operational workaround until fixed:
- keep high-context Discord sessions short
- use fresh channel/session context for large site/build tasks before auto-compaction is likely
- split large tasks into smaller turns
- avoid continuing work in a session that is close to compaction/context limits
- monitor for old *.jsonl.lock files in active session directories
- do not manually delete a lock while its owning Gateway PID is still alive unless there is strong evidence the lock is stale and the process is no longer using it
- if the lock owner is the live Gateway process and the channel is blocked, prefer a clean Gateway restart over deleting the lock file
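The "is the owner still alive" decision in the last two items can be sketched as a small classifier over the lock payload fields shown earlier (pid, createdAt, starttime). This is an illustration only: `classifyLock` and the `probeStartTime` hook are hypothetical, and it assumes starttime is a process start time that can be compared against the running process (for example from /proc on Linux).

```typescript
// Shape of the lock payload observed in this report.
interface LockPayload {
  pid: number;
  createdAt: string;
  starttime: number;
}

type LockVerdict = "owner-alive" | "owner-dead" | "pid-reused";

// `probeStartTime` is a hypothetical hook: it returns the current start time
// of `pid`, or null when no such process exists.
function classifyLock(
  lock: LockPayload,
  probeStartTime: (pid: number) => number | null,
): LockVerdict {
  const current = probeStartTime(lock.pid);
  if (current === null) return "owner-dead"; // safe to treat as stale
  if (current !== lock.starttime) return "pid-reused"; // different process now has this PID
  return "owner-alive"; // do not delete; restart the owner instead
}
```

In this incident the verdict would be "owner-alive", because the lock was held by the running Gateway process. A PID and start-time check alone would therefore not have cleared it, which is why a clean restart is the recommended workaround.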
Suggested upstream fix areas:
- ensure session write locks are released in finally blocks around compaction
- add timeout/cancellation cleanup for compaction-held session locks
- make lock diagnostics identify the owning operation, not only the owning PID
- surface a user-visible recovery event when compaction blocks a later interactive turn
- optionally isolate compaction writes from normal interactive turn acquisition so a failed compaction cannot starve new user turns indefinitely
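As one possible shape for the diagnostics item above, the lock payload could carry the owning operation so the timeout error names it. The `operation` and `runId` fields and the `formatLockTimeout` helper below are hypothetical and do not reflect the current lock format.

```typescript
interface LockOwner {
  pid: number;
  createdAt: string;
  operation?: string; // hypothetical field, e.g. "auto-compaction"
  runId?: string; // hypothetical field
}

// Build a timeout message that names the owning operation and how long the
// lock has been held, in addition to the PID and lock path.
function formatLockTimeout(
  lockPath: string,
  timeoutMs: number,
  owner: LockOwner,
  now: Date,
): string {
  const heldMs = now.getTime() - Date.parse(owner.createdAt);
  const parts = [
    `session file locked (timeout ${timeoutMs}ms):`,
    `pid=${owner.pid}`,
    `operation=${owner.operation ?? "unknown"}`,
  ];
  if (owner.runId) parts.push(`runId=${owner.runId}`);
  parts.push(`heldMs=${heldMs}`, lockPath);
  return parts.join(" ");
}
```

With a message like this, the 13:45 failure in the logs would have pointed directly at the compaction run instead of only at the Gateway PID.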