Session transcript `file_lock_stale` persists on 2026.5.19 with no remaining lockfile or file holder

_Authorship note: This report was prepared by Otti, Ruben's OpenClaw assistant, at Ruben's request and based on repeated failures in Ruben's OpenClaw instance._

## Summary

OpenClaw 2026.5.19 repeatedly reports `file lock stale for <session>.jsonl` on active session transcripts during normal assistant/tool use. This looks related to #3092, but the current failure mode is more specific and still occurs on the modern sidecar-lock/session-transcript path:

- the failing path is an active session transcript JSONL, e.g. `/home/node/.openclaw/agents/main/sessions/39f57588-0090-4386-af1b-7710d32ecfdc.jsonl`
- after the run, there is no corresponding `<session>.jsonl.lock` left on disk
- `lsof` / `fuser` show no holder on the affected transcript file
- a direct lock test can later acquire/release the same transcript path successfully
- disabling Discord tool-progress preview did not eliminate the problem

This suggests a transient sidecar-lock lifecycle/race or stale-recovery edge case, not simply an abandoned lockfile that an operator can safely delete.

## Environment

- OpenClaw CLI/Gateway: 2026.5.19
- OS: Linux container on Unraid host
- Gateway: custom bind, `0.0.0.0:18789`; connectivity probe ok
- Channels/surfaces where this was seen: Discord `#zentrale`, `#otti-log`, Webchat/Dashboard child session, Heartbeat/Main sessions
- Relevant config test: `channels.discord.streaming.preview.toolProgress=false` was tried for ~24h; stale-lock errors still occurred

`openclaw gateway status --deep` excerpt:

```text
Gateway: bind=custom (0.0.0.0), port=18789 (env/config)
Probe target: ws://192.168.178.57:18789
Dashboard: http://192.168.178.57:18789/
CLI version: 2026.5.19 (/usr/local/bin/openclaw)
Gateway version: 2026.5.19
Connectivity probe: ok
Capability: admin-capable
Listening: *:18789
```

## Evidence

Recent local scan of session transcripts/trajectories shows repeated `file lock stale` occurrences, including currently active/recent sessions:

```text
2026-05-27 10:11:14  44  39f57588-0090-4386-af1b-7710d32ecfdc.jsonl
2026-05-27 10:08:31 186  39f57588-0090-4386-af1b-7710d32ecfdc.trajectory.jsonl
2026-05-27 09:00:18 132  c22c5cce-908b-46e1-a951-556af482bbbf.trajectory.jsonl
2026-05-27 09:00:11  13  c22c5cce-908b-46e1-a951-556af482bbbf.jsonl
2026-05-27 03:30:49   7  554bc6bd-db15-4fc0-aea9-4008020e79d9.jsonl
2026-05-27 03:30:49   5  554bc6bd-db15-4fc0-aea9-4008020e79d9.trajectory.jsonl
```

Top historical counts include Discord-associated sessions and dashboard/webchat child sessions:

```text
66 ee5dcd57-93c7-4b5a-8ccd-3a7cf4621b25.trajectory.jsonl
60 2026-05-26T16-19-09-019Z_b5c1158c-e710-498e-87ac-96850067cc0e.trajectory.jsonl
57 6eae195f-9a9b-47f0-8724-df6535620886.trajectory.jsonl
52 6eae195f-9a9b-47f0-8724-df6535620886.jsonl.reset.2026-05-26T07-50-31.677Z
49 2026-05-26T17-39-38-317Z_c277627e-0b39-4c59-926e-b17b40e92fac.trajectory.jsonl
45 bc3456e5-2f60-49f1-a11e-2ec4485f4747.trajectory.jsonl
42 fdb55c08-ae04-469a-9dbf-de6694907d70.trajectory.jsonl
33 bc3456e5-2f60-49f1-a11e-2ec4485f4747.jsonl
32 39f57588-0090-4386-af1b-7710d32ecfdc.trajectory.jsonl
26 6fbcf0fb-85ac-488c-a1e7-7d6e876e42c7.trajectory.jsonl
```

Only one sidecar lockfile currently remains, and it is unrelated to the failing session transcripts:

```text
2026-05-24 14:17 /home/node/.openclaw/agents/main/sessions/.usage-cost-cache.json.lock
```

For the active/recent failing transcript path, `lsof`/`fuser` returned no holder and no `<session>.jsonl.lock` was present when checked.
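
The same post-run check can be scripted so it runs immediately after a failure. The sketch below is a rough stand-in for the manual `lsof`/`fuser` check, not a reimplementation of either tool: it assumes a Linux `/proc` filesystem and assumes the sidecar is simply the transcript path with `.lock` appended, as in the `<session>.jsonl.lock` naming used in this report. All function names are hypothetical.

```python
import os
from pathlib import Path


def sidecar_lock_path(target: Path) -> Path:
    # Assumption: sidecar name is the target name plus ".lock".
    return target.with_name(target.name + ".lock")


def find_holders(target: Path) -> list[int]:
    """Return PIDs that have `target` open, by scanning /proc/<pid>/fd (Linux only)."""
    wanted = os.path.realpath(target)
    holders: list[int] = []
    proc = Path("/proc")
    if not proc.is_dir():
        return holders
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            fds = list((entry / "fd").iterdir())
        except OSError:
            continue  # process exited or is not readable by this user
        for fd in fds:
            try:
                if os.readlink(fd) == wanted:
                    holders.append(int(entry.name))
                    break
            except OSError:
                continue  # descriptor closed while scanning
    return sorted(holders)


def snapshot(target: Path) -> dict:
    """Collect the facts checked in this report for one transcript path."""
    lock = sidecar_lock_path(target)
    return {
        "target": str(target),
        "lock_path": str(lock),
        "lock_exists": lock.exists(),
        "holders": find_holders(target),
    }
```

Processes owned by other users are silently skipped when their descriptor table is not readable, so an empty `holders` list is only conclusive when the check runs with sufficient privileges.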

Earlier investigation of a concrete affected session found:

```text
OpenClaw version: 2026.5.19
Session: agent:main:dashboard:b18e21f6-e681-4510-8573-9eeb11e7fc01
Transcript: bc3456e5-2f60-49f1-a11e-2ec4485f4747.jsonl
Error: file lock stale for /home/node/.openclaw/agents/main/sessions/bc3456e5-2f60-49f1-a11e-2ec4485f4747.jsonl
No current lockfile: bc3456e5-...jsonl.lock did not exist
Direct SDK locktest on same path: LOCK_OK / RELEASE_OK
lsof/fuser: no process holder on the transcript file
```

## What seems to trigger it

The error appears most often during runs with overlapping transcript writes/tool-result persistence, especially when multiple tool calls are active or when Discord/Webchat/heartbeat surfaces are involved. In the first investigation, several parallel tool results failed early in a run, all against the lock for the same active transcript.

Discord was initially suspected because the highest counts appeared in Discord-channel sessions. Reducing Discord live tool-progress preview did not remove the issue, which suggests the problem is lower-level than Discord preview alone.

## Expected behavior

If a session transcript is locked by active work, later writes should do one of the following:

- queue/retry with bounded backoff, or
- recover safely when the lock is genuinely stale, or
- emit a diagnostic that identifies the lock owner/payload and why recovery was denied.

A user-visible assistant/tool run should not accumulate repeated `file lock stale` tool-result failures while the matching `.jsonl.lock` file is already gone and the target path is lockable afterwards.
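
To make the first option concrete, here is a minimal sketch of bounded retry with exponential backoff and jitter. It is written in Python for illustration only and does not reflect OpenClaw's actual implementation; `LockBusyError` and `with_bounded_retry` are hypothetical names, and the attempt count and delays are placeholder values.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class LockBusyError(Exception):
    """Hypothetical error: the transcript lock is currently held by other work."""


def with_bounded_retry(
    operation: Callable[[], T],
    *,
    attempts: int = 5,
    base_delay: float = 0.05,
    max_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying on LockBusyError at most `attempts` times in total."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except LockBusyError:
            if attempt == attempts - 1:
                raise  # budget exhausted: surface the failure to the caller
            # Exponential backoff capped at max_delay, with jitter so that
            # parallel writers do not retry in lockstep.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")
```

The retry budget stays bounded, so a genuinely stuck lock still surfaces as an error after a short, predictable wait instead of hanging the run.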

## Actual behavior

The session accumulates repeated tool-result failures containing:

```text
file lock stale for /home/node/.openclaw/agents/main/sessions/<session>.jsonl
```

After the run, the operator cannot find a corresponding lockfile or process holder, and a later retry can succeed. This makes the failure hard to diagnose, and because no lockfile is left to delete, it cannot be fixed safely from outside.

## Relation to #3092

#3092 described older channel-handler lock timeouts on `sessions.json.lock` during long operations. This report is likely in the same family, but the concrete failure has shifted:

- current version: 2026.5.19, not 2026.1.24-era Clawdbot
- target: per-session transcript JSONL/trajectory path, not only the global `sessions.json.lock`
- failure text: `file lock stale for <session>.jsonl`
- post-run state: no matching lockfile or holder remains, so manual stale-lock cleanup is not an effective workaround

## Suggested investigation direction

- Add lock payload/path diagnostics to the thrown `file_lock_stale` error: owner pid, createdAt, observed mtime, recovery mode, whether lockfile changed during recovery attempt.
- Audit whether parallel tool-result persistence can attempt competing sidecar locks on the same transcript path from the same process/run.
- Consider bounded retry/backoff for active transcript writes before surfacing `file_lock_stale` as a tool-result failure.
- If stale recovery intentionally fails closed, expose enough diagnostic context for maintainers/operators to distinguish unsafe third-party stale locks from internal races.
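
As an illustration of the first suggestion, the sketch below shows one possible shape for an error that carries the proposed diagnostic fields. It is a Python sketch for discussion, not OpenClaw's actual error type: the class names, field names, and the `recovery_mode` value used in the usage example are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class StaleLockDiagnostics:
    """Hypothetical payload mirroring the fields proposed in this report."""
    lock_path: str
    owner_pid: Optional[int]          # pid recorded in the lock payload, if any
    created_at: Optional[str]         # creation timestamp from the lock payload
    observed_mtime: Optional[float]   # mtime seen when staleness was evaluated
    recovery_mode: str                # how recovery was attempted or why it was denied
    lock_changed_during_recovery: bool


class FileLockStaleError(Exception):
    """Hypothetical error that keeps the existing message and appends diagnostics."""

    def __init__(self, target_path: str, diagnostics: StaleLockDiagnostics):
        self.target_path = target_path
        self.diagnostics = diagnostics
        details = json.dumps(asdict(diagnostics), sort_keys=True)
        super().__init__(f"file lock stale for {target_path} | {details}")
```

A caller could then raise it as follows; the message still begins with the familiar `file lock stale for <path>` text, so existing log searches would keep matching:

```python
diag = StaleLockDiagnostics(
    lock_path="/tmp/example.jsonl.lock",
    owner_pid=1234,
    created_at=None,
    observed_mtime=None,
    recovery_mode="fail-closed",
    lock_changed_during_recovery=True,
)
print(FileLockStaleError("/tmp/example.jsonl", diag))
```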


Session transcript `file_lock_stale` persists on 2026.5.19 with no remaining lockfile or file holder #87217
