Bug Report: EmbeddedAttemptSessionTakeoverError causes "Something went wrong" in Feishu DM channel
Environment
- OpenClaw version: 2026.5.20
- Runtime: Node.js 24.14.0
- OS: Linux 5.19.17 (NAS, Intel N97, 15GB RAM)
- Channel: Feishu (direct message)
- Deployment: Docker (openclaw-gateway + openclaw-cli)
Description
When receiving messages via the Feishu DM channel, the agent turn occasionally fails with EmbeddedAttemptSessionTakeoverError, which is surfaced to the user as "Something went wrong while processing your request". This appears to be caused by concurrent lane tasks racing on the same session .jsonl file.
Reproduction
- Use Feishu DM channel (dmPolicy: open)
- Send a message to the agent
- Occasionally (not every time), the error occurs
The error seems more likely to happen after a /new command followed quickly by another message, but also occurs during normal usage.
Observed Behavior
Today (2026-05-23), the error occurred 3 times on the same session lane:
| Time (UTC+8) | Session ID | Error | Durations |
| --- | --- | --- | --- |
| 06:54 | 4c74b327-... | EmbeddedAttemptSessionTakeoverError | lane=main (14707ms) + lane=session:... (14709ms) |
| 10:32 | 0d1c5ef1-... | Same | lane=main (74111ms) + lane=session:... (74116ms) |
| 11:37 | fb875090-... | Same | lane=main (14386ms) + lane=session:... (14389ms) |
Key observations:
- Both lane=main and lane=session:agent:main:feishu:direct:{user_id} fail simultaneously with nearly identical durations (within 5 ms of each other in all three incidents).
- The error is always: session file changed while embedded prompt lock was released: /home/node/.openclaw/agents/main/sessions/{session_id}.jsonl
- All failures originate from the same Feishu DM lane (session=agent:main:feishu:direct:ou_4ee1d4e556e4bc4a2d1b3a084716a82d).
Root Cause Analysis (from source code inspection)
The error originates in /app/dist/selection-BmjEdnnA.js:
    async function assertSessionFileFence() {
        if (!fenceActive) return;
        const current = await readSessionFileFingerprint(params.lockOptions.sessionFile);
        if (!sameSessionFileFingerprint(fenceFingerprint, current)) {
            if (current.exists && await changeLooksLikeOwnedPromptOutput(...)) {
                fenceFingerprint = current;
                return; // safe harbor for assistant output
            }
            takeoverDetected = true;
            throw new EmbeddedAttemptSessionTakeoverError(params.lockOptions.sessionFile);
        }
    }
The problem: The releaseForPrompt() mechanism releases the session write lock while the LLM streams its response, but installs a "fence" to detect if the .jsonl file changes during that window. The changeLooksLikeOwnedPromptOutput() safe-harbor only allows assistant transcript entries to pass through without throwing. However, a non-assistant write (from another concurrent lane or task) triggers the error.
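To make the suspected failure mode concrete, here is a minimal in-memory sketch of a fence with an assistant-only safe harbor. It is not OpenClaw's implementation: FakeSessionFile, installFence, TakeoverError, and the fingerprint shape (line count plus character count) are all hypothetical stand-ins chosen for illustration, and the real fingerprint and safe-harbor checks may differ.

```javascript
// Hypothetical in-memory stand-in for a session .jsonl file.
class FakeSessionFile {
  constructor() {
    this.lines = [];
  }
  append(entry) {
    this.lines.push(JSON.stringify(entry));
  }
  // Illustrative fingerprint: line count plus total character count.
  fingerprint() {
    return { count: this.lines.length, chars: this.lines.join("\n").length };
  }
}

// Hypothetical error type standing in for EmbeddedAttemptSessionTakeoverError.
class TakeoverError extends Error {}

// Records the fingerprint at "lock release" and returns a checker function.
function installFence(file) {
  let fence = file.fingerprint();
  return function assertFence() {
    const current = file.fingerprint();
    if (current.count === fence.count && current.chars === fence.chars) return;
    // Safe harbor: only appended lines, and all of them are assistant entries.
    const added = file.lines.slice(fence.count).map((line) => JSON.parse(line));
    const owned =
      current.count > fence.count && added.every((e) => e.role === "assistant");
    if (owned) {
      fence = current;
      return;
    }
    throw new TakeoverError("session file changed while lock was released");
  };
}

function runFenceDemo() {
  const file = new FakeSessionFile();
  file.append({ role: "user", text: "first message" });
  const assertFence = installFence(file); // lock released, fence installed

  const outcome = {};
  file.append({ role: "assistant", text: "streamed reply" });
  assertFence(); // does not throw: the change is owned assistant output
  outcome.afterAssistantWrite = "ok";

  file.append({ role: "user", text: "message written by another lane" });
  try {
    assertFence();
    outcome.afterForeignWrite = "ok";
  } catch (err) {
    outcome.afterForeignWrite =
      err instanceof TakeoverError ? "takeover" : "unexpected";
  }
  return outcome;
}

console.log(runFenceDemo());
```

In this sketch the assistant append passes the fence, while the later non-assistant append throws, which mirrors the behavior described above under the stated assumptions.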
Evidence of concurrent lanes:
- Every incident shows two lanes failing within milliseconds of each other (duration difference of at most 5 ms).
- This suggests the same dispatch spawns both lane=main and lane=session:..., and they race on the same session file.
Ruled Out
- ❌ Docker permissions: fully verified (docker ps, docker info, docker exec all work)
- ❌ auto-compaction: compaction events occur at different timestamps (11:16, 11:51) than errors (11:37)
- ❌ session-memory hook: only writes to memory/ directory, not .jsonl
- ❌ Cron jobs: enabled: true, jobs: 0, no jobs running during failures
- ❌ PT MCP server: does not interact with session files
Relevant Log Snippet (11:37 incident)
11:36:41 Feishu DM: /new
11:36:41 dispatching to agent (session=agent:main:feishu:direct:...)
11:36:42 dispatch complete (queuedFinal=true, replies=1)
11:37:33 Feishu DM: "Check whether your docker permissions are all complete"
11:37:33 dispatching to agent (session=agent:main:feishu:direct:...)
11:37:34 tool "_debug" from server "pt-mcp-server" registered...
11:37:47 lane task error: lane=main durationMs=14386 error="EmbeddedAttemptSessionTakeoverError: session file changed while embedded prompt lock was released: ...fb875090-...jsonl"
11:37:47 lane task error: lane=session:... durationMs=14389 (same error)
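For anyone trying to confirm the paired failures in their own logs, the following sketch groups "lane task error" lines by timestamp and reports how far apart the durations are. It assumes the abbreviated line format shown in the snippet above; the real gateway log format may carry extra fields, so the regular expression and the groupLaneErrors helper are illustrative only.

```javascript
// Matches the abbreviated log lines used in this report (an assumption).
const LANE_ERROR = /^(\d{2}:\d{2}:\d{2}) lane task error: lane=(\S+) durationMs=(\d+)/;

// Groups lane task errors by timestamp and computes the duration spread.
function groupLaneErrors(logText) {
  const byTime = new Map();
  for (const line of logText.split("\n")) {
    const match = LANE_ERROR.exec(line.trim());
    if (!match) continue; // skip lines that are not lane task errors
    const [, time, lane, duration] = match;
    if (!byTime.has(time)) byTime.set(time, []);
    byTime.get(time).push({ lane, durationMs: Number(duration) });
  }
  return [...byTime].map(([time, entries]) => {
    const durations = entries.map((e) => e.durationMs);
    return {
      time,
      lanes: entries.map((e) => e.lane),
      spreadMs: Math.max(...durations) - Math.min(...durations),
    };
  });
}

const sampleLog = [
  "11:37:33 dispatching to agent (session=agent:main:feishu:direct:...)",
  '11:37:47 lane task error: lane=main durationMs=14386 error="EmbeddedAttemptSessionTakeoverError: ..."',
  "11:37:47 lane task error: lane=session:... durationMs=14389 (same error)",
].join("\n");

console.log(groupLaneErrors(sampleLog));
```

For the 11:37 incident this yields a single group containing both lanes with a 3 ms spread between their durations.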
Impact
- User experience: intermittent "Something went wrong" errors
- Frequency: ~3 times per day under normal usage
- Session data is not corrupted, but the turn fails completely
Workaround
- Avoid sending messages immediately after /new; wait 3-5 seconds for session initialization to complete.
- Use /reset or /new periodically to prevent long-running sessions from accumulating race conditions.
Suggested Fix
The session write lock + fence mechanism may need to:
- Ensure only one lane task can hold the embedded prompt lock for a given session at a time, OR
- Extend the changeLooksLikeOwnedPromptOutput() safe-harbor to account for concurrent lane tasks writing to the same file, OR
- Serialize the dispatch so that lane=main and lane=session:... do not run concurrently on the same session.
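As a rough illustration of the serialization option, here is a minimal per-session queue built on promise chaining. SessionSerializer and its run() method are hypothetical names, not part of OpenClaw, and the sketch ignores cross-process coordination, which the real lock would still need to handle.

```javascript
// Hypothetical per-session queue: tasks sharing a session key run one at a time.
class SessionSerializer {
  constructor() {
    this.tails = new Map(); // session key -> promise for the last queued task
  }

  run(sessionKey, task) {
    const previous = this.tails.get(sessionKey) ?? Promise.resolve();
    // Start this task only after the previous one for the same key has settled.
    const result = previous.then(() => task());
    // Stored tails never reject, so one failed task does not block later ones.
    const tail = result.catch(() => {});
    this.tails.set(sessionKey, tail);
    // Drop the map entry once the queue for this key has drained.
    tail.then(() => {
      if (this.tails.get(sessionKey) === tail) this.tails.delete(sessionKey);
    });
    return result; // the caller still sees the task's own result or error
  }
}

async function runSerializerDemo() {
  const serializer = new SessionSerializer();
  const events = [];
  const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
  const makeTask = (name) => async () => {
    events.push(`${name}:start`);
    await sleep(10); // stands in for the LLM streaming window
    events.push(`${name}:end`);
  };
  const key = "agent:main:feishu:direct:example"; // illustrative session key
  await Promise.all([
    serializer.run(key, makeTask("main")),
    serializer.run(key, makeTask("session")),
  ]);
  return events;
}

runSerializerDemo().then((events) => console.log(events));
```

With both tasks queued under the same key, the second task starts only after the first has finished, so their writes to the shared session file could not interleave. Tasks under different keys would still run concurrently.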
Labels: bug, concurrency, session, feishu