EmbeddedAttemptSessionTakeoverError: concurrent lane tasks race on session .jsonl file

## Bug Report: `EmbeddedAttemptSessionTakeoverError` causes "Something went wrong" in Feishu DM channel

### Environment
- **OpenClaw version:** 2026.5.20
- **Runtime:** Node.js 24.14.0
- **OS:** Linux 5.19.17 (NAS, Intel N97, 15GB RAM)
- **Channel:** Feishu (direct message)
- **Deployment:** Docker (`openclaw-gateway` + `openclaw-cli`)

### Description
When receiving messages via the Feishu DM channel, the agent occasionally crashes with `EmbeddedAttemptSessionTakeoverError`, which gets surfaced to the user as "Something went wrong while processing your request". This appears to be caused by concurrent lane tasks racing on the same session `.jsonl` file.

### Reproduction
1. Use Feishu DM channel (`dmPolicy: open`)
2. Send a message to the agent
3. Occasionally (not every time), the error occurs

The error seems more likely to happen after a `/new` command followed quickly by another message, but also occurs during normal usage.

### Observed Behavior
**Today (2026-05-23), the error occurred 3 times on the same session lane:**

| Time (UTC+8) | Session ID | Error | Durations |
|---|---|---|---|
| 06:54 | `4c74b327-...` | `EmbeddedAttemptSessionTakeoverError` | `lane=main` (14707ms) + `lane=session:...` (14709ms) |
| 10:32 | `0d1c5ef1-...` | Same | `lane=main` (74111ms) + `lane=session:...` (74116ms) |
| 11:37 | `fb875090-...` | Same | `lane=main` (14386ms) + `lane=session:...` (14389ms) |

**Key observations:**
- Both `lane=main` and `lane=session:agent:main:feishu:direct:{user_id}` fail simultaneously with nearly identical durations (within 3ms).
- The error is always: `session file changed while embedded prompt lock was released: /home/node/.openclaw/agents/main/sessions/{session_id}.jsonl`
- All failures originate from the **same Feishu DM lane** (`session=agent:main:feishu:direct:ou_4ee1d4e556e4bc4a2d1b3a084716a82d`).

### Root Cause Analysis (from source code inspection)

The error originates in `/app/dist/selection-BmjEdnnA.js`:

```javascript
async function assertSessionFileFence() {
    if (!fenceActive) return;
    const current = await readSessionFileFingerprint(params.lockOptions.sessionFile);
    if (!sameSessionFileFingerprint(fenceFingerprint, current)) {
        if (current.exists && await changeLooksLikeOwnedPromptOutput(...)) {
            fenceFingerprint = current; return;  // safe harbor for assistant output
        }
        takeoverDetected = true;
        throw new EmbeddedAttemptSessionTakeoverError(params.lockOptions.sessionFile);
    }
}
```

**The problem:** The `releaseForPrompt()` mechanism releases the session write lock while the LLM streams its response, but installs a "fence" to detect if the `.jsonl` file changes during that window. The `changeLooksLikeOwnedPromptOutput()` safe-harbor only allows **assistant transcript entries** to pass through without throwing. However, a **non-assistant write** (from another concurrent lane or task) triggers the error.

**Evidence of concurrent lanes:**
- Every incident shows **two lanes failing at the exact same millisecond** (duration diff < 5ms).
- This suggests the same dispatch spawns both `lane=main` and `lane=session:...`, and they race on the same session file.

### Ruled Out
- ❌ Docker permissions — fully verified (`docker ps`, `docker info`, `docker exec` all work)
- ❌ `auto-compaction` — compaction events occur at different timestamps (11:16, 11:51) than errors (11:37)
- ❌ `session-memory` hook — only writes to `memory/` directory, not `.jsonl`
- ❌ Cron jobs — `enabled: true`, `jobs: 0`, no jobs running during failures
- ❌ PT MCP server — does not interact with session files

### Relevant Log Snippet (11:37 incident)
```
11:36:41 Feishu DM: /new
11:36:41 dispatching to agent (session=agent:main:feishu:direct:...)
11:36:42 dispatch complete (queuedFinal=true, replies=1)

11:37:33 Feishu DM: "查看下你的docker权限都完整不"
11:37:33 dispatching to agent (session=agent:main:feishu:direct:...)
11:37:34 tool "_debug" from server "pt-mcp-server" registered...
11:37:47 lane task error: lane=main durationMs=14386 error="EmbeddedAttemptSessionTakeoverError: session file changed while embedded prompt lock was released: ...fb875090-...jsonl"
11:37:47 lane task error: lane=session:... durationMs=14389 (same error)
```

### Impact
- User experience: intermittent "Something went wrong" errors
- Frequency: ~3 times per day under normal usage
- Session data is not corrupted, but the turn fails completely

### Workaround
- Avoid sending messages immediately after `/new`; wait 3-5 seconds for session initialization to complete.
- Use `/reset` or `/new` periodically to prevent long-running sessions from accumulating race conditions.

### Suggested Fix
The session write lock + fence mechanism may need to:
1. Ensure only **one** lane task can hold the embedded prompt lock for a given session at a time, OR
2. Extend the `changeLooksLikeOwnedPromptOutput()` safe-harbor to account for concurrent lane tasks writing to the same file, OR
3. Serialize the dispatch so that `lane=main` and `lane=session:...` do not run concurrently on the same session.

---

**Labels:** `bug`, `concurrency`, `session`, `feishu`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

EmbeddedAttemptSessionTakeoverError: concurrent lane tasks race on session .jsonl file #85633

Bug Report: `EmbeddedAttemptSessionTakeoverError` causes "Something went wrong" in Feishu DM channel

Environment

Description

Reproduction

Observed Behavior

Root Cause Analysis (from source code inspection)

Ruled Out

Relevant Log Snippet (11:37 incident)

Impact

Workaround

Suggested Fix

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Time (UTC+8)	Session ID	Error	Durations
06:54	`4c74b327-...`	`EmbeddedAttemptSessionTakeoverError`	`lane=main` (14707ms) + `lane=session:...` (14709ms)
10:32	`0d1c5ef1-...`	Same	`lane=main` (74111ms) + `lane=session:...` (74116ms)
11:37	`fb875090-...`	Same	`lane=main` (14386ms) + `lane=session:...` (14389ms)

Uh oh!

EmbeddedAttemptSessionTakeoverError: concurrent lane tasks race on session .jsonl file #85633

Description

Bug Report: EmbeddedAttemptSessionTakeoverError causes "Something went wrong" in Feishu DM channel

Environment

Description

Reproduction

Observed Behavior

Root Cause Analysis (from source code inspection)

Ruled Out

Relevant Log Snippet (11:37 incident)

Impact

Workaround

Suggested Fix

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Bug Report: `EmbeddedAttemptSessionTakeoverError` causes "Something went wrong" in Feishu DM channel