EmbeddedAttemptSessionTakeoverError: self-inflicted session file modification during lock-free window (race condition)

## Bug Description

Cron jobs using `idealab/claude-opus-4-6` as the model consistently fail with `EmbeddedAttemptSessionTakeoverError` when the session file fingerprint (dev/ino/size/mtimeNs/ctimeNs) changes between `releaseForPrompt()` and the subsequent `assertSessionFileFence()` check.

The modification is **self-inflicted** — the gateway's own internal async process (likely memory-core plugin indexing, model-snapshot write, or trajectory sync) modifies the `.jsonl` session file during the lock-free window while waiting for the model response.

## Reproduction

- **Version**: 2026.5.20
- **Trigger**: Any isolated cron job with `idealab/claude-opus-4-6` that has a ~20s+ model response time
- **Frequency**: 100% reproducible once timing corridor is hit (7/7 consecutive failures for the same job)
- **Workaround**: Switching to a different provider (e.g. `dashscope/deepseek-v4-pro` or even `idealab/gpt-5.4`) avoids the issue — suggesting the race is provider-specific in the streaming/auth initialization path

## Evidence

1. The error appears in logs since 5/19, hitting multiple job types intermittently:
   - `memory-capture-fallback` (ops agent)
   - `daily-ops-review` (ops agent)
   - `dreaming-narrative` (main agent)
   - Dashboard sessions (main agent)

2. Same agent + same model + different prompt size = different outcome:
   - Short prompt job (capture-fallback, ~4s model response) → succeeds
   - Long prompt job (daily-ops-review, ~24s model response) → fails every time

3. Gateway restart does NOT fix it (confirmed: restarted, immediately failed again)

4. Switching model to non-Opus provider → session lock error disappears (fails on tool error instead, but the lock race is gone)

## Root Cause Analysis

Source: `dist/selection-BmjEdnnA.js` lines 7945-8050

```javascript
// releaseForPrompt() records fingerprint then releases lock
async releaseForPrompt() {
    fenceFingerprint = await readSessionFileFingerprint(sessionFile);
    fenceActive = true;
    await lock.release();
}

// assertSessionFileFence() checks fingerprint hasn't changed
async function assertSessionFileFence() {
    const current = await readSessionFileFingerprint(sessionFile);
    if (!sameSessionFileFingerprint(fenceFingerprint, current)) {
        // Only exception: growth is pure assistant transcript entries
        if (await changeLooksLikeOwnedPromptOutput({...})) {
            fenceFingerprint = current; return;
        }
        throw new EmbeddedAttemptSessionTakeoverError(sessionFile);
    }
}

// Fingerprint uses nanosecond-precision mtime
function sameSessionFileFingerprint(left, right) {
    return left.dev === right.dev && left.ino === right.ino
        && left.size === right.size
        && left.mtimeNs === right.mtimeNs
        && left.ctimeNs === right.ctimeNs;
}
```

The fingerprint comparison is correct in principle but the **invariant assumption** ("no internal process will write to the session file during the lock-free window") is violated by the gateway's own async pipeline.

## Suggested Fixes

1. **Drain all pending session writes before recording fingerprint** — ensure no async internal write is in-flight when `releaseForPrompt()` snapshots the fingerprint
2. **Relax fingerprint to size-only** — if the file grew by known-good internal entries (not just assistant output), allow it
3. **Add grace period** — re-read fingerprint after a short delay if mismatch detected, to handle writes that were "in the pipeline" at snapshot time
4. **Provider-aware lock timing** — if certain providers trigger additional session writes during auth/streaming setup, account for them in the lock lifecycle

## Environment

- macOS 14.6 (arm64)
- Node v22.19.0
- OpenClaw 2026.5.20
- Plugins: browser, memory-core, searxng, skill-trigger-engine
- 4 configured agents (main, ops, scout, editor)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

EmbeddedAttemptSessionTakeoverError: self-inflicted session file modification during lock-free window (race condition) #86804

Bug Description

Reproduction

Evidence

Root Cause Analysis

Suggested Fixes

Environment

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

EmbeddedAttemptSessionTakeoverError: self-inflicted session file modification during lock-free window (race condition) #86804

Description

Bug Description

Reproduction

Evidence

Root Cause Analysis

Suggested Fixes

Environment

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions