Summary
EmbeddedAttemptSessionTakeoverError fires deterministically when auto-compaction at reason=threshold runs mid-turn on an active session that has paired lanes (main + session:<scope>). Both lanes die within 20 ms of the compaction completing, no user-visible reply is delivered, and the user sees only the generic "Something went wrong while processing your request" fallback.
This is the same error class as #86508 / #86966 / #86845 / #88369 / #89259 / #86572, but I'm filing separately because the trigger is different from any open ticket I could find:
#86572 (ALS scope, "Hoist withOwnedSessionTranscriptWrites ALS scope to span agent.prompt() to fix vanilla-openclaw same-lane fence trip") proposes a fix for the streaming listener path. Auto-compaction takes a different code path (pi-embedded-CJ87lW5R.js:adoptCompactionTranscript, which swaps activeSessionFile / activeSessionId) that shrinks or replaces the jsonl rather than appending to it. The fence's changeLooksLikeOwnedPromptOutput whitelist short-circuits at current.size < params.previous.size (selection-BmjEdnnA.js:7817), so the ALS-tagging fix would not catch this variant.
The point of a separate ticket is to give the compaction-specific variant its own deterministic repro so it doesn't get lost under the umbrella tickets.
Reproduction (deterministic on this deployment)
Single-user WhatsApp deployment, self-hosted Docker, 2026.5.20. The session in question (3a44d717-…) was 391 entries / 1.5 MB jsonl at the moment of failure. Auto-compaction fired at the threshold; 61 seconds later both lanes threw.
1. Drive a WhatsApp DM session past agents.defaults.compaction.softThresholdTokens so the next inbound message will trigger reason=threshold. (agents.defaults.compaction.memoryFlush.enabled = false in our config, so this is not a memoryFlush trigger.)
2. Send a tool-heavy inbound message (in our trace: a request that drove the browser tool for several web searches).
3. Auto-compaction starts ~24s into the turn.
4. Compaction runs for ~61s (calling the same provider as the turn).
5. The moment compaction completes, both lane=main and lane=session:agent:main:whatsapp:direct:<peer> throw EmbeddedAttemptSessionTakeoverError on the same session file, within a few milliseconds of each other.
Evidence (verbatim from logs)
2026-06-05T15:04:31.585+03:00 [whatsapp] Inbound message <peer> -> <self> (direct, audio/ogg, 131 chars)
2026-06-05T15:04:46.936+03:00 [whatsapp] Sent message ... (282ms) ← prior turn reply OK
2026-06-05T15:05:09.717+03:00 [agent/embedded] embedded run auto-compaction start: runId=1ca5018a-… reason=threshold
2026-06-05T15:06:11.025+03:00 [agent/embedded] embedded run auto-compaction complete: runId=1ca5018a-… reason=threshold compactionCount=1 willRetry=false
2026-06-05T15:06:11.041+03:00 [diagnostic] lane task error: lane=main durationMs=98855 error="EmbeddedAttemptSessionTakeoverError: session file changed while embedded prompt lock was released: /home/node/.openclaw/agents/main/sessions/3a44d717-bec8-4fe6-9128-3a8771a6fab1.jsonl"
2026-06-05T15:06:11.043+03:00 [diagnostic] lane task error: lane=session:agent:main:whatsapp:direct:<peer> durationMs=98858 error="EmbeddedAttemptSessionTakeoverError: ... /home/node/.openclaw/agents/main/sessions/3a44d717-bec8-4fe6-9128-3a8771a6fab1.jsonl"
2026-06-05T15:06:11.059+03:00 Embedded agent failed before reply: ...
Note the timing precision: compaction complete at .025, lane=main throws at .041 (16 ms later), lane=session:… throws at .043 (2 ms after that). Both lanes had the same fence snapshot from before compaction started; compaction swapped/rewrote the file; both lanes' next withSessionWriteLock calls trip assertSessionFileFence.
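The gaps quoted above can be rechecked directly from the log timestamps. A minimal TypeScript sketch; msBetween is our own helper name, and the code relies only on Date.parse accepting ISO-8601 timestamps with fractional seconds and a +03:00 offset:

```typescript
// Timestamps copied verbatim from the log excerpt.
const compactionStart = "2026-06-05T15:05:09.717+03:00";
const compactionComplete = "2026-06-05T15:06:11.025+03:00";
const mainLaneError = "2026-06-05T15:06:11.041+03:00";
const sessionLaneError = "2026-06-05T15:06:11.043+03:00";

// Illustrative helper: milliseconds elapsed from `from` to `to`.
function msBetween(from: string, to: string): number {
  const a = Date.parse(from);
  const b = Date.parse(to);
  if (Number.isNaN(a) || Number.isNaN(b)) {
    throw new Error(`unparseable timestamp: ${from} / ${to}`);
  }
  return b - a;
}

console.log("compaction duration ms:", msBetween(compactionStart, compactionComplete));
console.log("complete -> lane=main ms:", msBetween(compactionComplete, mainLaneError));
console.log("lane=main -> lane=session ms:", msBetween(mainLaneError, sessionLaneError));
```

This gives 61308 ms for the compaction itself (the "~61s" above), then 16 ms and 2 ms for the two lane failures.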
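Root cause (read from the installed dist/)
Files referenced as they appear in 2026.5.20:
pi-embedded-CJ87lW5R.js:2682+: context overflow is detected mid-turn → compactContextEngineWithSafetyTimeout → on success, adoptCompactionTranscript(compactResult).
pi-embedded-CJ87lW5R.js:2214-2218: adoptCompactionTranscript swaps activeSessionFile in-process but does not refresh the lock-guard's fence fingerprint; there is no call into selection-BmjEdnnA.js:refreshAfterOwnedSessionWrite() or any equivalent.
selection-BmjEdnnA.js:7815-7825: changeLooksLikeOwnedPromptOutput rejects a file that shrank (current.size < params.previous.size) or whose ino/dev changed. Compaction can (a) shrink the file (rewrite-in-place) or (b) change ino/dev (rewrite-then-rename, or a write to a different path). Both cases fail this check immediately → the fence throws → EmbeddedAttemptSessionTakeoverError.
So the bug is: the lock-guard fence doesn't know that the run itself just rewrote the session file via compaction, and treats its own write as a foreign takeover.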
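To make the failure mode concrete, here is a minimal model of the fence as this report describes it. It is a hypothetical reconstruction, not the code in selection-BmjEdnnA.js: Fingerprint, looksLikeOwnedAppend, assertFence, FenceTripped and laneWrite are our names, and we assume the fingerprint is just {dev, ino, size}. Under that assumption, both compaction variants are rejected, and two lanes holding the same pre-compaction snapshot both trip.

```typescript
// Hypothetical model of the session-file fence; NOT the OpenClaw implementation.
// Assumption: fingerprint = {dev, ino, size}; only same-file growth is accepted.
interface Fingerprint {
  dev: number;
  ino: number;
  size: number;
}

class FenceTripped extends Error {
  constructor(message: string) {
    super(message);
    this.name = "FenceTripped";
  }
}

// Stand-in for changeLooksLikeOwnedPromptOutput as described in the report.
function looksLikeOwnedAppend(previous: Fingerprint, current: Fingerprint): boolean {
  if (current.dev !== previous.dev || current.ino !== previous.ino) return false; // file replaced
  if (current.size < previous.size) return false; // file shrank
  return true;
}

// Stand-in for the fence assertion made on each locked write.
function assertFence(snapshot: Fingerprint, current: Fingerprint): void {
  if (!looksLikeOwnedAppend(snapshot, current)) {
    throw new FenceTripped("session file changed while embedded prompt lock was released");
  }
}

// One pre-compaction snapshot held by both lanes (numbers are illustrative).
const snapshot: Fingerprint = { dev: 1, ino: 100, size: 1500000 };

const afterAppend: Fingerprint = { dev: 1, ino: 100, size: 1520000 };
const afterShrinkInPlace: Fingerprint = { dev: 1, ino: 100, size: 40000 };
const afterRenameOver: Fingerprint = { dev: 1, ino: 101, size: 40000 };

function laneWrite(lane: string, current: Fingerprint): string {
  try {
    assertFence(snapshot, current);
    return `${lane}: ok`;
  } catch (err) {
    if (err instanceof Error && err.name === "FenceTripped") return `${lane}: takeover error`;
    throw err;
  }
}

const cases: Array<[string, Fingerprint]> = [
  ["owned append", afterAppend],
  ["compaction, shrink in place", afterShrinkInPlace],
  ["compaction, rename over", afterRenameOver],
];
for (const [label, current] of cases) {
  console.log(`${label}: ${laneWrite("main", current)} | ${laneWrite("session", current)}`);
}
```

Only the append case passes. The snapshot is never updated after the swap, so the same takeover error is raised for every lane that checks against it.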
Why this matters
Reproducible on every long-session WhatsApp DM once the threshold is crossed. Disabling memoryFlush (the other path into the same race, fixed for us by config) does not mitigate this because auto-compaction at reason=threshold has no off-switch.
The user loses a turn's worth of work (61s of compaction + the model output that the parent run was about to produce).
Suggested fixes (in order of cost)
1. Refresh the fence after adoptCompactionTranscript. Whoever swaps activeSessionFile should also call lockGuard.refreshAfterOwnedSessionWrite() (or expose a refreshForOwnedSessionReplace(path) variant that re-snapshots the fence against the new path). Probably the smallest patch.
2. Extend changeLooksLikeOwnedPromptOutput with an "owned replacement" case: if the run holds an owned-compaction marker, accept any post-compaction fingerprint (including a new ino or a smaller size).
3. Hold the write lock across compaction. This removes the lock-free window entirely, at the cost of blocking concurrent paired-lane reads while the LLM summarization runs (61s in our case, likely unacceptable).
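Option (2) could look roughly like the TypeScript sketch below. Everything in it is illustrative: RunFence, markOwnedCompaction and check are our names, not OpenClaw APIs, and the fingerprint is assumed to be {dev, ino, size}. The marker is consumed on first use so that a later foreign rewrite still trips.

```typescript
// Hypothetical sketch of an owned-compaction marker; names are illustrative.
interface Fingerprint {
  dev: number;
  ino: number;
  size: number;
}

class RunFence {
  private ownedCompactionPending = false;
  private snapshot: Fingerprint;

  constructor(initial: Fingerprint) {
    this.snapshot = initial;
  }

  // Would be set by the compaction path just before it replaces the transcript.
  markOwnedCompaction(): void {
    this.ownedCompactionPending = true;
  }

  // Throws on a suspected takeover; otherwise re-baselines the snapshot.
  check(current: Fingerprint): void {
    const sameFile = current.dev === this.snapshot.dev && current.ino === this.snapshot.ino;
    if (sameFile && current.size >= this.snapshot.size) {
      this.snapshot = current; // append-style owned write (what refreshAfterOwnedSessionWrite() covers)
      return;
    }
    if (this.ownedCompactionPending) {
      // "Owned replacement": accept any fingerprint exactly once.
      this.ownedCompactionPending = false;
      this.snapshot = current;
      return;
    }
    throw new Error("session file changed while embedded prompt lock was released");
  }
}

// Usage: compaction renames a smaller transcript into place.
const fence = new RunFence({ dev: 1, ino: 100, size: 1500000 });
fence.markOwnedCompaction();
fence.check({ dev: 1, ino: 101, size: 40000 }); // accepted: owned replacement
fence.check({ dev: 1, ino: 101, size: 42000 }); // accepted: ordinary append on the new file

let foreignTripped = false;
try {
  fence.check({ dev: 1, ino: 202, size: 10 }); // marker already consumed
} catch {
  foreignTripped = true;
}
console.log("later foreign replacement trips:", foreignTripped);
```

One property of this shape: while the marker is set, a genuine takeover would also be accepted.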
I think (1) is the right shape: it is the symmetric counterpart of the refreshAfterOwnedSessionWrite() that already exists for append-style owned writes.
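Option (1) is small enough to sketch. The TypeScript below is illustrative only: SessionFence, adoptCompactedTranscript, RunState and the in-memory fakeFs (standing in for fs.stat) are our own names, and the fingerprint is assumed to be {dev, ino, size}. Only the shape comes from this report: swap activeSessionFile, then re-snapshot via the proposed refreshForOwnedSessionReplace(path).

```typescript
// Hypothetical sketch; an in-memory map stands in for fs.stat().
interface Fingerprint {
  dev: number;
  ino: number;
  size: number;
}

const fakeFs = new Map<string, Fingerprint>();

function statFile(path: string): Fingerprint {
  const fp = fakeFs.get(path);
  if (!fp) throw new Error(`no such file: ${path}`);
  return { ...fp };
}

class SessionFence {
  private path: string;
  private snapshot: Fingerprint;

  constructor(path: string) {
    this.path = path;
    this.snapshot = statFile(path);
  }

  // Proposed counterpart of refreshAfterOwnedSessionWrite() for a replaced file.
  refreshForOwnedSessionReplace(newPath: string): void {
    this.path = newPath;
    this.snapshot = statFile(newPath);
  }

  // Stand-in for the fence check made on each locked write.
  assertUnchanged(): void {
    const cur = statFile(this.path);
    const sameFile = cur.dev === this.snapshot.dev && cur.ino === this.snapshot.ino;
    if (!sameFile || cur.size < this.snapshot.size) {
      throw new Error(`session file changed while embedded prompt lock was released: ${this.path}`);
    }
  }
}

interface RunState {
  activeSessionFile: string;
  fence: SessionFence;
}

// Adoption step: swap the active file, then re-baseline the fence (the missing call).
function adoptCompactedTranscript(run: RunState, compactedPath: string): void {
  run.activeSessionFile = compactedPath;
  run.fence.refreshForOwnedSessionReplace(compactedPath);
}

// Usage: compaction renames a smaller transcript over the session file.
fakeFs.set("session.jsonl", { dev: 1, ino: 100, size: 1500000 });
const runState: RunState = { activeSessionFile: "session.jsonl", fence: new SessionFence("session.jsonl") };
fakeFs.set("session.jsonl", { dev: 1, ino: 101, size: 40000 });
adoptCompactedTranscript(runState, "session.jsonl");
runState.fence.assertUnchanged(); // passes: the snapshot now matches the compacted file
console.log("post-compaction write accepted");
```

Because the refresh re-stats whatever path was adopted, it covers both the rename-over case and the write-to-a-different-path case. The sketch keeps one fence per run. If each paired lane holds its own pre-compaction snapshot, each of them would need the same refresh.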
Environment
OpenClaw 2026.5.20 (self-hosted Docker)
Node v24.14.0
Host: Docker Desktop on macOS, virtiofs mounts
Provider: a custom local-host provider (Anthropic-compatible endpoint reachable from the container), Anthropic-family model, Anthropic-style fallback chain
agents.defaults.compaction.memoryFlush.enabled = false
agents.defaults.compaction.softThresholdTokens = 160000
session.reset = {mode: "idle", idleMinutes: 10080} (so sessions accumulate for up to a week)
Happy to grab additional logs, fs_usage output, or an lsof snapshot of the session file at the moment of compaction if it would help narrow down the rewrite-vs-replace distinction.