fix(embedded-agent-runner): false positive EmbeddedAttemptSessionTakeoverError during session compaction#88348
Conversation
…ce check sessionFenceRewriteIsBenign rejected all rewrites where the file inode changed, even when content validation passed. When compaction rewrites the session file (new inode, smaller size), the early sameSessionFileIdentity check caused both benign-check functions to return false, triggering a false-positive EmbeddedAttemptSessionTakeoverError. Fix: remove sameSessionFileIdentity from the early-return guard in sessionFenceRewriteIsBenign. The function already rigorously validates content via lineMatchesLinearTranscriptMigration. When content validation passes, the rewrite is safe regardless of whether the inode changed — a compaction rewrite with new inode is benign.
|
Codex review: needs real behavior proof before merge. Reviewed May 30, 2026, 10:16 AM ET / 14:16 UTC. Summary PR surface: Source +9. Total +9 across 1 file. Reproducibility: unclear. source inspection shows current main rejects different-inode rewrites before content validation, but I did not establish that OpenClaw's actual compaction path rewrites the same active session file with a new inode. Review metrics: 1 noteworthy metric.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Proof guidance:
Mantis proof suggestion Risk before merge
Maintainer options:
Next step before merge
Security Review findings
Review detailsBest possible solution: Keep the takeover fence strict by tying any accepted replacement to an owned or trusted compaction write, or to a narrowly validated compaction-specific path, with focused session-lock coverage and real run proof. Do we have a high-confidence way to reproduce the issue? Unclear: source inspection shows current main rejects different-inode rewrites before content validation, but I did not establish that OpenClaw's actual compaction path rewrites the same active session file with a new inode. Is this the best way to solve the issue? No: accepting all content-valid replacement files is broader than the compaction problem; the safer solution is an owned/trusted compaction path plus regression coverage. Full review comments:
Overall correctness: patch is incorrect AGENTS.md: found and applied where relevant. Codex review notes: reasoning high; reviewed against d5be702f86e7. Label changesLabel justifications:
Evidence reviewedPR surface: Source +9. Total +9 across 1 file. View PR surface stats
Security concerns:
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
I don't think the new-inode acceptance is safe in this shape. Dropping |
Bug Summary
Error:
EmbeddedAttemptSessionTakeoverError: session file changed while embedded prompt lock was releasedSymptom: OpenClaw bot stops processing messages, repeating
lane task error+Embedded agent failed before replyRoot cause location:
src/agents/embedded-agent-runner/run/attempt.session-lock.ts,sessionFenceRewriteIsBenign()Root Cause
When a prompt is being streamed,
releaseForPrompt()releases the write lock on the session file and setsfenceActive = true. While the lock is released (during LLM streaming), OpenClaw's compaction process may rewrite the session file to compact old messages.When
reacquireAfterPrompt()re-acquires the lock and callsassertSessionFileFence(), the fence check fails because:sessionFenceRewriteIsBenignhassameSessionFileIdentity()(inode comparison) as its first guard condition (line 266)sameSessionFileIdentityreturnsfalse→ function returnsfalseimmediately without reaching content validationsessionFenceAdvanceIsBenignandsessionFenceRewriteIsBenignreturnfalseEmbeddedAttemptSessionTakeoverErroris thrown (false positive)The Fix
Remove
sameSessionFileIdentity()from the early-return guard insessionFenceRewriteIsBenign. The function already rigorously validates content vialineMatchesLinearTranscriptMigration+isTranscriptOnlyOpenClawAssistantLine. When content validation passes, the rewrite is safe regardless of whether the inode changed — a compaction rewrite with new inode is benign.- !sameSessionFileIdentity(params.previous.fingerprint, params.current) || params.current.size > BigInt(MAX_BENIGN_SESSION_FENCE_REWRITE_RESULT_BYTES) ||And the terminal
returnstatement becomes a content-check +true:Why This Is Safe
The
lineMatchesLinearTranscriptMigration+isTranscriptOnlyOpenClawAssistantLinecontent validation rigorously confirms:If a malicious process tried to fake a "compaction", it would need to produce content that passes this validation AND still trigger the takeover — which is impossible by design.
Reproduction Scenario
releaseForPrompt()releases write lock (to allow streaming)~/.openclaw/agents/ling/sessions/{session-id}.jsonlreacquireAfterPrompt()detects file changed → throwsEmbeddedAttemptSessionTakeoverErrorlane task error+Embedded agent failed before replyTesting
sessionFenceRewriteIsBenignrewrite-with-new-inode scenarioEmbeddedAttemptSessionTakeoverErrorFix branch:
raymondjxj/fix/session-fence-compaction-false-positiveFile changed:
src/agents/embedded-agent-runner/run/attempt.session-lock.ts