EmbeddedAttemptSessionTakeoverError fires at ~120s on long Bedrock streams (fence whitelist too narrow?) #89259
Open
Labels
- P1: High-priority user-facing bug, regression, or broken workflow.
- clawsweeper:needs-live-repro: ClawSweeper needs live local, crabbox, or manual validation to confirm this issue.
- clawsweeper:needs-maintainer-review: ClawSweeper marked this issue as needing maintainer review before automation.
- clawsweeper:needs-product-decision: ClawSweeper marked this issue as needing a product or behavior decision.
- clawsweeper:no-new-fix-pr: ClawSweeper does not recommend queueing a new automated fix PR for this issue.
- impact:message-loss: Channel message delivery can be lost, duplicated, or misrouted.
- impact:session-state: Session, memory, transcript, context, or agent state can drift or corrupt.
- issue-rating: 🐚 platinum hermit: Good issue quality with a plausible reproduction path needing some confirmation.
Summary
EmbeddedAttemptSessionTakeoverError("session file changed while embedded prompt lock was released") fires consistently around the 120s mark on long Bedrock streaming runs, killing the turn even though no other agent or runner is touching the session. The fence's benign-rewrite whitelist appears too narrow for legitimate concurrent writes that happen during normal streaming + delivery flows.
Environment
- 2026.5.22 (npm install, host Linux 6.17 x64, node v24.14.0)
- @openclaw/amazon-bedrock-provider 2026.5.22
- amazon-bedrock/zai.glm-5 via bedrock-converse-stream API
- /hooks/github-engineering, cron-nested isolated agentTurns
- main, cron-nested, session:agent:main:slack:direct:<user>:thread:<ts>
Reliable repro pattern
- exec + gh tool calls.
- provider:"amazon-bedrock" model:"zai.glm-5" assistant entry and throws EmbeddedAttemptSessionTakeoverError.
Failure timestamps observed (same session, two distinct user messages):
- 21:44:02.xxx, empty assistant + throw at 21:46:02.573 (durationMs ~129017).
- 22:02:48.xxx, empty assistant + throw at 22:04:48.944 (durationMs ~151455).
Same error type also fires from cron-nested lanes and from the github-engineering hook agentTurns running on completely different sessionFiles.
What the code path looks like
- dist/pi-embedded-CsSFzly6.js:159: enqueueCommandInLane(sessionLane, () => enqueueGlobal(...)). The two simultaneous lane errors per failure are the same single failure unwinding nested lanes (sessionLane outer, globalLane inner), not two writers.
- dist/selection-hR-AeOeU.js:7998: TRANSCRIPT_ONLY_OPENCLAW_ASSISTANT_MODELS = new Set(["delivery-mirror","gateway-injected"]). Anything else flips takeoverDetected = true.
- dist/selection-hR-AeOeU.js:8086 / 8097: sessionFenceAdvanceIsBenign / sessionFenceRewriteIsBenign only allow lines whose model is in that whitelist.
- dist/selection-hR-AeOeU.js:8210: class EmbeddedAttemptSessionTakeoverError; thrown at :8324 / :8387 / :8440 / :8530.
The failure stub written at the takeover moment has provider:"amazon-bedrock", model:"zai.glm-5", i.e. the very record the runtime writes itself when the prompt lock is released, but on reacquire that line is treated as foreign.
Hypothesis
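To make the suspected behaviour concrete, here is a minimal sketch of the whitelist check as I read it from the code path above. It is a reconstruction for illustration, not the shipped code: the flat `{ provider, model }` entry shape and the helper names `lineIsBenign` and `detectTakeover` are assumptions. Only the Set contents and the two model strings come from the dist files and the failure stub.

```javascript
// Illustrative reconstruction only; not the actual dist/selection-*.js code.
const TRANSCRIPT_ONLY_OPENCLAW_ASSISTANT_MODELS = new Set([
  "delivery-mirror",
  "gateway-injected",
]);

// Assumption: each appended transcript line parses to an object with a `model` field.
function lineIsBenign(entry) {
  return TRANSCRIPT_ONLY_OPENCLAW_ASSISTANT_MODELS.has(entry.model);
}

// Any line appended while the prompt lock was released that is not
// whitelisted is treated as a foreign write.
function detectTakeover(appendedEntries) {
  return appendedEntries.some((entry) => !lineIsBenign(entry));
}

const mirror = { model: "delivery-mirror" };
const failureStub = { provider: "amazon-bedrock", model: "zai.glm-5" };

console.log(detectTakeover([mirror]));      // false
console.log(detectTakeover([failureStub])); // true
```

Under these assumptions a delivery-mirror line passes, while the runtime's own amazon-bedrock / zai.glm-5 stub is flagged, which matches what we see on reacquire.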
One of these (in order of likelihood) is writing during the prompt-lock-released window and tripping the fence:
- dist/chat-zFy9Y_4Y.js:1351 fs.writeFileSync(params.transcriptPath, ...)).
- dist/deliver-WPtVqUMT.js:1287, dist/run-delivery.runtime-B3LSluU0.js:366, dist/message-action-runner-B4oH5EYj.js:908), but those use model:"delivery-mirror" which IS whitelisted, so probably not these.
Suggested fixes
- runId matches the still-active attempt).
- runId, not "any line that doesn't match the whitelist".
- agents.<id>.session.fenceMode = "warn" | "strict").
Mitigation we applied locally
Not a fix, just headroom:
These help when the lock is the contention point. They don't address the fence whitelist itself.
Logs / artefacts available on request
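For the runId-based suggestions and the fenceMode idea under "Suggested fixes", a rough sketch of what a widened check could look like. Everything here is hypothetical: it assumes each transcript entry carries the `runId` of the attempt that wrote it, and `entryIsBenign` and `checkFence` are made-up names. Only the whitelist contents, the error message, and the "warn" | "strict" values come from this report.

```javascript
// Hypothetical sketch of the proposed behaviour; not existing OpenClaw code.
const TRANSCRIPT_ONLY_MODELS = new Set(["delivery-mirror", "gateway-injected"]);

// Assumption: entries written by an attempt carry that attempt's runId.
function entryIsBenign(entry, activeRunId) {
  if (TRANSCRIPT_ONLY_MODELS.has(entry.model)) return true;
  // Lines written by the still-active attempt are not a takeover.
  return activeRunId !== undefined && entry.runId === activeRunId;
}

// fenceMode mirrors the proposed agents.<id>.session.fenceMode setting.
function checkFence(appendedEntries, activeRunId, fenceMode = "strict") {
  const foreign = appendedEntries.filter(
    (entry) => !entryIsBenign(entry, activeRunId)
  );
  if (foreign.length === 0) return { ok: true, foreign };
  if (fenceMode === "warn") {
    // Log and continue instead of killing the turn.
    console.warn(`session fence: ${foreign.length} foreign line(s) detected`);
    return { ok: true, foreign };
  }
  throw new Error(
    "session file changed while embedded prompt lock was released"
  );
}

const ownStub = { model: "zai.glm-5", runId: "run-1" };
const otherRun = { model: "zai.glm-5", runId: "run-2" };

console.log(checkFence([ownStub], "run-1").ok);          // true
console.log(checkFence([otherRun], "run-1", "warn").ok); // true, after a warning
```

In "strict" mode the second call would throw, so a genuinely different writer is still caught; only lines from the active attempt are let through.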