Skip to content

fix(session-lock): allow co-tenant writes during prompt I/O window#84149

Closed
Jerry-Xin wants to merge 2 commits into
openclaw:mainfrom
Jerry-Xin:fix/session-takeover-co-tenant-84071
Closed

fix(session-lock): allow co-tenant writes during prompt I/O window#84149
Jerry-Xin wants to merge 2 commits into
openclaw:mainfrom
Jerry-Xin:fix/session-takeover-co-tenant-84071

Conversation

@Jerry-Xin

Copy link
Copy Markdown
Contributor

Summary

EmbeddedAttemptSessionTakeoverError fires on legitimate co-tenant writes (heartbeat, cron, channel ingress) to shared sessions during the prompt I/O window.

Root Cause

attempt.session-lock.ts:284-291: releaseForPrompt() records a file stat fingerprint and releases the lock. Co-tenant writers acquire the same fs-level lock and append to the session file (changing the stat). When the controller reacquires, assertSessionFileFence() sees a fingerprint mismatch and sets the sticky takeoverDetected flag — even though the write was coordinated through the same lock.

Fix

Add a process-scoped write epoch counter to acquireSessionWriteLock() that increments on each lock acquisition. The controller records the epoch at fence activation. On reacquisition, it checks the epoch delta:

  • delta == 1 (only controller acquired): fingerprint change = external takeover → throw
  • delta > 1 (co-tenants also acquired): fingerprint change = coordinated write → update fence, continue

Cross-process writes don't advance the in-process epoch, so true external takeovers are still detected.

Tests

4 new tests: co-tenant write accepted, external write detected, mixed co-tenant + external detected, multiple co-tenant writes succeed. All 26 existing session-lock tests pass.

Fixes #84071

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. size: S labels May 19, 2026
@clawsweeper

clawsweeper Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor

Thanks for the context here. I swept through the related work, and this is now duplicate or superseded.

Close as superseded: a newer open PR now tracks the same in-process/co-tenant session-write problem with a stronger ownership-aware fence and real filesystem proof, while this branch still has a blocking session-integrity weakness and no real behavior proof.

Canonical path: Close this branch and review or land #84250, or an equivalent ownership-aware fence that accepts trusted same-process locked writes while preserving external-edit negative tests.

So I’m closing this here and keeping the remaining discussion on #84250.

Review details

Best possible solution:

Close this branch and review or land #84250, or an equivalent ownership-aware fence that accepts trusted same-process locked writes while preserving external-edit negative tests.

Do we have a high-confidence way to reproduce the issue?

Yes at source level: current main snapshots the session-file fingerprint at prompt release and throws on any changed fingerprint after reacquire, while legitimate event/hook writers can use the same session-lock path. I did not run a live heartbeat, cron, or channel reproduction in this read-only review.

Is this the best way to solve the issue?

No. The epoch shortcut fixes the false positive by weakening the fence for the whole prompt window; the safer direction is the superseding ownership-aware fingerprint approach in #84250 or equivalent.

Security review:

Security review needs attention: The patch changes the session-integrity fence and can weaken detection of lock-bypassing session-file mutation in a mixed prompt window.

  • [medium] Fence accepts mixed-window bypass writes — src/agents/pi-embedded-runner/run/attempt.session-lock.ts:295
    The new epoch branch rebaselines the session fingerprint whenever another in-process lock acquisition happened, even if the file change also includes an external mutation that bypassed the lock before reacquire.
    Confidence: 0.88

What I checked:

Likely related people:

  • amknight: Merged history for Release embedded session write lock before model I/O #82891 introduced the released-prompt session write-lock lifecycle and fence files now involved in this regression cluster. (role: introduced behavior; confidence: high; commits: 8a060b2904d4, 9a9c557a0252, 96094fb844d0; files: src/agents/pi-embedded-runner/run/attempt.session-lock.ts, src/agents/pi-embedded-runner/run/attempt.ts, src/agents/pi-embedded-runner/run/attempt.session-lock.test.ts)
  • tianxiaochannel-oss88: Authored the newer open ownership-aware fence PR that covers the same in-process write class and the external-then-locked negative case. (role: adjacent proposed-fix owner; confidence: medium; commits: 7008438d978c; files: src/agents/pi-embedded-runner/run/attempt.session-lock.ts, src/agents/pi-embedded-runner/run/attempt.session-lock.test.ts)
  • kays0x: Authored the related open stricter append-only/content-validation fence PR that changes the same session-lock surface with supplied live proof for the false-takeover cluster. (role: adjacent proposed-fix owner; confidence: medium; commits: ac07599ff6da; files: src/agents/pi-embedded-runner/run/attempt.session-lock.ts, src/agents/pi-embedded-runner/run/attempt.session-lock.test.ts, src/agents/pi-embedded-runner/transcript-file-state.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against a002c416c7af.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. merge-risk: 🚨 security-boundary 🚨 May affect sandboxing, authorization, credentials, or sensitive data. labels May 19, 2026
@Jerry-Xin Jerry-Xin force-pushed the fix/session-takeover-co-tenant-84071 branch from 238dcef to cac1b69 Compare May 19, 2026 14:07
@Jerry-Xin

Copy link
Copy Markdown
Contributor Author

Rebased on latest upstream/main (b7ba7c3, clean rebase, no conflicts).

@Jerry-Xin

Copy link
Copy Markdown
Contributor Author

Thanks for the thorough review. Regarding the P1 finding about epoch-based fence weakness:

Design rationale for the co-tenant epoch shortcut:

The epoch counter (SESSION_WRITE_EPOCHS) is a process-local in-memory Map that increments on each successful lock acquisition. When currentEpoch > fenceWriteEpoch + 1, it means:

  • Our own reacquire accounts for exactly 1 increment
  • The remaining increment(s) prove at least one other in-process lock holder (heartbeat, cron ingress, channel worker) acquired the session lock during the prompt I/O window

The concern is that a co-tenant's lock acquisition could mask an external (lock-bypassing) session file append in the same window. This is technically correct, but:

  1. External bypasses are already a fundamental lock violation — if another process can write to the session file without acquiring the lock, the lock-based protection is broken at a deeper level than this fence can address.
  2. The alternative (rejecting all changes when co-tenants are active) recreates the original bug — legitimate heartbeat/cron/channel writes during prompt I/O would keep tripping the takeover guard, stalling the session indefinitely.
  3. The attack surface is narrow — the window is the prompt I/O duration, the external writer must bypass the lock, and the co-tenant must also be active in that same window.

I'll add a code comment documenting this deliberate trade-off. Happy to discuss alternative approaches if maintainers prefer a tighter check — for example, tracking a fingerprint chain per epoch, though that would add significant complexity.

@clawsweeper clawsweeper Bot added status: 🛠️ actively grinding The PR author has acted after the latest ClawSweeper review and work remains. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. and removed status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. status: 🛠️ actively grinding The PR author has acted after the latest ClawSweeper review and work remains. labels May 19, 2026
@Jerry-Xin Jerry-Xin force-pushed the fix/session-takeover-co-tenant-84071 branch from 4acf38e to 0a5c530 Compare May 19, 2026 16:07
@Jerry-Xin

Copy link
Copy Markdown
Contributor Author

Rebased on latest upstream/main (48acdd3, clean rebase, no conflicts).

@Jerry-Xin Jerry-Xin force-pushed the fix/session-takeover-co-tenant-84071 branch from 0a5c530 to 7795ae9 Compare May 19, 2026 18:05
@Jerry-Xin

Copy link
Copy Markdown
Contributor Author

Rebased on latest upstream/main (94d8391, clean rebase, no conflicts).

@Jerry-Xin Jerry-Xin force-pushed the fix/session-takeover-co-tenant-84071 branch from 7795ae9 to 36f42d2 Compare May 19, 2026 20:06
@Jerry-Xin Jerry-Xin force-pushed the fix/session-takeover-co-tenant-84071 branch 2 times, most recently from 5b9fc8d to a5305a8 Compare May 20, 2026 00:05
@Jerry-Xin

Copy link
Copy Markdown
Contributor Author

Rebased on latest upstream/main (79197b3, clean rebase, no conflicts).

@clawsweeper

clawsweeper Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature or rarity is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

忻役 and others added 2 commits May 20, 2026 10:05
Add a process-scoped write epoch counter to acquireSessionWriteLock so
the embedded attempt controller can distinguish legitimate in-process
co-tenant writes (heartbeat, cron, channel ingress) from true external
session takeovers. The fingerprint fence now checks the epoch delta:
delta > 1 means other in-process callers acquired the lock, so the file
change is coordinated — update the fence baseline and continue.

Fixes openclaw#84071
@Jerry-Xin Jerry-Xin force-pushed the fix/session-takeover-co-tenant-84071 branch from a5305a8 to e14edf8 Compare May 20, 2026 02:05
@Jerry-Xin

Copy link
Copy Markdown
Contributor Author

Rebased on latest upstream/main (a002c41, clean rebase, no conflicts).

@clawsweeper

clawsweeper Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper applied the proposed close for this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling merge-risk: 🚨 security-boundary 🚨 May affect sandboxing, authorization, credentials, or sensitive data. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. P1 High-priority user-facing bug, regression, or broken workflow. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. size: S status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

EmbeddedAttemptSessionTakeoverError fires on legitimate co-tenant writes to shared sessions (regression in 2026.5.17)

1 participant