Skip to content

fix(agents): stop false-positive session-takeover on runner's own transcript appends#84046

Closed
kays0x wants to merge 1 commit into
openclaw:mainfrom
kays0x:kevin/fix-session-takeover-falsepositive
Closed

fix(agents): stop false-positive session-takeover on runner's own transcript appends#84046
kays0x wants to merge 1 commit into
openclaw:mainfrom
kays0x:kevin/fix-session-takeover-falsepositive

Conversation

@kays0x

@kays0x kays0x commented May 19, 2026

Copy link
Copy Markdown

Summary

The embedded attempt session-lock fence uses a stat()-based fingerprint (dev, ino, size, mtimeNs, ctimeNs, birthtimeNs) to detect "another lane queued a new user turn against this session" while the runner releases its coarse lock for the LLM prompt window. The fingerprint is a proxy that overfires: every transcript-append the runner itself performs during the released window (appendSessionTranscriptMessageLocked writes tool calls, tool results, and assistant replies via acquireSessionWriteLock(..., allowReentrant: true)) changes the file's size/mtimeNs/ctimeNs without going through refreshSessionFileFence. Long DM turns with tool calls then trip assertSessionFileFence and throw EmbeddedAttemptSessionTakeoverError after the reply has already been delivered, surfacing a misleading "Agent failed before reply" message on top of a successful turn.

Keep the stat fingerprint as the fast path. On mismatch, only tolerate the runner's append-only assistant/tool transcript shape; treat everything else as a real takeover. The verification requires ALL of:

  • dev/ino match: no atomic replacement onto a new inode.
  • birthtimeNs matches: catches the unlink+recreate-same-inode case (common on tmpfs/ext4 with rapid recreation, where dev and ino are reused but the inode itself is fresh).
  • size grew: file is append-only between fence-set and assert.
  • stat/read consistency: both snapshotSessionFileFence and verifyAppendOnlyRunnerOwnedExtension run a post-read stat and compare every fingerprint field (size, dev, ino, mtimeNs, ctimeNs, birthtimeNs) against the pre-read fingerprint. Any mismatch fails closed: the snapshot returns prefixHashHex: null (forces the next assertion to treat the file as taken over) and the slow path rejects directly. Catches the same-size in-place rewrite race where byte length alone would not detect drift.
  • bytes [0, fenceSize) hash identical to the fenced prefix hash: no in-place rewrite of earlier transcript content.
  • bytes [fenceSize, currentSize) are canonical session entries: every appended line must satisfy isSessionEntry from transcript-file-state.ts (record shape, type, id, parentId, timestamp, plus the per-type message-shape contract via isAgentMessage), with role narrowed to assistant, toolResult, or bashExecution. Non-object parsed values, user-role entries, custom entries, branch_summary entries, and any entry missing the outer base fields remain a real takeover signal.

Anything else (replacement, shrink, in-place rewrite at any size, new user turn, compaction, a malformed appended entry, a non-object tail value, or a stat/read inconsistency window) remains a real takeover. The fence snapshot captures the prefix SHA-256 alongside the fingerprint at releaseForPrompt and refreshSessionFileFence so the slow path can verify rigorously without keeping a file handle open.

The slow-path tail validation delegates to the canonical isSessionEntry whole-entry validator (now export-visible from transcript-file-state.ts, signature widened to unknown with an internal isRecord guard so non-record JSON values fail closed). No local mirror of the persisted message contract is maintained: if the upstream FileEntry/SessionEntry contract changes, the fence's accept-set tracks it automatically.

The nextSnapshot returned from a successful slow-path verification hashes the full new buffer (not the pre-fence prefix). Otherwise, when the guarded operation throws between assertSessionFileFence and refreshSessionFileFence, the stored snapshot would describe the new size but hold the old prefix hash, and the next legitimate runner append would false-positive on prefix-hash mismatch.

Lock-held invariant

Both snapshotSessionFileFence call sites run while the session write lock is held: releaseForPrompt (snapshot at L452 before lock.release() at L454) and refreshSessionFileFence (called inside withSessionWriteLock's runWithLock wrapped by activeWriteLock.run(lock, ...)). Canonical transcript writers (appendSessionTranscriptMessageLocked, migrateLinearTranscriptToParentLinked, ensureTranscriptHeader) all acquire that same lock. The stat/read consistency guard above is belt-and-suspenders for this invariant: it defends against multi-process scenarios (a second gateway against the same session file), an external editor, or an internal bug that bypasses the lock.

Closes #83436.
Refs #83615 (this PR addresses the EmbeddedAttemptSessionTakeoverError surface; the issue also tracks unrelated kimi-k2.6 schema and DNS-fetch failures).
Refs #84059 (active multi-environment hub for this error: Feishu/Slack/WebChat across macOS APFS, Linux ext4, WSL2, Docker; all are the general internal-write path this PR covers, not the delivery-mirror path addressed by #84437).

Real behavior proof

Behavior addressed: Embedded runner false-positive EmbeddedAttemptSessionTakeoverError fires after long Telegram DM turns whose reply was already delivered, surfacing a misleading "Agent failed before reply" message on top of a successful run.

Real environment tested: OpenClaw 2026.5.19-beta.1, Linux x86_64, Node 24.13.0. Long-running Telegram DM session against claude-opus-4-7 (~3.5 min turn with multiple exec and message tool calls), reproduced reliably before this patch. Production behavior probed via runtime-patched dist/selection-BpjGe-Y0.js carrying the same fix semantics, then long DM runs (41s to 122s) exercised through the Telegram channel post-restart.

Exact steps or command run after this patch:

node scripts/run-vitest.mjs src/agents/pi-embedded-runner/run/attempt.session-lock.test.ts src/agents/pi-embedded-runner/transcript-file-state.test.ts
CI=true pnpm check:changed --staged=false
openclaw gateway restart

Evidence after fix:

Pre-patch failure mode, redacted runtime log from gateway (/tmp/openclaw/openclaw-2026-05-19.log):

03:30:06.971  embedded run agent end: runId=bbd7d487 isError=false
03:30:07.008  embedded run prompt end: runId=bbd7d487 sessionId=<session-id> durationMs=217236
03:30:07.054  lane task error: lane=main durationMs=219867 error="EmbeddedAttemptSessionTakeoverError: session file changed while embedded prompt lock was released: /home/.../agents/main/sessions/<session-id>.jsonl"
03:30:07.071  Embedded agent failed before reply: session file changed while embedded prompt lock was released: /home/.../<session-id>.jsonl

Note isError=false at 03:30:06.971: the assistant turn and message tool delivery both succeeded, followed 47ms later by the takeover throw, surfacing as a misleading Telegram error on top of a delivered reply.

Post-patch behavior on the same gateway, four long DM runs after gateway restart at 03:46 EDT, every one delivering cleanly with zero takeover errors (redacted runtime log):

03:51:18  embedded run prompt end: runId=62395ffd  durationMs=122344  (no takeover)
03:52:39  embedded run prompt end: runId=67e21652  durationMs=41345   (no takeover)
04:05:05  embedded run prompt end: runId=a5994980  durationMs=101810  (no takeover)
04:07:34  embedded run prompt end: runId=a837440b  durationMs=94395   (no takeover)

The post-patch window contains only two "Embedded agent failed before reply" entries (at 04:05:06 and 04:07:35), both with cause LLM request timed out, unrelated to the takeover path. Zero SessionTakeoverError occurrences post-patch.

Observed result after fix: Existing acceptance test exercises all three runner-owned roles (assistant, toolResult, bashExecution) with their canonical persisted shapes. Rejection tests cover: another owner queueing a new user turn, a runner-owned role with malformed message contents, a runner-owned message entry missing its outer id, a non-message entry (e.g. custom/branch_summary/compaction), a tail line that parses to a non-object value (e.g. literal null), in-place rewrite (rejected via prefix hash mismatch), unlink+recreate-same-inode replacement (rejected via birthtimeNs mismatch), a same-size in-place rewrite landing during the snapshot's stat-to-read window (rejected via post-read re-stat), and the same race during the slow-path verification's read (rejected via the same guard). A fence-consistency test covers the post-throw path: after a guarded operation throws between assert and refresh, the next valid runner-owned append must not false-positive on prefix-hash mismatch. Targeted Vitest output:

Test Files  4 passed (4)
     Tests  82 passed (82)
  Duration  3.86s

check:changed --staged=false is clean across the full repo lane (lint, typecheck, import-cycles, dependency guards, build artifacts, security/quality lanes).

What was not tested: Multi-process write contention on the same sessionFile from a different gateway process (no test infrastructure for cross-process locks in this surface). The prefix-hash check is O(fenceSize) I/O per fence-set; in production sessions seen so far (~150 KB and below) this is sub-millisecond. Both snapshot producers do one extra fs.stat (sub-millisecond on local disk) to enforce the stat/read consistency guard.

Notes

  • Adds three helpers near readSessionFileFingerprint: FenceSnapshot type, snapshotSessionFileFence (used at releaseForPrompt and refreshSessionFileFence), and verifyAppendOnlyRunnerOwnedExtension (used in the slow path of assertSessionFileFence). No new module-level dependencies beyond node:crypto's createHash.
  • Replaces one piece of controller state (fenceFingerprint: SessionFileFingerprint | undefined) with fenceSnapshot: FenceSnapshot | undefined. No new long-lived resources (no held file handles).
  • Exports isSessionEntry from transcript-file-state.ts so the fence's slow path can delegate the whole-entry contract to the canonical validator. The signature is widened to unknown with an internal isRecord guard; non-record JSON values fail closed at the canonical boundary instead of needing per-callsite type-shape checks. isAgentMessage is not exported (the fence reaches it transitively through isSessionEntry).

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 19, 2026
@clawsweeper

clawsweeper Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor

Codex review: found issues before merge.

Latest ClawSweeper review: 2026-05-20 20:05 UTC / May 20, 2026, 4:05 PM ET.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The branch adds a snapshot/hash slow path to the embedded session-file fence so append-only canonical runner-owned transcript messages can be accepted without tripping takeover, and exports isSessionEntry for tail validation.

Reproducibility: yes. Source inspection shows current main throws on any session-file stat mismatch after prompt-lock release, and the supplied before/after gateway logs plus production validation show the failure and improvement on long multi-tool turns.

PR rating
Overall: 🦐 gold shrimp
Proof: 🦞 diamond lobster
Patch quality: 🦐 gold shrimp
Summary: The proof is strong, but the patch needs one strictness fix plus a rebase and maintainer session-lock design decision before it is quality-ready.

Rank-up moves:

What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Sufficient (logs): The PR body and production validation comment include redacted before/after gateway logs showing takeover failures before the patch and clean long multi-tool turns after the patched runtime.

Risk before merge

  • The PR intentionally relaxes a fail-closed session-state fence; an acceptance bug could let the controller advance over a non-runner-owned transcript mutation.
  • The current implementation reads and hashes session-file prefixes/full files in the prompt-lock path, so very large transcripts still need maintainer acceptance for latency and availability impact.
  • The branch conflicts with current main after fix(pi): keep message-tool delivery in session lock #84437, and maintainers need to choose whether the canonical fix is fence tolerance or extending the owned-write context to all runner appends.

Maintainer options:

  1. Rebase and tighten the fence path
    If maintainers want the fence-tolerance design, rebase on current main and reject empty or interior-blank appended tail segments before merge.
  2. Extend the owned-write model instead
    If maintainers prefer the design from fix(pi): keep message-tool delivery in session lock #84437, close this branch after a follow-up routes the general runner-owned appends through that context.
  3. Accept the hashing cost explicitly
    Session-state owners can still accept the O(file-size) hash/read cost if they judge the production session sizes and failure mode justify it.

Next step before merge
Maintainers should choose the session-lock strategy and request a rebase; this is not a safe repair lane until that direction is settled.

Security
Cleared: No dependency, workflow, secret, permission, or supply-chain change was found; the session-state acceptance boundary issue is covered as a functional finding.

Review findings

  • [P2] Reject empty transcript tail lines in the slow path — src/agents/pi-embedded-runner/run/attempt.session-lock.ts:268-270
Review details

Best possible solution:

Choose one canonical session-lock model, then land a rebased implementation that either strictly validates runner-owned append-only tails or routes all runner-owned writes through the owned-write context without leaving competing paths.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection shows current main throws on any session-file stat mismatch after prompt-lock release, and the supplied before/after gateway logs plus production validation show the failure and improvement on long multi-tool turns.

Is this the best way to solve the issue?

Not yet. The approach is plausible and well covered, but the current head needs a rebase against the owned-write context work and should fail closed on empty appended tail segments before merge.

Label changes:

  • add rating: 🦐 gold shrimp: Current PR rating is 🦐 gold shrimp because proof is 🦞 diamond lobster, patch quality is 🦐 gold shrimp, and The proof is strong, but the patch needs one strictness fix plus a rebase and maintainer session-lock design decision before it is quality-ready.
  • add status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action. Sufficient (logs): The PR body and production validation comment include redacted before/after gateway logs showing takeover failures before the patch and clean long multi-tool turns after the patched runtime.
  • remove rating: 🐚 platinum hermit: Current PR rating is rating: 🦐 gold shrimp, so this older rating label is no longer current.
  • remove status: 👀 ready for maintainer look: Current PR status label is status: ⏳ waiting on author.

Label justifications:

  • P1: The PR addresses an urgent regression where valid long agent/channel turns can fail with EmbeddedAttemptSessionTakeoverError after or during delivery.
  • merge-risk: 🚨 session-state: Merging changes which transcript mutations are accepted as non-takeover while the embedded prompt lock is released.
  • merge-risk: 🚨 availability: Merging adds file read and hash work to a hot session-lock path, which could affect large transcript latency.
  • rating: 🦐 gold shrimp: Current PR rating is 🦐 gold shrimp because proof is 🦞 diamond lobster, patch quality is 🦐 gold shrimp, and The proof is strong, but the patch needs one strictness fix plus a rebase and maintainer session-lock design decision before it is quality-ready.
  • status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action. Sufficient (logs): The PR body and production validation comment include redacted before/after gateway logs showing takeover failures before the patch and clean long multi-tool turns after the patched runtime.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body and production validation comment include redacted before/after gateway logs showing takeover failures before the patch and clean long multi-tool turns after the patched runtime.

Full review comments:

  • [P2] Reject empty transcript tail lines in the slow path — src/agents/pi-embedded-runner/run/attempt.session-lock.ts:268-270
    The slow path skips every empty segment in the appended tail, so an external append of a blank line, or a blank separator between entries, passes verification and advances the fence even though no canonical runner-owned SessionEntry was written. That weakens the takeover guard compared with current main; only the final split artifact from a normal trailing newline should be ignored, and an all-empty or interior-empty tail should fail closed.
    Confidence: 0.78

Overall correctness: patch is incorrect
Overall confidence: 0.82

Acceptance criteria:

  • node scripts/run-vitest.mjs src/agents/pi-embedded-runner/run/attempt.session-lock.test.ts src/agents/pi-embedded-runner/transcript-file-state.test.ts
  • node scripts/crabbox-wrapper.mjs run ... --shell -- "pnpm check:changed"

What I checked:

Likely related people:

  • amknight: Authored the merged prompt-lock release and session-file fence work in 8a060b2904d4e78bc2c5139c72b191bc88196c1b, the central behavior this PR changes. (role: introduced behavior; confidence: high; commits: 8a060b2904d4; files: src/agents/pi-embedded-runner/run/attempt.session-lock.ts, src/agents/pi-embedded-runner/run/attempt.session-lock.test.ts)
  • vincentkoc: Authored the merged malformed transcript-state validation work in 3918d695871801073013d928a07646a316f41541, which defines the isSessionEntry contract this PR exports and reuses. (role: adjacent validator owner; confidence: high; commits: 3918d6958718; files: src/agents/pi-embedded-runner/transcript-file-state.ts, src/agents/pi-embedded-runner/transcript-file-state.test.ts)
  • dr00-eth: Credited on the recently landed replacement PR that added the owned transcript write context for message-tool delivery, the adjacent design now intersecting with this PR. (role: recent adjacent contributor; confidence: medium; commits: 65030f31649b; files: src/config/sessions/transcript-write-context.ts, src/agents/pi-embedded-runner/run/attempt.ts, src/agents/pi-embedded-runner/run/attempt.session-lock.test.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 1a7669bc63a0.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. mantis: telegram-visible-proof Mantis should capture Telegram visible proof. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. labels May 19, 2026
kays0x added a commit to kays0x/openclaw that referenced this pull request May 19, 2026
…unner-owned shape

ClawSweeper review on openclaw#84046 flagged the prior count-only check as too
broad: concurrent transcript rewrites, file replacements, branch
changes, and same-user-count mutations could all keep the user-message
count unchanged while changing session state. Tighten the fence
disambiguation so a stat-fingerprint mismatch is only tolerated when
ALL of these hold:

- dev/ino match (no atomic replacement onto a new inode)
- birthtimeNs matches (no unlink+recreate-same-inode case; common on
  tmpfs/ext4 with rapid recreation, where dev and ino are reused)
- size grew (file is append-only between fence-set and assert)
- bytes [0, fenceSize) hash identical to the fenced prefix hash (no
  in-place rewrite of earlier transcript content)
- bytes [fenceSize, currentSize) contain only message entries with role
  assistant or tool (no new user-role turn, no malformed tail)

Anything else — replacement, shrink, in-place rewrite, new user turn,
or compaction-style mutation — remains a real takeover. Snapshot now
captures the prefix hash alongside the fingerprint at releaseForPrompt
and refreshSessionFileFence so the slow path can verify rigorously.

Adds tests covering in-place rewrite and unlink+recreate replacement
(both with prefix bytes coincidentally matching and roles that would
otherwise look runner-owned); existing append-only and new-user-turn
tests preserved.

Refs: openclaw#83436, openclaw#83615
Addresses ClawSweeper P1 finding on prior revision of openclaw#84046.
@kays0x

kays0x commented May 19, 2026

Copy link
Copy Markdown
Author

@clawsweeper re-review

Revision addresses your P1 finding from the prior review: the fence now requires append-only growth, identical inode + birthtime + prefix-hash, and assistant/tool-only tail roles. In-place rewrites, file replacements, and same-user-count mutations all trip takeover.

Real behavior proof added to PR body (before/after gateway log evidence from a 217s DM run that previously false-tripped). 16/16 tests passing, check:changed clean.

@clawsweeper

clawsweeper Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 19, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. and removed status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 19, 2026
@kays0x

kays0x commented May 19, 2026

Copy link
Copy Markdown
Author

@clawsweeper re-review

Revision addresses both P1 findings from your last review:

  • P1.1 (role allowlist): dropped tool; tail now accepts only assistant, toolResult, bashExecution (the runner-owned roles per isAgentMessage in transcript-file-state.ts:182).
  • P1.2 (non-message tail entries): removed the continue on type !== "message"; non-message entries (custom, branch_summary, compaction, session) now fail closed.

Tests: existing acceptance test exercises all three runner-owned roles with their actual schema; new test asserts a type: custom tail entry trips takeover. 17/17 passing.

kays0x added a commit to kays0x/openclaw that referenced this pull request May 19, 2026
…t non-message tail entries

ClawSweeper re-review on openclaw#84046 flagged two P1 issues on the prior
revision of the append-only fence:

- The role allowlist used 'tool', but OpenClaw persists tool outputs
  as message.role === 'toolResult' (see
  src/agents/pi-embedded-runner/transcript-file-state.ts:182). Real
  tool-using turns would still false-trip the takeover. Update the
  allowlist to match isAgentMessage's runner-owned roles: assistant,
  toolResult, bashExecution.

- The slow path silently 'continue'd through non-message tail entries
  (custom, branch_summary, compaction, session). That let a different
  owner mutate session state during the released-lock window without
  a new user turn. Fail closed: any non-message entry in the tail
  rejects the fence.

Test updates: existing acceptance test now appends assistant +
toolResult + bashExecution (the three valid persisted runner-owned
roles); new test asserts a 'type: custom' tail entry trips takeover
even though it grew append-only.

Refs: openclaw#83436, openclaw#83615
Addresses ClawSweeper P1 findings on second revision of openclaw#84046.
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 19, 2026
@clawsweeper

clawsweeper Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. merge-risk: 🚨 automation 🚨 May affect CI, automerge, proof capture, label sync, or maintainer automation. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. labels May 19, 2026
@kays0x kays0x force-pushed the kevin/fix-session-takeover-falsepositive branch from 464cb15 to 42ff382 Compare May 19, 2026 19:26
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 19, 2026
@kays0x

kays0x commented May 19, 2026

Copy link
Copy Markdown
Author

Addressing both your P1 (null/non-object tail) and a related fence-consistency issue surfaced by a separate review.

Changes:

  1. isSessionEntry signature widened to unknown with an internal isRecord guard. Non-record JSON values (null, scalars, arrays) now fail closed at the canonical validator instead of needing a separate type-shape check at the fence callsite. hasSessionEntryBase follows the same shape: takes Record<string, unknown>, no internal cast.

  2. verifyAppendOnlyRunnerOwnedExtension now hashes the full new buffer for nextSnapshot.prefixHashHex instead of reusing the pre-fence prefix hash. Without this, a guarded operation that throws between assertSessionFileFence and refreshSessionFileFence would leave the snapshot describing the new size but holding the old prefix hash, and the next legitimate runner append would false-positive on prefix-hash mismatch.

  3. Added a TOCTOU consistency guard: the slow path now refuses when buffer.byteLength !== Number(fingerprint.size). Otherwise a concurrent append between fs.stat and fs.readFile would let us bless a snapshot that describes a different file than the one we just hashed.

Tests added:

  • A tail line that parses to a non-object value (literal null) trips takeover (validates the canonical-validator change).
  • A guarded operation that throws after assertion does not poison the fence: a subsequent runner-owned append must succeed (validates the full-buffer snapshot hash and the fence-state restore path).

78 tests passing across 4 files; check:changed --staged=false clean.

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 19, 2026
@kays0x kays0x force-pushed the kevin/fix-session-takeover-falsepositive branch from 42ff382 to ac07599 Compare May 19, 2026 20:16
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 19, 2026
@kays0x

kays0x commented May 19, 2026

Copy link
Copy Markdown
Author

Addressing the snapshot stat/read consistency finding, and stating the contract explicitly so we can draw the line on hypothetical filesystem races below the lock-held invariant.

Change:

Both snapshotSessionFileFence and verifyAppendOnlyRunnerOwnedExtension now perform a post-read readSessionFileFingerprint and compare every field (size, dev, ino, mtimeNs, ctimeNs, birthtimeNs) against the pre-read fingerprint. On mismatch the snapshot returns prefixHashHex: null and the slow path rejects directly. This catches the same-size in-place rewrite race where the byte-length guard would not detect drift.

Tests added:

  • A same-size in-place rewrite during snapshotSessionFileFence's stat/read window forces the next withSessionWriteLock to fail closed as takeover (prefixHashHex: null from the snapshot trips the verification).
  • A same-size in-place rewrite during the slow-path verification's stat/read window forces the verification to reject directly.

Both tests interpose between stat and read by spying fs.readFile once with a side-effect-then-delegate implementation that atomically rewrites the file at identical byte length before returning the new content.

82 tests passing across 4 files; check:changed --staged=false clean.

Lock-held invariant (now documented in the PR summary): both snapshotSessionFileFence call sites run while the session write lock is held (releaseForPrompt snapshots at attempt.session-lock.ts L452 before lock.release() at L454; refreshSessionFileFence runs inside withSessionWriteLock's runWithLock wrapper at L469). Canonical transcript writers (appendSessionTranscriptMessageLocked, migrateLinearTranscriptToParentLinked, ensureTranscriptHeader) all acquire that same lock. The stat/read consistency guard above is belt-and-suspenders for this invariant: it defends against multi-process gateway scenarios, external editors, or an internal bug that bypasses the lock. Further filesystem TOCTOU classes below this layer fall under "hypothetical malformed input" per AGENTS.md:40 and are not in scope for this PR.

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels May 19, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 20, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 20, 2026
@clawsweeper

clawsweeper Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg

🔥 Warming up: real-behavior proof passed; findings, security review, or rank-up moves are still in progress.

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.
What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@steipete

Copy link
Copy Markdown
Contributor

Closing this as stale against the current runner layout.

This PR edits the old src/agents/pi-embedded-runner/... path, which is no longer present on current main. The same session-fence/takeover family still has an active current-path candidate in #88348, which touches the live embedded-agent-runner code instead.

Thanks for the original investigation; the fix needs to happen on the current runner surface now.

@steipete steipete closed this May 31, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. P1 High-priority user-facing bug, regression, or broken workflow. proof: sufficient ClawSweeper judged the real behavior proof convincing. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. size: M status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants