fix(agents): tolerate in-process session writes during prompt release#84250
Conversation
|
Codex review: needs changes before merge. Workflow note: Future ClawSweeper reviews update this same comment in place. How this review workflow works
Summary Reproducibility: yes. Current main's source path rejects any prompt-release fingerprint drift, and the PR body/comments include real filesystem proof for the false-takeover path while the remaining stale-baseline hole is source-reproducible from the new trust map logic. PR rating Rank-up moves:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. Real behavior proof Risk before merge
Maintainer options:
Copy recommended automerge instructionNext step before merge Security Review findings
Review detailsBest possible solution: Keep explicit publication narrow, but advance the trusted baseline for proven same-process owned states and add stale-baseline regressions alongside the existing external-edit fail-closed cases. Do we have a high-confidence way to reproduce the issue? Yes. Current main's source path rejects any prompt-release fingerprint drift, and the PR body/comments include real filesystem proof for the false-takeover path while the remaining stale-baseline hole is source-reproducible from the new trust map logic. Is this the best way to solve the issue? No. The explicit-publication direction is plausible, but this implementation needs the trusted baseline repair before it is the narrow maintainable fix. Label changes:
Label justifications:
Full review comments:
Overall correctness: patch is incorrect Acceptance criteria:
What I checked:
Likely related people:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 23c58081d062. |
|
Cross-linking the related fixes I opened after tracing the live Discord duplicate-reply repro:
This PR appears to address a similar symptom from the locked-delivery-flush angle. In the live repro, the visible reply had already been delivered, but the tool result was later repaired as |
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
ClawSweeper PR egg 🔥 Warming up: real-behavior proof passed; findings, security review, or rank-up moves are still in progress. Hatch commandComment Hatchability rules:
What is this egg doing here?
|
|
Merged via squash.
Thanks @tianxiaochannel-oss88! |
Summary
promptActiveSessiontranscript context.Root Cause
The embedded attempt session fence only compared the session file's filesystem fingerprint saved at
releaseForPrompt()against the fingerprint observed after the prompt lock was reacquired. That correctly rejects unowned external edits, but it also rejects legitimate OpenClaw-managed transcript appends from another same-process controller/attempt that acquired the session write lock while the first attempt was released for model I/O.The fix keeps the fingerprint fence, but makes it ownership-aware and narrow: OpenClaw records a globally owned post-write fingerprint only for the exact transcript append publication section, routed through
runWithOwnedSessionTranscriptWritePublication(), and only when the pre-write fingerprint is already trusted by the process. The broaderrunWithOwnedSessionTranscriptWriteLock()now only provides lock serialization. A released attempt accepts the change only when the current file fingerprint exactly equals a newer owned fingerprint. Direct external edits, external edits followed by a broad locked append, external edits interleaved during cleanup, and external edits inside the broader owned transcript lock still fail closed.Fixes #84059.
Real behavior proof
EmbeddedAttemptSessionTakeoverErrorwhen another controller publishes an explicit owned transcript append for the same session file. External unowned file changes still throw, including an external edit interleaved during a broad same-process locked callback, an external edit interleaved while another controller holds the cleanup lock, and an external edit inside the broader owned transcript lock before the narrow append publication section.7008438d978c75a21e811df4e2dc041d9f9dfe71, then rebased/verified again through current PR headda570f3c09b47fce4ee8bd0c92fbb91d1d3d8ea4, macOS arm64, Node.jsv24.15.0, real temp session files on the local filesystem, real OpenClawacquireSessionWriteLock.EmbeddedAttemptSessionTakeoverError, proving a locked append does not launder an earlier external edit. The latest unit regressions also prove cleanup-lock release and broad owned transcript lock scopes do not launder external edits into globally owned fingerprints.Tests
After rebasing onto current
origin/main(5d775122c1):node scripts/run-vitest.mjs run --config test/vitest/vitest.agents-pi-embedded.config.ts src/agents/pi-embedded-runner/run/attempt.session-lock.test.ts-> 20 passednode scripts/run-vitest.mjs src/config/sessions/transcript.test.ts src/agents/pi-embedded-runner/run/attempt.session-lock.test.ts-> 3 files, 62 passednode scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.core.test.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core-test.tsbuildinfo-> passednode scripts/run-tsgo.mjs -p tsconfig.core.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core.tsbuildinfo-> passedgit diff --check-> passedSecurity-boundary update
publishOwnedWriteoption is forwarded tosessionLockController.withSessionWriteLock().withSessionWriteLock()callbacks still refresh their own controller fence, but no longer publish a globally owned fingerprint for other released controllers to accept.