fix(agents): release session write lock if fence read throws on prompt release#89811
Conversation
…t release releaseHeldLockWithFence (the embedded-attempt session-lock controller's release-for-prompt path) cleared its in-memory `heldLock` reference and then performed fence bookkeeping — readSessionFileFingerprint and readSessionFileFenceSnapshot — before calling lock.release(). Both perform filesystem I/O that can reject with a transient non-ENOENT error (e.g. EIO/EMFILE under load). Because release() was not in a finally, such a throw exited the function with `heldLock` already nulled and the underlying file lock never released. The lock then leaked on the live gateway process for the full maxHoldMs lease. Since the controller holds the lock with maxHoldMs derived from the agent timeout (~17 min) and the holder is the live pid (its /proc starttime matches the lock payload), the stale-lock breaker correctly refuses to reclaim it — only the watchdog TTL frees it. Meanwhile the attempt teardown still releases the in-process session-file owner mutex, so the next interactive turn sails past the mutex and blocks on the orphaned file lock, failing every turn with SessionWriteLockTimeoutError for up to 17 minutes (observed: a Discord channel dead 00:08–00:27). Wrap the fence bookkeeping in try/finally so the file lock is always released once heldLock is cleared, even if the fence reads throw. The original error still propagates; only the lock-release invariant is made exception-safe. Add regression coverage: a fence-read failure during releaseForPrompt must still release the underlying lock. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Codex review: passed. Reviewed June 3, 2026, 7:51 AM ET / 11:51 UTC. Summary PR surface: Source +6, Tests +27. Total +33 across 2 files. Reproducibility: yes. source-reproducible with provided real-output proof: current main clears Review metrics: none identified. Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Next step before merge
Security Review detailsBest possible solution: Land the narrow exception-safe lock-release fix after exact-head checks pass, while leaving the existing fence and cleanup ownership model unchanged. Do we have a high-confidence way to reproduce the issue? Yes, source-reproducible with provided real-output proof: current main clears Is this the best way to solve the issue? Yes. The best narrow fix is to release the captured lock in the same ownership-transfer block; abort/dispose siblings do not do fallible fence reads before release, so the change belongs in AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against e5e6cf04a2f4. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +6, Tests +27. Total +33 across 2 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
🦞✅ Source: What merged:
Automerge notes:
The automerge loop is complete. Automerge progress:
|
…unsettled retained writes (#6) A turn timeout aborts the run, but releaseHeldLockForAbort delegated to the graceful fence release, whose waitForRetainedLockIdle is unbounded. A retained transcript write pinned behind a hung provider stream (one that ignores abort, e.g. a stalled streamGenerateContent) therefore held the session .jsonl lock until the maxHoldMs watchdog (5-30 min), and every turn in between failed with SessionWriteLockTimeoutError. releaseHeldLockForAbort now races the graceful path against the abort settle bound (OPENCLAW_EMBEDDED_ABORT_SETTLE_TIMEOUT_MS). When the bound expires the underlying file lock is force-released and the controller poisoned via takeoverDetected, so the aborted run cannot perform torn writes after losing ownership. Retained writes that settle within the bound keep the existing graceful fence semantics. Prod incident: tulgey#225 (membrane 2026-06-01..03). Completes the lock-leak family of openclaw#87278 / openclaw#88623 / openclaw#89811, which release on settled aborts but not on aborts a hung provider call never lets settle. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…t release (openclaw#89811) Summary: - The PR makes prompt-release fence bookkeeping exception-safe so the session write lock is released even when fence reads throw, and adds a regression test for that path. - PR surface: Source +6, Tests +27. Total +33 across 2 files. - Reproducibility: yes. source-reproducible with provided real-output proof: current main clears `heldLock` be ... ire timing out after an injected `EIO`. I did not run the harness locally because this review is read-only. Automerge notes: - PR branch already contained follow-up commit before automerge: fix(agents): release session write lock if fence read throws on promp… Validation: - ClawSweeper review passed for head 394d978. - Required merge gates passed before the squash merge. Prepared head SHA: 394d978 Review: openclaw#89811 (comment) Co-authored-by: Spencer Fuller <spencer.p.fuller@gmail.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com> Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com> Approved-by: takhoffman Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
…t release (openclaw#89811) Summary: - The PR makes prompt-release fence bookkeeping exception-safe so the session write lock is released even when fence reads throw, and adds a regression test for that path. - PR surface: Source +6, Tests +27. Total +33 across 2 files. - Reproducibility: yes. source-reproducible with provided real-output proof: current main clears `heldLock` be ... ire timing out after an injected `EIO`. I did not run the harness locally because this review is read-only. Automerge notes: - PR branch already contained follow-up commit before automerge: fix(agents): release session write lock if fence read throws on promp… Validation: - ClawSweeper review passed for head 394d978. - Required merge gates passed before the squash merge. Prepared head SHA: 394d978 Review: openclaw#89811 (comment) Co-authored-by: Spencer Fuller <spencer.p.fuller@gmail.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com> Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com> Approved-by: takhoffman Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
…t release (openclaw#89811) Summary: - The PR makes prompt-release fence bookkeeping exception-safe so the session write lock is released even when fence reads throw, and adds a regression test for that path. - PR surface: Source +6, Tests +27. Total +33 across 2 files. - Reproducibility: yes. source-reproducible with provided real-output proof: current main clears `heldLock` be ... ire timing out after an injected `EIO`. I did not run the harness locally because this review is read-only. Automerge notes: - PR branch already contained follow-up commit before automerge: fix(agents): release session write lock if fence read throws on promp… Validation: - ClawSweeper review passed for head 394d978. - Required merge gates passed before the squash merge. Prepared head SHA: 394d978 Review: openclaw#89811 (comment) Co-authored-by: Spencer Fuller <spencer.p.fuller@gmail.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com> Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com> Approved-by: takhoffman Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
…t release (openclaw#89811) Summary: - The PR makes prompt-release fence bookkeeping exception-safe so the session write lock is released even when fence reads throw, and adds a regression test for that path. - PR surface: Source +6, Tests +27. Total +33 across 2 files. - Reproducibility: yes. source-reproducible with provided real-output proof: current main clears `heldLock` be ... ire timing out after an injected `EIO`. I did not run the harness locally because this review is read-only. Automerge notes: - PR branch already contained follow-up commit before automerge: fix(agents): release session write lock if fence read throws on promp… Validation: - ClawSweeper review passed for head 394d978. - Required merge gates passed before the squash merge. Prepared head SHA: 394d978 Review: openclaw#89811 (comment) Co-authored-by: Spencer Fuller <spencer.p.fuller@gmail.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com> Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com> Approved-by: takhoffman Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
…t release (openclaw#89811) Summary: - The PR makes prompt-release fence bookkeeping exception-safe so the session write lock is released even when fence reads throw, and adds a regression test for that path. - PR surface: Source +6, Tests +27. Total +33 across 2 files. - Reproducibility: yes. source-reproducible with provided real-output proof: current main clears `heldLock` be ... ire timing out after an injected `EIO`. I did not run the harness locally because this review is read-only. Automerge notes: - PR branch already contained follow-up commit before automerge: fix(agents): release session write lock if fence read throws on promp… Validation: - ClawSweeper review passed for head 394d978. - Required merge gates passed before the squash merge. Prepared head SHA: 394d978 Review: openclaw#89811 (comment) Co-authored-by: Spencer Fuller <spencer.p.fuller@gmail.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com> Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com> Approved-by: takhoffman Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
…t release (openclaw#89811) Summary: - The PR makes prompt-release fence bookkeeping exception-safe so the session write lock is released even when fence reads throw, and adds a regression test for that path. - PR surface: Source +6, Tests +27. Total +33 across 2 files. - Reproducibility: yes. source-reproducible with provided real-output proof: current main clears `heldLock` be ... ire timing out after an injected `EIO`. I did not run the harness locally because this review is read-only. Automerge notes: - PR branch already contained follow-up commit before automerge: fix(agents): release session write lock if fence read throws on promp… Validation: - ClawSweeper review passed for head 394d978. - Required merge gates passed before the squash merge. Prepared head SHA: 394d978 Review: openclaw#89811 (comment) Co-authored-by: Spencer Fuller <spencer.p.fuller@gmail.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com> Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com> Approved-by: takhoffman Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
…t release (openclaw#89811) Summary: - The PR makes prompt-release fence bookkeeping exception-safe so the session write lock is released even when fence reads throw, and adds a regression test for that path. - PR surface: Source +6, Tests +27. Total +33 across 2 files. - Reproducibility: yes. source-reproducible with provided real-output proof: current main clears `heldLock` be ... ire timing out after an injected `EIO`. I did not run the harness locally because this review is read-only. Automerge notes: - PR branch already contained follow-up commit before automerge: fix(agents): release session write lock if fence read throws on promp… Validation: - ClawSweeper review passed for head 394d978. - Required merge gates passed before the squash merge. Prepared head SHA: 394d978 Review: openclaw#89811 (comment) Co-authored-by: Spencer Fuller <spencer.p.fuller@gmail.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com> Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com> Approved-by: takhoffman Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
…t release (openclaw#89811) Summary: - The PR makes prompt-release fence bookkeeping exception-safe so the session write lock is released even when fence reads throw, and adds a regression test for that path. - PR surface: Source +6, Tests +27. Total +33 across 2 files. - Reproducibility: yes. source-reproducible with provided real-output proof: current main clears `heldLock` be ... ire timing out after an injected `EIO`. I did not run the harness locally because this review is read-only. Automerge notes: - PR branch already contained follow-up commit before automerge: fix(agents): release session write lock if fence read throws on promp… Validation: - ClawSweeper review passed for head 394d978. - Required merge gates passed before the squash merge. Prepared head SHA: 394d978 Review: openclaw#89811 (comment) Co-authored-by: Spencer Fuller <spencer.p.fuller@gmail.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com> Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com> Approved-by: takhoffman Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
…t release (openclaw#89811) Summary: - The PR makes prompt-release fence bookkeeping exception-safe so the session write lock is released even when fence reads throw, and adds a regression test for that path. - PR surface: Source +6, Tests +27. Total +33 across 2 files. - Reproducibility: yes. source-reproducible with provided real-output proof: current main clears `heldLock` be ... ire timing out after an injected `EIO`. I did not run the harness locally because this review is read-only. Automerge notes: - PR branch already contained follow-up commit before automerge: fix(agents): release session write lock if fence read throws on promp… Validation: - ClawSweeper review passed for head 394d978. - Required merge gates passed before the squash merge. Prepared head SHA: 394d978 Review: openclaw#89811 (comment) Co-authored-by: Spencer Fuller <spencer.p.fuller@gmail.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com> Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com> Approved-by: takhoffman Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
Makes #89649 merge-ready for the ClawSweeper automerge loop.
The edit pass should inspect the live PR diff, review comments, and failing checks; rebase if needed; keep the contributor branch credited; and stop only when validation is green or an external blocker is proven.
Known failing checks:
ClawSweeper 🐠 replacement reef notes:
Co-author credit kept:
fish notes: model gpt-5.5, reasoning high; reviewed against 394d978.