Skip to content

fix(agents): release session write lock if fence read throws on prompt release#89811

Merged
clawsweeper[bot] merged 2 commits into
mainfrom
clawsweeper/automerge-openclaw-openclaw-89649
Jun 3, 2026
Merged

fix(agents): release session write lock if fence read throws on prompt release#89811
clawsweeper[bot] merged 2 commits into
mainfrom
clawsweeper/automerge-openclaw-openclaw-89649

Conversation

@clawsweeper

@clawsweeper clawsweeper Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Makes #89649 merge-ready for the ClawSweeper automerge loop.
The edit pass should inspect the live PR diff, review comments, and failing checks; rebase if needed; keep the contributor branch credited; and stop only when validation is green or an external blocker is proven.
Known failing checks:

ClawSweeper 🐠 replacement reef notes:

  • Repair fallback: GitHub rejected the repair branch push because it updates workflow files and the ClawSweeper app token does not have workflows permission

Co-author credit kept:

fish notes: model gpt-5.5, reasoning high; reviewed against 394d978.

spencer2211 and others added 2 commits June 3, 2026 11:37
…t release

releaseHeldLockWithFence (the embedded-attempt session-lock controller's
release-for-prompt path) cleared its in-memory `heldLock` reference and then
performed fence bookkeeping — readSessionFileFingerprint and
readSessionFileFenceSnapshot — before calling lock.release(). Both perform
filesystem I/O that can reject with a transient non-ENOENT error (e.g.
EIO/EMFILE under load). Because release() was not in a finally, such a throw
exited the function with `heldLock` already nulled and the underlying file
lock never released.

The lock then leaked on the live gateway process for the full maxHoldMs lease.
Since the controller holds the lock with maxHoldMs derived from the agent
timeout (~17 min) and the holder is the live pid (its /proc starttime matches
the lock payload), the stale-lock breaker correctly refuses to reclaim it — only
the watchdog TTL frees it. Meanwhile the attempt teardown still releases the
in-process session-file owner mutex, so the next interactive turn sails past the
mutex and blocks on the orphaned file lock, failing every turn with
SessionWriteLockTimeoutError for up to 17 minutes (observed: a Discord channel
dead 00:08–00:27).

Wrap the fence bookkeeping in try/finally so the file lock is always released
once heldLock is cleared, even if the fence reads throw. The original error
still propagates; only the lock-release invariant is made exception-safe.

Add regression coverage: a fence-read failure during releaseForPrompt must still
release the underlying lock.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@clawsweeper clawsweeper Bot added agents Agent runtime and tooling size: S clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge proof: sufficient ClawSweeper judged the real behavior proof convincing. P1 High-priority user-facing bug, regression, or broken workflow. rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. clawsweeper Tracked by ClawSweeper automation labels Jun 3, 2026
@clawsweeper

clawsweeper Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor Author

Codex review: passed. Reviewed June 3, 2026, 7:51 AM ET / 11:51 UTC.

Summary
The PR makes prompt-release fence bookkeeping exception-safe so the session write lock is released even when fence reads throw, and adds a regression test for that path.

PR surface: Source +6, Tests +27. Total +33 across 2 files.

Reproducibility: yes. source-reproducible with provided real-output proof: current main clears heldLock before fallible fence reads, and the source PR demonstrates the next interactive acquire timing out after an injected EIO. I did not run the harness locally because this review is read-only.

Review metrics: none identified.

Merge readiness
Overall: 🦞 diamond lobster
Proof: 🦞 diamond lobster
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Next step before merge

  • [P2] No repair lane is needed; the automerge-opted PR has no review findings, so exact-head checks and mergeability should gate it.

Security
Cleared: The diff only changes session-lock release ordering and a focused unit test; it does not touch workflows, dependency sources, secrets, package resolution, or publishing paths.

Review details

Best possible solution:

Land the narrow exception-safe lock-release fix after exact-head checks pass, while leaving the existing fence and cleanup ownership model unchanged.

Do we have a high-confidence way to reproduce the issue?

Yes, source-reproducible with provided real-output proof: current main clears heldLock before fallible fence reads, and the source PR demonstrates the next interactive acquire timing out after an injected EIO. I did not run the harness locally because this review is read-only.

Is this the best way to solve the issue?

Yes. The best narrow fix is to release the captured lock in the same ownership-transfer block; abort/dispose siblings do not do fallible fence reads before release, so the change belongs in releaseHeldLockWithFence.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against e5e6cf04a2f4.

Label changes

Label changes:

  • add status: 🚀 automerge armed: This PR is in ClawSweeper's automerge lane. Sufficient (live_output): The linked source PR includes after-fix real lock-stack output with a real on-disk lock and the next acquire succeeding immediately after the injected fence-read failure.
  • remove status: 👀 ready for maintainer look: Current PR status label is status: 🚀 automerge armed.

Label justifications:

  • P1: The bug can wedge an active agent/channel workflow until the session write-lock max-hold watchdog releases the orphaned lock.
  • rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🦞 diamond lobster and patch quality is 🦞 diamond lobster.
  • status: 🚀 automerge armed: This PR is in ClawSweeper's automerge lane. Sufficient (live_output): The linked source PR includes after-fix real lock-stack output with a real on-disk lock and the next acquire succeeding immediately after the injected fence-read failure.
  • proof: sufficient: Contributor real behavior proof is sufficient. The linked source PR includes after-fix real lock-stack output with a real on-disk lock and the next acquire succeeding immediately after the injected fence-read failure.
Evidence reviewed

PR surface:

Source +6, Tests +27. Total +33 across 2 files.

View PR surface stats
Area Files Added Removed Net
Source 1 17 11 +6
Tests 1 27 0 +27
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 2 44 11 +33

What I checked:

Likely related people:

  • vincentkoc: git blame and git log -S show the current main releaseHeldLockWithFence implementation came from the recent agents lock-controller commit. (role: recent area contributor; confidence: medium; commits: 2c9297339817; files: src/agents/embedded-agent-runner/run/attempt.session-lock.ts, src/agents/embedded-agent-runner/run/attempt.session-lock.test.ts, src/agents/embedded-agent-runner/run/attempt.ts)
  • spencer2211: The closed source PR and local commit history show this person authored the focused fix and proof that this writable replacement PR carries forward. (role: source fix contributor; confidence: high; commits: d0b2b4ba3bc2; files: src/agents/embedded-agent-runner/run/attempt.session-lock.ts, src/agents/embedded-agent-runner/run/attempt.session-lock.test.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added status: 🚀 automerge armed This PR is in ClawSweeper's automerge lane. and removed status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels Jun 3, 2026
@clawsweeper clawsweeper Bot merged commit 409d1a7 into main Jun 3, 2026
221 of 236 checks passed
@clawsweeper clawsweeper Bot deleted the clawsweeper/automerge-openclaw-openclaw-89649 branch June 3, 2026 11:51
@clawsweeper

clawsweeper Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor Author

🦞✅
ClawSweeper merged this PR after the passing review.

Source: clawsweeper[bot]
Feedback: structured ClawSweeper verdict: pass (sha=394d9784371e9d004aca78aeb6dec1f9c4c9d055)
Merge status: merged by ClawSweeper automerge
Merged at: 2026-06-03T11:51:44Z
Merge commit: 409d1a713541

What merged:

  • The PR makes prompt-release fence bookkeeping exception-safe so the session write lock is released even when fence reads throw, and adds a regression test for that path.
  • PR surface: Source +6, Tests +27. Total +33 across 2 files.
  • Reproducibility: yes. source-reproducible with provided real-output proof: current main clears heldLock be ... ire timing out after an injected EIO. I did not run the harness locally because this review is read-only.

Automerge notes:

  • PR branch already contained follow-up commit before automerge: fix(agents): release session write lock if fence read throws on promp…

The automerge loop is complete.

Automerge progress:

  • 2026-06-03 11:51:28 UTC review passed 394d9784371e (structured ClawSweeper verdict: pass (sha=394d9784371e9d004aca78aeb6dec1f9c4c9d...)
  • 2026-06-03 11:51:47 UTC merged 394d9784371e (merged by ClawSweeper automerge)

matin added a commit to matin/openclaw that referenced this pull request Jun 3, 2026
…unsettled retained writes (#6)

A turn timeout aborts the run, but releaseHeldLockForAbort delegated to the
graceful fence release, whose waitForRetainedLockIdle is unbounded. A retained
transcript write pinned behind a hung provider stream (one that ignores abort,
e.g. a stalled streamGenerateContent) therefore held the session .jsonl lock
until the maxHoldMs watchdog (5-30 min), and every turn in between failed with
SessionWriteLockTimeoutError.

releaseHeldLockForAbort now races the graceful path against the abort settle
bound (OPENCLAW_EMBEDDED_ABORT_SETTLE_TIMEOUT_MS). When the bound expires the
underlying file lock is force-released and the controller poisoned via
takeoverDetected, so the aborted run cannot perform torn writes after losing
ownership. Retained writes that settle within the bound keep the existing
graceful fence semantics.

Prod incident: tulgey#225 (membrane 2026-06-01..03). Completes the lock-leak
family of openclaw#87278 / openclaw#88623 / openclaw#89811, which release on settled aborts but not on
aborts a hung provider call never lets settle.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request Jun 4, 2026
…t release (openclaw#89811)

Summary:
- The PR makes prompt-release fence bookkeeping exception-safe so the session write lock is released even when fence reads throw, and adds a regression test for that path.
- PR surface: Source +6, Tests +27. Total +33 across 2 files.
- Reproducibility: yes. source-reproducible with provided real-output proof: current main clears `heldLock` be ... ire timing out after an injected `EIO`. I did not run the harness locally because this review is read-only.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(agents): release session write lock if fence read throws on promp…

Validation:
- ClawSweeper review passed for head 394d978.
- Required merge gates passed before the squash merge.

Prepared head SHA: 394d978
Review: openclaw#89811 (comment)

Co-authored-by: Spencer Fuller <spencer.p.fuller@gmail.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request Jun 4, 2026
…t release (openclaw#89811)

Summary:
- The PR makes prompt-release fence bookkeeping exception-safe so the session write lock is released even when fence reads throw, and adds a regression test for that path.
- PR surface: Source +6, Tests +27. Total +33 across 2 files.
- Reproducibility: yes. source-reproducible with provided real-output proof: current main clears `heldLock` be ... ire timing out after an injected `EIO`. I did not run the harness locally because this review is read-only.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(agents): release session write lock if fence read throws on promp…

Validation:
- ClawSweeper review passed for head 394d978.
- Required merge gates passed before the squash merge.

Prepared head SHA: 394d978
Review: openclaw#89811 (comment)

Co-authored-by: Spencer Fuller <spencer.p.fuller@gmail.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request Jun 4, 2026
…t release (openclaw#89811)

Summary:
- The PR makes prompt-release fence bookkeeping exception-safe so the session write lock is released even when fence reads throw, and adds a regression test for that path.
- PR surface: Source +6, Tests +27. Total +33 across 2 files.
- Reproducibility: yes. source-reproducible with provided real-output proof: current main clears `heldLock` be ... ire timing out after an injected `EIO`. I did not run the harness locally because this review is read-only.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(agents): release session write lock if fence read throws on promp…

Validation:
- ClawSweeper review passed for head 394d978.
- Required merge gates passed before the squash merge.

Prepared head SHA: 394d978
Review: openclaw#89811 (comment)

Co-authored-by: Spencer Fuller <spencer.p.fuller@gmail.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request Jun 4, 2026
…t release (openclaw#89811)

Summary:
- The PR makes prompt-release fence bookkeeping exception-safe so the session write lock is released even when fence reads throw, and adds a regression test for that path.
- PR surface: Source +6, Tests +27. Total +33 across 2 files.
- Reproducibility: yes. source-reproducible with provided real-output proof: current main clears `heldLock` be ... ire timing out after an injected `EIO`. I did not run the harness locally because this review is read-only.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(agents): release session write lock if fence read throws on promp…

Validation:
- ClawSweeper review passed for head 394d978.
- Required merge gates passed before the squash merge.

Prepared head SHA: 394d978
Review: openclaw#89811 (comment)

Co-authored-by: Spencer Fuller <spencer.p.fuller@gmail.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request Jun 4, 2026
…t release (openclaw#89811)

Summary:
- The PR makes prompt-release fence bookkeeping exception-safe so the session write lock is released even when fence reads throw, and adds a regression test for that path.
- PR surface: Source +6, Tests +27. Total +33 across 2 files.
- Reproducibility: yes. source-reproducible with provided real-output proof: current main clears `heldLock` be ... ire timing out after an injected `EIO`. I did not run the harness locally because this review is read-only.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(agents): release session write lock if fence read throws on promp…

Validation:
- ClawSweeper review passed for head 394d978.
- Required merge gates passed before the squash merge.

Prepared head SHA: 394d978
Review: openclaw#89811 (comment)

Co-authored-by: Spencer Fuller <spencer.p.fuller@gmail.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
SebTardif pushed a commit to SebTardif/openclaw that referenced this pull request Jun 4, 2026
…t release (openclaw#89811)

Summary:
- The PR makes prompt-release fence bookkeeping exception-safe so the session write lock is released even when fence reads throw, and adds a regression test for that path.
- PR surface: Source +6, Tests +27. Total +33 across 2 files.
- Reproducibility: yes. source-reproducible with provided real-output proof: current main clears `heldLock` be ... ire timing out after an injected `EIO`. I did not run the harness locally because this review is read-only.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(agents): release session write lock if fence read throws on promp…

Validation:
- ClawSweeper review passed for head 394d978.
- Required merge gates passed before the squash merge.

Prepared head SHA: 394d978
Review: openclaw#89811 (comment)

Co-authored-by: Spencer Fuller <spencer.p.fuller@gmail.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
849261680 pushed a commit to 849261680/openclaw that referenced this pull request Jun 7, 2026
…t release (openclaw#89811)

Summary:
- The PR makes prompt-release fence bookkeeping exception-safe so the session write lock is released even when fence reads throw, and adds a regression test for that path.
- PR surface: Source +6, Tests +27. Total +33 across 2 files.
- Reproducibility: yes. source-reproducible with provided real-output proof: current main clears `heldLock` be ... ire timing out after an injected `EIO`. I did not run the harness locally because this review is read-only.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(agents): release session write lock if fence read throws on promp…

Validation:
- ClawSweeper review passed for head 394d978.
- Required merge gates passed before the squash merge.

Prepared head SHA: 394d978
Review: openclaw#89811 (comment)

Co-authored-by: Spencer Fuller <spencer.p.fuller@gmail.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
wangmiao0668000666 pushed a commit to wangmiao0668000666/openclaw that referenced this pull request Jun 9, 2026
…t release (openclaw#89811)

Summary:
- The PR makes prompt-release fence bookkeeping exception-safe so the session write lock is released even when fence reads throw, and adds a regression test for that path.
- PR surface: Source +6, Tests +27. Total +33 across 2 files.
- Reproducibility: yes. source-reproducible with provided real-output proof: current main clears `heldLock` be ... ire timing out after an injected `EIO`. I did not run the harness locally because this review is read-only.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(agents): release session write lock if fence read throws on promp…

Validation:
- ClawSweeper review passed for head 394d978.
- Required merge gates passed before the squash merge.

Prepared head SHA: 394d978
Review: openclaw#89811 (comment)

Co-authored-by: Spencer Fuller <spencer.p.fuller@gmail.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026
…t release (openclaw#89811)

Summary:
- The PR makes prompt-release fence bookkeeping exception-safe so the session write lock is released even when fence reads throw, and adds a regression test for that path.
- PR surface: Source +6, Tests +27. Total +33 across 2 files.
- Reproducibility: yes. source-reproducible with provided real-output proof: current main clears `heldLock` be ... ire timing out after an injected `EIO`. I did not run the harness locally because this review is read-only.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(agents): release session write lock if fence read throws on promp…

Validation:
- ClawSweeper review passed for head 394d978.
- Required merge gates passed before the squash merge.

Prepared head SHA: 394d978
Review: openclaw#89811 (comment)

Co-authored-by: Spencer Fuller <spencer.p.fuller@gmail.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge clawsweeper Tracked by ClawSweeper automation P1 High-priority user-facing bug, regression, or broken workflow. proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. size: S status: 🚀 automerge armed This PR is in ClawSweeper's automerge lane.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant