fix(agents): abandon hung in-flight write lock on attempt cleanup (#84193)#84353
fix(agents): abandon hung in-flight write lock on attempt cleanup (#84193)#84353clawsweeper[bot] wants to merge 5 commits into
Conversation
|
Thanks for the context here. I swept through the related work, and this is now duplicate or superseded. Keep this PR open for the armed automerge path: current main still lacks the in-flight lock abandon path, the patch targets the implicated embedded attempt session-lock controller, and I found no blocking correctness or security finding. Canonical path: Close this PR as superseded by #84949. So I’m closing this here and keeping the remaining discussion on #84949. Review detailsBest possible solution: Close this PR as superseded by #84949. Do we have a high-confidence way to reproduce the issue? Yes. Source inspection shows the current-main hung-run path only releases the reacquired lock in a finally block that never runs, and the PR includes a real-fs before/after reproduction for the same lock semantics. Is this the best way to solve the issue? Yes. Tracking in-flight reacquired locks inside the attempt session-lock controller and abandoning them only during cleanup is narrower than changing global lock stale policy, while preserving the existing transcript fence for later divergence detection. Security review: Security review cleared: The diff touches agent session-lock runtime and tests only; it does not add dependencies, workflows, install hooks, credential handling, or new external code execution. AGENTS.md: found and applied where relevant. What I checked:
Likely related people:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 48adcb162c92. |
|
🦞🔧 Repair: kept the fix on this contributor branch instead of opening a replacement PR. Current state: exact-head review queued immediately; GitHub checks and the review verdict gate final merge. Automerge progress:
|
|
ClawSweeper PR egg ✨ Hatched: 🌱 uncommon Tiny Patch Peep Hatch commandComment Hatchability rules:
Rarity: 🌱 uncommon. What is this egg doing here?
|
|
🦞✅ Source: Why human review is needed: Recommended next action: I added |
|
@clawsweeper automerge |
|
@clawsweeper re-review |
|
@clawsweeper approve |
|
🦞👀 Command router queued. I will update this comment with the next step. |
c1b3143 to
83ed0d8
Compare
Pi auto-compaction wraps `session.compact()` under the session JSONL write lock via `installSessionExternalHookWriteLock`. When the compact() call hangs (post-run, after the outer abort/timeout has already fired), the reacquired lock is held by an `await run()` body that never settles, so the finally block that releases it never executes either. Until the session-write-lock watchdog forces a release at `maxHoldMs` (default runTimeout + compaction grace = many minutes), every later turn on the same session bounces off `SessionWriteLockTimeoutError` at the 60s acquire timeout. The only observed recovery was a Gateway restart. Track each reacquired write lock and abandon any still-held entries when `acquireForCleanup` runs. The transcript fence already guards against partial-write divergence on the next acquire, and the existing finally path stays correct (release is idempotent via the entry flag). Refs #84193
Add real-file-system integration test that exercises the actual
acquireSessionWriteLock + on-disk .jsonl.lock semantics (no mocks):
- Hold the lock through a stuck withSessionWriteLock and verify a
competing acquire on the same session file fails with
SessionWriteLockTimeoutError (matches the user-reported 60s
timeout on later Discord turns).
- Run attempt cleanup, verify the lock is released, the next
competing acquire succeeds in <5ms, and the .jsonl bytes remain
intact (no transcript tear).
- Verify the new stderr diagnostic line names the abandoned owner
pid and session file — what maintainers see in
`journalctl -u openclaw` after the bug fires.
Make abandonInFlightWriteLocks await its releases so the on-disk lock
file is gone before cleanup returns; without it, concurrent acquires
race the release and bounce off "file lock stale" until the watchdog
catches up.
83ed0d8 to
9b6f38b
Compare
|
ClawSweeper applied the proposed close for this PR.
|
Makes #84220 merge-ready for the ClawSweeper automerge loop.
The edit pass should inspect the live PR diff, review comments, and failing checks; rebase if needed; keep the contributor branch credited; and stop only when validation is green or an external blocker is proven.
ClawSweeper 🐠 replacement reef notes:
Inherited issue-closing references from the source PR:
Fixes #84193
Co-author credit kept:
fish notes: model gpt-5.5, reasoning high; reviewed against c1b3143.