Skip to content

fix(agents): abandon hung in-flight write lock on attempt cleanup (#84193)#84353

Closed
clawsweeper[bot] wants to merge 5 commits into
mainfrom
clawsweeper/automerge-openclaw-openclaw-84220
Closed

fix(agents): abandon hung in-flight write lock on attempt cleanup (#84193)#84353
clawsweeper[bot] wants to merge 5 commits into
mainfrom
clawsweeper/automerge-openclaw-openclaw-84220

Conversation

@clawsweeper

@clawsweeper clawsweeper Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor

Makes #84220 merge-ready for the ClawSweeper automerge loop.
The edit pass should inspect the live PR diff, review comments, and failing checks; rebase if needed; keep the contributor branch credited; and stop only when validation is green or an external blocker is proven.

ClawSweeper 🐠 replacement reef notes:

  • Repair fallback: GitHub rejected the repair branch push because it updates workflow files and the ClawSweeper app token does not have workflows permission

Inherited issue-closing references from the source PR:
Fixes #84193

Co-author credit kept:

fish notes: model gpt-5.5, reasoning high; reviewed against c1b3143.

@clawsweeper clawsweeper Bot added agents Agent runtime and tooling size: M clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge proof: supplied External PR includes structured after-fix real behavior proof. proof: sufficient ClawSweeper judged the real behavior proof convincing. P1 High-priority user-facing bug, regression, or broken workflow. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. clawsweeper Tracked by ClawSweeper automation labels May 19, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: supplied External PR includes structured after-fix real behavior proof. label May 19, 2026
@clawsweeper

clawsweeper Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the context here. I swept through the related work, and this is now duplicate or superseded.

Keep this PR open for the armed automerge path: current main still lacks the in-flight lock abandon path, the patch targets the implicated embedded attempt session-lock controller, and I found no blocking correctness or security finding.

Canonical path: Close this PR as superseded by #84949.

So I’m closing this here and keeping the remaining discussion on #84949.

Review details

Best possible solution:

Close this PR as superseded by #84949.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection shows the current-main hung-run path only releases the reacquired lock in a finally block that never runs, and the PR includes a real-fs before/after reproduction for the same lock semantics.

Is this the best way to solve the issue?

Yes. Tracking in-flight reacquired locks inside the attempt session-lock controller and abandoning them only during cleanup is narrower than changing global lock stale policy, while preserving the existing transcript fence for later divergence detection.

Security review:

Security review cleared: The diff touches agent session-lock runtime and tests only; it does not add dependencies, workflows, install hooks, credential handling, or new external code execution.

AGENTS.md: found and applied where relevant.

What I checked:

  • linked superseding PR: fix(agents): bound embedded compaction write locks #84949 (fix(agents): bound embedded compaction write locks) is merged at 2026-05-22T17:30:39Z.
  • cluster evidence: the durable review links that PR in the work cluster or recommended risk path.
  • no human follow-up: live comments and timeline hydrated by apply contain no non-automation activity after the ClawSweeper review.

Likely related people:

  • steipete: History sampling shows Peter Steinberger as the dominant contributor across the embedded attempt and session-lock files, including recent attempt/runtime cleanup work. (role: adjacent owner; confidence: medium; commits: a374c3a5bfd5, 235cdb3f81cc, 50fcdb36a841; files: src/agents/pi-embedded-runner/run/attempt.ts, src/agents/pi-embedded-runner/run/attempt.session-lock.ts, src/agents/session-write-lock.ts)
  • vincentkoc: Vincent Koc recently changed the embedded attempt abort path and appears frequently in the attempt.ts history around agent runtime reliability. (role: recent area contributor; confidence: medium; commits: a122d804dda8, 38de89641999, fcae3bf9433b; files: src/agents/pi-embedded-runner/run/attempt.ts)
  • luoyanglang: The merged related PR at fix(agents): bound embedded compaction write locks #84949 adjusted embedded compaction write-lock max-hold behavior, which is adjacent to this cleanup-abandon fix. (role: recent adjacent fix owner; confidence: medium; commits: 46de078b2a33; files: src/agents/pi-embedded-runner/run/attempt.ts)
  • Sebastien Tardif: Current checkout blame for the session-lock controller and cleanup call path points at the grafted current-main snapshot commit, so this is a low-confidence routing signal rather than original authorship. (role: current-line provenance; confidence: low; commits: fe8d99d42129; files: src/agents/pi-embedded-runner/run/attempt.session-lock.ts, src/agents/pi-embedded-runner/run/attempt.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against 48adcb162c92.

@clawsweeper

clawsweeper Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor Author

🦞🔧
ClawSweeper applied a repair to this PR branch.

Repair: kept the fix on this contributor branch instead of opening a replacement PR.
Validation: node scripts/run-vitest.mjs src/agents/pi-embedded-runner/run/attempt.session-lock.compaction-leak.test.ts src/agents/pi-embedded-runner/run/attempt.session-lock.compaction-leak.integration.test.ts; pnpm check:changed; pnpm lint; pnpm check:test-types
Updated head: 9b6f38b4d141
Run: https://github.com/openclaw/clawsweeper/actions/runs/26421027636

Current state: exact-head review queued immediately; GitHub checks and the review verdict gate final merge.

Automerge progress:

  • 2026-05-25 21:41:13 UTC review queued 83ed0d887d13 (after repair)
  • 2026-05-20 03:22:55 UTC review queued 83ed0d887d13 (queued)
  • 2026-05-25 21:50:36 UTC review passed 83ed0d887d13 (structured ClawSweeper verdict: pass (sha=83ed0d887d13800dde46c7bd0f762a98c79ba...)
  • 2026-05-25 21:50:16 UTC review queued 83ed0d887d13 (queued)
  • 2026-05-25 22:10:48 UTC review queued 9b6f38b4d141 (after repair)

@clawsweeper clawsweeper Bot added rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. status: 🚀 automerge armed This PR is in ClawSweeper's automerge lane. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 19, 2026
@clawsweeper

clawsweeper Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor Author

ClawSweeper PR egg

✨ Hatched: 🌱 uncommon Tiny Patch Peep

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🌱 uncommon.
Trait: stacks clean commits.
Image traits: location status garden; accessory review stamp; palette seafoam, black, and opal; mood focused; pose guarding a tiny green check; shell woven fiber shell; lighting soft underwater shimmer; background small review tokens.
Share on X: post this hatch
Copy: My PR egg hatched a 🌱 uncommon Tiny Patch Peep in ClawSweeper.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@clawsweeper clawsweeper Bot added rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. clawsweeper:human-review Needs maintainer review before ClawSweeper can continue and removed proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. labels May 20, 2026
@clawsweeper

clawsweeper Bot commented May 20, 2026

Copy link
Copy Markdown
Contributor Author

🦞✅
ClawSweeper is pausing this repair loop for human review.

Source: clawsweeper[bot]
Reason: Review did not complete, so no work-lane recommendation was made. (sha=c1b31439982f27946cb1715542dcbbcbee658785)

Why human review is needed:
ClawSweeper found a blocker that should be resolved or accepted by a maintainer before the repair or automerge loop continues.

Recommended next action:
Review the reason above, resolve the blocker or explicitly accept the risk, then ask ClawSweeper to continue if automation is still appropriate.

I added clawsweeper:human-review and left the final call with a maintainer.

@Takhoffman

Copy link
Copy Markdown
Contributor

@clawsweeper automerge

@yetval

yetval commented May 21, 2026

Copy link
Copy Markdown
Contributor

@clawsweeper re-review

@Takhoffman

Copy link
Copy Markdown
Contributor

@clawsweeper approve

@clawsweeper

clawsweeper Bot commented May 25, 2026

Copy link
Copy Markdown
Contributor Author

🦞👀
ClawSweeper picked this up.

Command router queued. I will update this comment with the next step.

@clawsweeper clawsweeper Bot removed the clawsweeper:human-review Needs maintainer review before ClawSweeper can continue label May 25, 2026
@clawsweeper clawsweeper Bot force-pushed the clawsweeper/automerge-openclaw-openclaw-84220 branch from c1b3143 to 83ed0d8 Compare May 25, 2026 21:41
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. and removed rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. labels May 25, 2026
yetval and others added 5 commits May 25, 2026 22:10
Pi auto-compaction wraps `session.compact()` under the session JSONL
write lock via `installSessionExternalHookWriteLock`. When the compact()
call hangs (post-run, after the outer abort/timeout has already fired),
the reacquired lock is held by an `await run()` body that never settles,
so the finally block that releases it never executes either.

Until the session-write-lock watchdog forces a release at `maxHoldMs`
(default runTimeout + compaction grace = many minutes), every later turn
on the same session bounces off `SessionWriteLockTimeoutError` at the
60s acquire timeout. The only observed recovery was a Gateway restart.

Track each reacquired write lock and abandon any still-held entries
when `acquireForCleanup` runs. The transcript fence already guards
against partial-write divergence on the next acquire, and the existing
finally path stays correct (release is idempotent via the entry flag).

Refs #84193
Add real-file-system integration test that exercises the actual
acquireSessionWriteLock + on-disk .jsonl.lock semantics (no mocks):

  - Hold the lock through a stuck withSessionWriteLock and verify a
    competing acquire on the same session file fails with
    SessionWriteLockTimeoutError (matches the user-reported 60s
    timeout on later Discord turns).
  - Run attempt cleanup, verify the lock is released, the next
    competing acquire succeeds in <5ms, and the .jsonl bytes remain
    intact (no transcript tear).
  - Verify the new stderr diagnostic line names the abandoned owner
    pid and session file — what maintainers see in
    `journalctl -u openclaw` after the bug fires.

Make abandonInFlightWriteLocks await its releases so the on-disk lock
file is gone before cleanup returns; without it, concurrent acquires
race the release and bounce off "file lock stale" until the watchdog
catches up.
@clawsweeper clawsweeper Bot force-pushed the clawsweeper/automerge-openclaw-openclaw-84220 branch from 83ed0d8 to 9b6f38b Compare May 25, 2026 22:10
@clawsweeper

clawsweeper Bot commented May 25, 2026

Copy link
Copy Markdown
Contributor Author

ClawSweeper applied the proposed close for this PR.

@clawsweeper clawsweeper Bot closed this May 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge clawsweeper Tracked by ClawSweeper automation merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. P1 High-priority user-facing bug, regression, or broken workflow. proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. size: M status: 🚀 automerge armed This PR is in ClawSweeper's automerge lane.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Auto-compaction leaves session JSONL write lock held after timeout, blocking all later Discord turns

2 participants