fix(agents): retry stale session lock reports#87342
Conversation
|
Codex review: needs maintainer review before merge. Reviewed May 30, 2026, 11:14 PM ET / 03:14 UTC. Summary PR surface: Source +55, Tests +68. Total +123 across 2 files. Reproducibility: yes. Current main source plus the fs-safe sidecar-lock contract show Review metrics: none identified. Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Risk before merge
Maintainer options:
Next step before merge
Security Review detailsBest possible solution: Land the focused retry behavior after maintainer review confirms sibling PR ordering and focused session-lock validation passes on the exact merge result. Do we have a high-confidence way to reproduce the issue? Yes. Current main source plus the fs-safe sidecar-lock contract show Is this the best way to solve the issue? Yes. Keeping stale-lock retry and timeout normalization inside AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 4dad7bd93b6c. Label changesLabel justifications:
Evidence reviewedPR surface: Source +55, Tests +68. Total +123 across 2 files. View PR surface stats
Acceptance criteria:
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
ClawSweeper PR egg 🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat. Where did the egg go?
|
80d61ba to
87a2980
Compare
|
Refreshed #87342 onto current Focused WSL validation after the refresh:
This should replace the old broad CI failures from the earlier head. Remaining ClawSweeper ask is real agent/tool runtime proof for the stale-lock stderr leak. No merge performed. |
This comment was marked as spam.
This comment was marked as spam.
|
@clawsweeper re-review Retrying after the prior ClawSweeper/Codex review failed before summarizing the patch. Branch is mergeable at head 87a2980; no code changes in this comment. |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
87a2980 to
0488f01
Compare
|
@clawsweeper re-review Updated #87342 with a current-main rebase and added real runtime proof for the stale-lock stderr ask. Branch/head under test: Rebase/coordination:
Real runtime proof: The proof harness exercises the real agent command transcript persistence path (
Result: {
"status": "pass",
"branchHead": "0488f01bec84bd61bc9f1c8cac13671644ac3b05",
"runtimePath": "persistCliTurnTranscript",
"resolverCalls": 5,
"stderrContainsFileLockStale": false,
"transcriptAppended": true,
"lockReleased": true,
"sessionFileSuffix": "proof.jsonl",
"sessionEntryHasFile": true
}Focused validation on rebased head: |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
@clawsweeper re-review Updated #87342 for the ClawSweeper P2 finding about stale-lock retries resetting the acquire timeout budget. Changes on PR head
Validation in WSL from
No public config, plugin API, CLI flag, env var, migration, or plugin contract surface was added. |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
871d479 to
2696c8b
Compare
|
Updated #87342 onto current New signed head: What changed in this refresh:
Focused validation:
No merge performed. @clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
Thanks for working through this and for keeping the branch refreshed. We landed the stale-lock policy in #88658 instead, at commit 7ca7712. The important maintainer decision is that a live OpenClaw-owned stale lock should remain visible as a typed stale lock acquisition failure, with owner diagnostics, rather than being converted into the existing timeout path. Dead/orphaned/recycled locks are still safely reclaimable, but live-owned stale locks are not silently removed or hidden behind timeout-style retry because that can obscure the writer contention we need to diagnose. So I am closing this as superseded by #88658, not because the report was invalid. The raw stale-lock leak path is now handled by the typed |
Summary
Fixes #87340.
acquireSessionWriteLock()could surface the rawfile_lock_staleerror from the sidecar lock manager when a stale-lock observation raced with recovery. In agent/tool execution that raw lock-manager message can look like actionable tool stderr even though the correct session-write behavior is to keep treating it as contention until the configured acquire timeout expires.This change keeps stale lock reports inside the session write-lock policy: retry briefly within the existing acquire timeout, and if the lock remains unavailable, raise the existing
SessionWriteLockTimeoutErrorshape instead of leakingfile lock stale for .... The latest refresh preserves current-main lock cleanup behavior and the PR's stale-lock retry/timeout-budget behavior.No config surface or plugin surface changes.
Real behavior proof
Behavior addressed: recovered or racey stale session write-lock reports no longer surface as raw
file_lock_stale/file lock stale for ...errors during session journal acquisition.Real environment tested: local OpenClaw Linux/WSL source worktree rebased onto current
main(4dad7bd93b6caae036342fd4efbdb47c217b459f).Exact steps or command run after this patch:
Evidence after fix:
Observed result after fix: the stale-lock retry path keeps one acquire timeout budget, retries stale reports through session write-lock policy, and converts exhausted stale/timeout paths to the existing session lock timeout error shape.
Rebased signed head:
2696c8b89f94fa132a00fad3562b130db27473c8.