fix(agents): report stale session locks without cleanup#88658
Conversation
|
Codex review: needs real behavior proof before merge. Reviewed May 31, 2026, 10:28 AM ET / 14:28 UTC. Summary PR surface: Source +106, Tests +98. Total +204 across 10 files. Reproducibility: yes. source-reproducible: current main’s acquire path treats stale live OpenClaw locks as reclaimable and passes them to fs-safe’s removal path. I did not execute tests because this review is read-only. Review metrics: 1 noteworthy metric.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Proof guidance:
Risk before merge
Maintainer options:
Next step before merge
Security Review findings
Review detailsBest possible solution: Land the fail-closed typed stale-lock path only after maintainer approval, operator help/docs alignment, and live or packaged proof for session send/spawn flows. Do we have a high-confidence way to reproduce the issue? Yes, source-reproducible: current main’s acquire path treats stale live OpenClaw locks as reclaimable and passes them to fs-safe’s removal path. I did not execute tests because this review is read-only. Is this the best way to solve the issue? Unclear: the code-level split is narrow and safer for lock integrity, but the docs/proof/product-approval gap means it is not merge-ready yet. Full review comments:
Overall correctness: patch is incorrect AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 88c99ddf5f82. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +106, Tests +98. Total +204 across 10 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
Landing proof for 840cc2d: Behavior addressed: stale session locks owned by a live OpenClaw process are reported as typed lock-acquisition failures instead of being auto-deleted; safely reclaimable dead/orphan/recycled locks still clean up. Exact steps or command run after this patch:
Evidence after fix:
Known proof gap:
|
Follow-up to #88658. Retries transient stale session-lock acquire failures when diagnostics show the old stale report disappeared, was replaced by a fresh valid lock, or was replaced by a fresh payload-less lock still inside the mtime/orphan grace window. Preserves typed `SessionWriteLockStaleError` diagnostics for still-present live OpenClaw-owned stale locks. Proof: 53 focused session-write-lock tests passed locally and in the agents-core CI shard; `pnpm tsgo:test:src`, touched-file oxlint, `git diff --check`, and autoreview passed locally. CI run 26716843811 has unrelated failures in UI deadcode/types and bash-tools tests; session-write-lock tests passed in that run. Refs #87217.
Follow-up to openclaw#88658. Retries transient stale session-lock acquire failures when diagnostics show the old stale report disappeared, was replaced by a fresh valid lock, or was replaced by a fresh payload-less lock still inside the mtime/orphan grace window. Preserves typed `SessionWriteLockStaleError` diagnostics for still-present live OpenClaw-owned stale locks. Proof: 53 focused session-write-lock tests passed locally and in the agents-core CI shard; `pnpm tsgo:test:src`, touched-file oxlint, `git diff --check`, and autoreview passed locally. CI run 26716843811 has unrelated failures in UI deadcode/types and bash-tools tests; session-write-lock tests passed in that run. Refs openclaw#87217.
Follow-up to openclaw#88658. Retries transient stale session-lock acquire failures when diagnostics show the old stale report disappeared, was replaced by a fresh valid lock, or was replaced by a fresh payload-less lock still inside the mtime/orphan grace window. Preserves typed `SessionWriteLockStaleError` diagnostics for still-present live OpenClaw-owned stale locks. Proof: 53 focused session-write-lock tests passed locally and in the agents-core CI shard; `pnpm tsgo:test:src`, touched-file oxlint, `git diff --check`, and autoreview passed locally. CI run 26716843811 has unrelated failures in UI deadcode/types and bash-tools tests; session-write-lock tests passed in that run. Refs openclaw#87217.
Follow-up to openclaw#88658. Retries transient stale session-lock acquire failures when diagnostics show the old stale report disappeared, was replaced by a fresh valid lock, or was replaced by a fresh payload-less lock still inside the mtime/orphan grace window. Preserves typed `SessionWriteLockStaleError` diagnostics for still-present live OpenClaw-owned stale locks. Proof: 53 focused session-write-lock tests passed locally and in the agents-core CI shard; `pnpm tsgo:test:src`, touched-file oxlint, `git diff --check`, and autoreview passed locally. CI run 26716843811 has unrelated failures in UI deadcode/types and bash-tools tests; session-write-lock tests passed in that run. Refs openclaw#87217.
Summary
SessionWriteLockStaleErrorand route stale acquire failures through embedded-run takeover, failover suppression, prompt-cache propagation, announce delivery, and QA retry classifiers.Refs #87779
Verification
node scripts/run-vitest.mjs src/agents/session-write-lock.test.ts src/agents/embedded-agent-runner/run/attempt.session-lock.test.ts src/agents/failover-error.test.ts src/agents/subagent-announce-delivery.test.ts src/agents/embedded-agent-runner/google-prompt-cache.test.ts extensions/qa-lab/src/suite-runtime-agent-session.test.tspnpm tsgo:test:srcfailed on latestorigin/mainin unrelatedsrc/agents/acp-spawn.test.ts(984,15): Object literal may only specify known properties, and 'thinking' does not exist....agents/skills/autoreview/scripts/autoreview --mode localclean after fixing accepted QA retry findingReal behavior proof
Behavior addressed: stale session transcript locks owned by a live OpenClaw process are reported as typed stale-lock contention without deleting the lock; dead/orphan reclaim remains on the existing safe removal path.
Real environment tested: local macOS checkout on branch
fix/session-stale-lock-diagnosticsafter rebase ontoorigin/main.Exact steps or command run after this patch:
node scripts/run-vitest.mjs src/agents/session-write-lock.test.ts src/agents/embedded-agent-runner/run/attempt.session-lock.test.ts src/agents/failover-error.test.ts src/agents/subagent-announce-delivery.test.ts src/agents/embedded-agent-runner/google-prompt-cache.test.ts extensions/qa-lab/src/suite-runtime-agent-session.test.ts.Evidence after fix: 3 Vitest shards passed; 6 files passed; 269 tests passed.
Observed result after fix: live OpenClaw-owned stale lock test throws
SessionWriteLockStaleErrorand leaves the lock file in place; QA helper retries stale lock contention.What was not tested: live gateway multi-agent
sessions_send/sessions_spawnstress run; broadtsgo:test:srcis blocked by unrelated latest-main ACP spawn test type error noted above.