Commit 25db525

fix(agents): reclaim session write-locks held past the holder's own maxHoldMs
1 parent 7c1a83f commit 25db525

4 files changed

Lines changed: 38 additions & 4 deletions

docs/reference/session-management-compaction.md

Lines changed: 5 additions & 2 deletions
@@ -98,8 +98,11 @@ Transcript mutations use a session write lock on the transcript file. Lock acqui
 `session.writeLock.acquireTimeoutMs` before surfacing a busy-session error; the default is `60000`
 ms. Raise this only when legitimate prep, cleanup, compaction, or transcript mirror work contends
 longer on slow machines. `session.writeLock.staleMs` controls when an existing lock can be
-reclaimed as stale; the default is `1800000` ms. `session.writeLock.maxHoldMs` controls the
-in-process watchdog release threshold; the default is `300000` ms. Emergency env overrides are
+reclaimed as stale; the default is `1800000` ms. `session.writeLock.maxHoldMs` is the hard hold
+limit: the holder's own in-process watchdog releases the lock at this threshold, and a contending
+writer may also reclaim a lock held past this deadline (when the lock file is unchanged) so a
+wedged holder whose watchdog cannot fire does not pin the session; the default is `300000` ms.
+Emergency env overrides are
 `OPENCLAW_SESSION_WRITE_LOCK_ACQUIRE_TIMEOUT_MS`, `OPENCLAW_SESSION_WRITE_LOCK_STALE_MS`, and
 `OPENCLAW_SESSION_WRITE_LOCK_MAX_HOLD_MS`.

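The paragraph in this hunk names three settings, their defaults, and three emergency env overrides. A minimal sketch of how such values might be resolved is below. The setting keys, env names, and defaults come from the documentation text; `resolveWriteLockSettings`, `parsePositiveMs`, and the precedence (env, then config, then default) are assumptions for illustration, not the project's actual resolver.

```typescript
// Hypothetical resolver: the function names and the precedence order are
// illustrative assumptions; keys, env names, and defaults are from the docs.
type WriteLockSettings = {
  acquireTimeoutMs: number;
  staleMs: number;
  maxHoldMs: number;
};

// Defaults as stated in the documentation hunk.
const DEFAULTS: WriteLockSettings = {
  acquireTimeoutMs: 60_000,
  staleMs: 1_800_000,
  maxHoldMs: 300_000,
};

// Documented emergency env override names, keyed by setting.
const ENV_NAMES: Record<keyof WriteLockSettings, string> = {
  acquireTimeoutMs: "OPENCLAW_SESSION_WRITE_LOCK_ACQUIRE_TIMEOUT_MS",
  staleMs: "OPENCLAW_SESSION_WRITE_LOCK_STALE_MS",
  maxHoldMs: "OPENCLAW_SESSION_WRITE_LOCK_MAX_HOLD_MS",
};

// Accept only positive finite numbers; anything else is treated as unset.
function parsePositiveMs(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const value = Number(raw);
  return Number.isFinite(value) && value > 0 ? value : undefined;
}

function resolveWriteLockSettings(
  config: Partial<WriteLockSettings>,
  env: Record<string, string | undefined>,
): WriteLockSettings {
  const resolved: WriteLockSettings = { ...DEFAULTS };
  for (const key of Object.keys(DEFAULTS) as Array<keyof WriteLockSettings>) {
    // Assumed precedence: env override, then config value, then default.
    resolved[key] = parsePositiveMs(env[ENV_NAMES[key]]) ?? config[key] ?? DEFAULTS[key];
  }
  return resolved;
}
```

Passing the environment in as a plain record, rather than reading a global, keeps the sketch testable and free of runtime-specific types.
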
src/agents/session-write-lock.test.ts

Lines changed: 28 additions & 0 deletions
@@ -467,6 +467,34 @@ describe("acquireSessionWriteLock", () => {
     });
   });
 
+  it("reclaims a live OpenClaw-owned lock that exceeded its own maxHoldMs", async () => {
+    await withTempSessionLockFile(async ({ sessionFile, lockPath }) => {
+      const owner = spawn(process.execPath, ["-e", "setInterval(() => {}, 1000)", "openclaw"], {
+        stdio: "ignore",
+      });
+      if (!owner.pid) {
+        throw new Error("missing lock owner pid");
+      }
+      // Live OpenClaw owner, within staleMs but past its own recorded maxHoldMs: a stuck
+      // holder whose in-process watchdog can never fire must still be reclaimable. (#87483)
+      await fs.writeFile(
+        lockPath,
+        JSON.stringify({
+          pid: owner.pid,
+          createdAt: new Date(Date.now() - 30_000).toISOString(),
+          maxHoldMs: 1_000,
+        }),
+        "utf8",
+      );
+
+      try {
+        await expectCurrentPidOwnsLock({ sessionFile, timeoutMs: 500, staleMs: 600_000 });
+      } finally {
+        owner.kill("SIGTERM");
+      }
+    });
+  });
+
   it("retries when a stale lock report disappears before diagnostics", async () => {
     await withTempSessionLockFile(async ({ sessionFile, lockPath }) => {
       const owner = spawn(process.execPath, ["-e", "setInterval(() => {}, 1000)", "openclaw"], {

src/agents/session-write-lock.ts

Lines changed: 4 additions & 1 deletion
@@ -46,7 +46,10 @@ export const DEFAULT_SESSION_WRITE_LOCK_MAX_HOLD_MS = 5 * 60 * 1000;
 export const DEFAULT_SESSION_WRITE_LOCK_ACQUIRE_TIMEOUT_MS = 60_000;
 const DEFAULT_WATCHDOG_INTERVAL_MS = 60_000;
 const DEFAULT_TIMEOUT_GRACE_MS = 2 * 60 * 1000;
-const REPORT_ONLY_STALE_LOCK_REASONS = new Set(["too-old", "hold-exceeded"]);
+// "too-old" (past global staleMs) stays report-only — a live holder may still be within its own
+// maxHoldMs. "hold-exceeded" (past the holder's OWN recorded maxHoldMs) is overdue by contract and
+// reclaimable; acquire's remove-if-unchanged still skips a lock whose file changed (e.g. a release). (#87483)
+const REPORT_ONLY_STALE_LOCK_REASONS = new Set(["too-old"]);
 
 /**
  * Yield control to the event loop so other sessions can make progress

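The effect of dropping `"hold-exceeded"` from the report-only set can be sketched as follows. Only the two reason strings and the set contents come from the diff. `LockPayload`, `classifyLock`, and `canReclaim` are hypothetical names, and the rule that a lock is reclaimable when at least one reason is not report-only is an assumption about how the set is consulted. The real code may consider further reasons, such as a dead owner pid, that are not visible in this commit.

```typescript
// Illustrative only: the types and function names are hypothetical; the
// reason strings and the report-only set contents are taken from the diff.
type LockPayload = { pid: number; createdAt: string; maxHoldMs?: number };
type StaleReason = "too-old" | "hold-exceeded";

const REPORT_ONLY_STALE_LOCK_REASONS = new Set<string>(["too-old"]);

// Collect the reasons a live holder's lock looks overdue at time nowMs.
function classifyLock(lock: LockPayload, nowMs: number, staleMs: number): StaleReason[] {
  const ageMs = nowMs - Date.parse(lock.createdAt);
  const reasons: StaleReason[] = [];
  if (ageMs > staleMs) reasons.push("too-old");
  if (lock.maxHoldMs !== undefined && ageMs > lock.maxHoldMs) reasons.push("hold-exceeded");
  return reasons;
}

// Assumption: reclaim is allowed when at least one reason is not report-only.
function canReclaim(reasons: StaleReason[]): boolean {
  return reasons.some((reason) => !REPORT_ONLY_STALE_LOCK_REASONS.has(reason));
}

// Mirrors the new test: lock created 30 s ago, maxHoldMs 1 s, staleMs 10 min.
const nowMs = Date.parse("2024-01-01T00:00:30.000Z");
const stuck: LockPayload = { pid: 1234, createdAt: "2024-01-01T00:00:00.000Z", maxHoldMs: 1_000 };
const stuckReasons = classifyLock(stuck, nowMs, 600_000);
```

Under these assumptions `stuckReasons` contains only `"hold-exceeded"`, so `canReclaim` returns `true`. With the previous set, which also listed `"hold-exceeded"`, the same lock would only have been reported.
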
src/config/schema.help.ts

Lines changed: 1 addition & 1 deletion
@@ -1651,7 +1651,7 @@ export const FIELD_HELP: Record<string, string> = {
   "session.writeLock.staleMs":
     "Milliseconds before an existing session transcript lock can be treated as stale and reclaimed. Default: 1800000; env override: OPENCLAW_SESSION_WRITE_LOCK_STALE_MS.",
   "session.writeLock.maxHoldMs":
-    "Milliseconds a held in-process session transcript lock may remain held before the watchdog releases it. Default: 300000; env override: OPENCLAW_SESSION_WRITE_LOCK_MAX_HOLD_MS.",
+    "Milliseconds a held session transcript lock may remain held before it is reclaimed: the holder's own in-process watchdog releases it, and a contending writer may also reclaim it once this deadline passes if the lock file is unchanged. Default: 300000; env override: OPENCLAW_SESSION_WRITE_LOCK_MAX_HOLD_MS.",
   "session.agentToAgent":
     "Groups controls for inter-agent session exchanges, including loop prevention limits on reply chaining. Keep defaults unless you run advanced agent-to-agent automation with strict turn caps.",
   "session.agentToAgent.maxPingPongTurns":

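The help text says a contender may reclaim the lock only "if the lock file is unchanged". The idea can be sketched with an in-memory stand-in for the lock file. `InMemoryLockFile` and this `removeIfUnchanged` are hypothetical, as the commit does not show the real helper. A filesystem version would also have to handle the window between reading the file and unlinking it, which this sketch does not model.

```typescript
// Hypothetical in-memory stand-in for the lock file; the real implementation
// works against the filesystem and is not part of this commit.
class InMemoryLockFile {
  private content: string | undefined;

  constructor(initial?: string) {
    this.content = initial;
  }

  read(): string | undefined {
    return this.content;
  }

  write(next: string): void {
    this.content = next;
  }

  remove(): void {
    this.content = undefined;
  }
}

// Remove the lock only if it still holds exactly what the contender observed
// when it decided the lock was overdue.
function removeIfUnchanged(file: InMemoryLockFile, observed: string): boolean {
  if (file.read() !== observed) {
    // Released or re-acquired in the meantime: leave it alone.
    return false;
  }
  file.remove();
  return true;
}
```

A contender would capture the payload it judged overdue, then call `removeIfUnchanged` with that snapshot. If the holder released the lock or another writer took it in between, the call returns `false` and the contender retries acquisition instead of deleting a lock it never inspected.
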