
Commit 8a060b2

Release embedded session write lock before model I/O (#82891)
Summary:
- The PR narrows embedded PI session transcript write-lock scope, adds stale/max-hold config plumbing, and updates affected transcript, doctor, gateway, SDK, Codex mirroring, docs, and regression-test surfaces.
- Reproducibility: yes. Current main source still holds the embedded session write lock from early attempt set ... cksmith Testbox contention proof on unmodified main; I did not rerun the live repro in this read-only pass.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(agents): narrow context engine session lock
- PR branch already contained follow-up commit before automerge: fix session lock runner build types
- PR branch already contained follow-up commit before automerge: Release embedded session write lock before model I/O
- PR branch already contained follow-up commit before automerge: fix(clawsweeper): address review for automerge-openclaw-openclaw-8289…

Validation:
- ClawSweeper review passed for head 4c6dd7e.
- Required merge gates passed before the squash merge.

Prepared head SHA: 4c6dd7e
Review: #82891 (comment)

Co-authored-by: Alex Knight <15041791+amknight@users.noreply.github.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
1 parent 3dd8bcb commit 8a060b2

35 files changed

Lines changed: 1762 additions & 290 deletions
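
The lock-scope change described in the summary can be pictured with a small sketch. This is not the code from the commit: `createFakeSessionLock` and `runAttemptSketch` are hypothetical names, and the lock here is an in-memory stand-in rather than the file-based session write lock. It only illustrates the shape of the idea, which is to hold the lock for short transcript work, release it before model I/O, and take a fresh lock for persistence.

```typescript
// Hypothetical in-memory stand-in for the file-based session write lock.
type HeldLock = { release: () => Promise<void> };

function createFakeSessionLock() {
  let held = false;
  const log: string[] = [];
  return {
    log,
    isHeld: (): boolean => held,
    acquire: async (label: string): Promise<HeldLock> => {
      // A real lock would wait up to an acquire timeout; this sketch fails fast.
      if (held) throw new Error("session busy");
      held = true;
      log.push(`acquire:${label}`);
      return {
        release: async () => {
          held = false;
          log.push(`release:${label}`);
        },
      };
    },
  };
}

type FakeSessionLock = ReturnType<typeof createFakeSessionLock>;

async function runAttemptSketch(
  lock: FakeSessionLock,
  callModel: () => Promise<string>,
  persist: (reply: string) => Promise<void>,
): Promise<string> {
  // 1. Hold the lock only while preparing the transcript.
  const prep = await lock.acquire("prep");
  try {
    lock.log.push("prepare-transcript");
  } finally {
    await prep.release();
  }

  // 2. Model I/O runs with no transcript lock held.
  const reply = await callModel();

  // 3. Take a fresh, short-lived lock for persistence.
  const write = await lock.acquire("persist");
  try {
    await persist(reply);
  } finally {
    await write.release();
  }
  return reply;
}
```

Under these assumptions, other writers are blocked only during the prep and persist steps, not for the duration of the model call.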

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -227,6 +227,7 @@ Docs: https://docs.openclaw.ai
 - Update/installers: override npm `min-release-age` quarantine for OpenClaw-managed package installs, so `openclaw update`, plugin updates, and hosted installer scripts can install the requested latest release immediately.
 - Agents/sessions: preserve fresh post-compaction token snapshots across stale usage updates, preventing repeated auto-compaction after every message. Fixes #82576. (#82578) Thanks @njuboy11.
 - Agents/replies: preserve active inbound reply context at the LLM boundary so Discord referenced-message turns do not answer from stale session history. Fixes #82608. (#82801) Thanks @joshavant.
+- Agents/sessions: expose session transcript lock stale and max-hold tuning, and release the embedded run's coarse transcript lock before model I/O while locking persistence and cleanup separately. Fixes #13744. Thanks @amknight.
 - Agents/OpenAI Responses: log redacted diagnostics for detail-less `response.failed` events while preserving failed response ids, so operators can correlate provider-side failures. Fixes #82558.
 - Agents/OpenRouter: strip non-replayable Anthropic/xAI reasoning provenance tags from follow-up requests, preventing poisoned thinking signatures from breaking second turns. Fixes #82335. (#82380) Thanks @hclsys.
 - Providers/xAI: send configurable reasoning effort only for Grok 4.3, preserving xAI's default low reasoning while omitting unsupported controls for Grok 4.20 reasoning models. (#81227) Thanks @jason-allen-oneal.
Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
-1a4ff6c148f4c28eb2c07c77025c6ba13ed9f56d23bbb221fc6dd83781fda671 config-baseline.json
-a2663c4aed132ae968e8e6ef84566d22063143f8b093e839e1063393135842f5 config-baseline.core.json
+4b52f0bff12148f4695150a45c91d4b9bda2d1bfbc1162a79a2bb2cf62c3c1eb config-baseline.json
+73e11d9d5c5b27d8d075202f59b9f19537ded361ea761ed0aef78dc9446bc82f config-baseline.core.json
 fe4f1cb00d7d1dee9746779ec3cf14236e5f672c91502268a12ad6e467a2c4ad config-baseline.channel.json
 e9049ce0154f484f44bb0ac174a44198269256044da5ba62a6e107e78bfd7a70 config-baseline.plugin.json

docs/reference/session-management-compaction.md

Lines changed: 5 additions & 1 deletion
@@ -97,7 +97,11 @@ OpenClaw no longer creates automatic `sessions.json.bak.*` rotation backups duri
 Transcript mutations use a session write lock on the transcript file. Lock acquisition waits up to
 `session.writeLock.acquireTimeoutMs` before surfacing a busy-session error; the default is `60000`
 ms. Raise this only when legitimate prep, cleanup, compaction, or transcript mirror work contends
-longer on slow machines. Stale-lock detection and maximum hold warnings remain separate policies.
+longer on slow machines. `session.writeLock.staleMs` controls when an existing lock can be
+reclaimed as stale; the default is `1800000` ms. `session.writeLock.maxHoldMs` controls the
+in-process watchdog release threshold; the default is `300000` ms. Emergency env overrides are
+`OPENCLAW_SESSION_WRITE_LOCK_ACQUIRE_TIMEOUT_MS`, `OPENCLAW_SESSION_WRITE_LOCK_STALE_MS`, and
+`OPENCLAW_SESSION_WRITE_LOCK_MAX_HOLD_MS`.

 Enforcement order for disk budget cleanup (`mode: "enforce"`):
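
To make the three documented settings concrete, here is a minimal sketch of how they might be resolved into lock options. The config keys, env variable names, and default values come from the documentation change above; everything else is an assumption for illustration, including the helper name `resolveLockOptionsSketch`, the rule that an env override beats the config value, and the rule that non-positive or non-numeric values are ignored. The real `resolveSessionWriteLockOptions` may behave differently.

```typescript
// Hypothetical sketch; not the implementation of resolveSessionWriteLockOptions.
type WriteLockConfig = {
  session?: {
    writeLock?: { acquireTimeoutMs?: number; staleMs?: number; maxHoldMs?: number };
  };
};

type WriteLockOptions = { timeoutMs: number; staleMs: number; maxHoldMs: number };

// Defaults as stated in the documentation text.
const DEFAULTS: WriteLockOptions = {
  timeoutMs: 60_000,
  staleMs: 1_800_000,
  maxHoldMs: 300_000,
};

// Assumption: only finite, positive millisecond values are accepted.
function positiveMs(value: unknown): number | undefined {
  const n = typeof value === "string" ? Number(value) : value;
  return typeof n === "number" && Number.isFinite(n) && n > 0 ? n : undefined;
}

function resolveLockOptionsSketch(
  config: WriteLockConfig | undefined,
  env: Record<string, string | undefined> = {},
): WriteLockOptions {
  const cfg = config?.session?.writeLock ?? {};
  // Assumed precedence: env override, then config, then default.
  return {
    timeoutMs:
      positiveMs(env["OPENCLAW_SESSION_WRITE_LOCK_ACQUIRE_TIMEOUT_MS"]) ??
      positiveMs(cfg.acquireTimeoutMs) ??
      DEFAULTS.timeoutMs,
    staleMs:
      positiveMs(env["OPENCLAW_SESSION_WRITE_LOCK_STALE_MS"]) ??
      positiveMs(cfg.staleMs) ??
      DEFAULTS.staleMs,
    maxHoldMs:
      positiveMs(env["OPENCLAW_SESSION_WRITE_LOCK_MAX_HOLD_MS"]) ??
      positiveMs(cfg.maxHoldMs) ??
      DEFAULTS.maxHoldMs,
  };
}
```

Passing the environment as a plain record keeps the sketch free of Node-specific globals; a caller in Node could hand in `process.env`.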

extensions/codex/src/app-server/transcript-mirror.ts

Lines changed: 2 additions & 2 deletions
@@ -4,7 +4,7 @@ import {
   acquireSessionWriteLock,
   appendSessionTranscriptMessage,
   emitSessionTranscriptUpdate,
-  resolveSessionWriteLockAcquireTimeoutMs,
+  resolveSessionWriteLockOptions,
   runAgentHarnessBeforeMessageWriteHook,
   type AgentMessage,
   type EmbeddedRunAttemptParams,
@@ -128,7 +128,7 @@ export async function mirrorCodexAppServerTranscript(params: {

   const lock = await acquireSessionWriteLock({
     sessionFile: params.sessionFile,
-    timeoutMs: resolveSessionWriteLockAcquireTimeoutMs(params.config),
+    ...resolveSessionWriteLockOptions(params.config),
   });
   try {
     const existingIdempotencyKeys = await readTranscriptIdempotencyKeys(params.sessionFile);

src/agents/command/attempt-execution.ts

Lines changed: 2 additions & 5 deletions
@@ -30,10 +30,7 @@ import { isCliProvider } from "../model-selection.js";
 import { resolveOpenAIRuntimeProviderForPi } from "../openai-codex-routing.js";
 import { runEmbeddedPiAgent, type EmbeddedPiRunResult } from "../pi-embedded.js";
 import { buildAgentRuntimeAuthPlan } from "../runtime-plan/auth.js";
-import {
-  acquireSessionWriteLock,
-  resolveSessionWriteLockAcquireTimeoutMs,
-} from "../session-write-lock.js";
+import { acquireSessionWriteLock, resolveSessionWriteLockOptions } from "../session-write-lock.js";
 import { buildWorkspaceSkillSnapshot } from "../skills.js";
 import { buildUsageWithNoCost } from "../stream-message-shared.js";
 import {
@@ -228,7 +225,7 @@ async function persistTextTurnTranscript(
   });
   const lock = await acquireSessionWriteLock({
     sessionFile,
-    timeoutMs: resolveSessionWriteLockAcquireTimeoutMs(params.config),
+    ...resolveSessionWriteLockOptions(params.config),
     allowReentrant: true,
   });
   try {

src/agents/pi-embedded-runner/compact.hooks.harness.ts

Lines changed: 5 additions & 0 deletions
@@ -461,6 +461,11 @@ export async function loadCompactHooksHarness(): Promise<{
     acquireSessionWriteLock: vi.fn(async () => ({ release: vi.fn(async () => {}) })),
     resolveSessionLockMaxHoldFromTimeout: vi.fn(() => 0),
     resolveSessionWriteLockAcquireTimeoutMs: vi.fn(() => 60_000),
+    resolveSessionWriteLockOptions: vi.fn(() => ({
+      timeoutMs: 60_000,
+      staleMs: 1_800_000,
+      maxHoldMs: 300_000,
+    })),
   }));

   vi.doMock("../../context-engine/init.js", () => ({

src/agents/pi-embedded-runner/compact.ts

Lines changed: 5 additions & 4 deletions
@@ -99,7 +99,7 @@ import { sanitizeToolUseResultPairing } from "../session-transcript-repair.js";
 import {
   acquireSessionWriteLock,
   resolveSessionLockMaxHoldFromTimeout,
-  resolveSessionWriteLockAcquireTimeoutMs,
+  resolveSessionWriteLockOptions,
 } from "../session-write-lock.js";
 import { detectRuntimeShell } from "../shell-utils.js";
 import {
@@ -956,9 +956,10 @@ async function compactEmbeddedPiSessionDirectOnce(
   const compactionTimeoutMs = resolveCompactionTimeoutMs(params.config);
   const sessionLock = await acquireSessionWriteLock({
     sessionFile: params.sessionFile,
-    timeoutMs: resolveSessionWriteLockAcquireTimeoutMs(params.config),
-    maxHoldMs: resolveSessionLockMaxHoldFromTimeout({
-      timeoutMs: compactionTimeoutMs,
+    ...resolveSessionWriteLockOptions(params.config, {
+      maxHoldMsFallback: resolveSessionLockMaxHoldFromTimeout({
+        timeoutMs: compactionTimeoutMs,
+      }),
     }),
   });
   try {

src/agents/pi-embedded-runner/context-engine-maintenance.test.ts

Lines changed: 107 additions & 0 deletions
@@ -182,6 +182,50 @@ describe("buildContextEngineMaintenanceRuntimeContext", () => {
     expect(rewriteTranscriptEntriesInSessionFileMock).not.toHaveBeenCalled();
   });

+  it("wraps active session manager rewrites in the supplied lock", async () => {
+    const events: string[] = [];
+    const sessionManager = { appendMessage: vi.fn() } as unknown as Parameters<
+      typeof buildContextEngineMaintenanceRuntimeContext
+    >[0]["sessionManager"];
+    rewriteTranscriptEntriesInSessionManagerMock.mockImplementationOnce((_params?: unknown) => {
+      events.push("rewrite");
+      return {
+        changed: true,
+        bytesFreed: 77,
+        rewrittenEntries: 1,
+      };
+    });
+    const runtimeContext = buildContextEngineMaintenanceRuntimeContext({
+      sessionId: "session-1",
+      sessionKey: "agent:main:session-1",
+      sessionFile: "/tmp/session.jsonl",
+      sessionManager,
+      withSessionManagerRewriteLock: async (operation) => {
+        events.push("lock-start");
+        try {
+          return await operation();
+        } finally {
+          events.push("lock-end");
+        }
+      },
+    });
+
+    await runtimeContext.rewriteTranscriptEntries?.({
+      replacements: [
+        { entryId: "entry-1", message: { role: "user", content: "hi", timestamp: 1 } },
+      ],
+    });
+
+    expect(events).toEqual(["lock-start", "rewrite", "lock-end"]);
+    expect(rewriteTranscriptEntriesInSessionManagerMock).toHaveBeenCalledWith({
+      sessionManager,
+      replacements: [
+        { entryId: "entry-1", message: { role: "user", content: "hi", timestamp: 1 } },
+      ],
+    });
+    expect(rewriteTranscriptEntriesInSessionFileMock).not.toHaveBeenCalled();
+  });
+
   it("defers file rewrites onto the session lane when requested", async () => {
     vi.useFakeTimers();
     try {
@@ -419,6 +463,69 @@ describe("runContextEngineMaintenance", () => {
     });
   });

+  it("locks foreground maintenance rewrites that use the active session manager", async () => {
+    const events: string[] = [];
+    const maintain = vi.fn(async (params?: unknown) => {
+      events.push("maintain-start");
+      await (
+        params as { runtimeContext?: ContextEngineRuntimeContext } | undefined
+      )?.runtimeContext?.rewriteTranscriptEntries?.({
+        replacements: [
+          { entryId: "entry-1", message: { role: "user", content: "hi", timestamp: 1 } },
+        ],
+      });
+      events.push("maintain-end");
+      return {
+        changed: false,
+        bytesFreed: 0,
+        rewrittenEntries: 0,
+      };
+    });
+    const sessionManager = { appendMessage: vi.fn() } as unknown as Parameters<
+      typeof buildContextEngineMaintenanceRuntimeContext
+    >[0]["sessionManager"];
+    rewriteTranscriptEntriesInSessionManagerMock.mockImplementationOnce((_params?: unknown) => {
+      events.push("rewrite");
+      return {
+        changed: true,
+        bytesFreed: 77,
+        rewrittenEntries: 1,
+      };
+    });
+
+    await runContextEngineMaintenance({
+      contextEngine: {
+        info: { id: "test", name: "Test Engine" },
+        ingest: async () => ({ ingested: true }),
+        assemble: async ({ messages }) => ({ messages, estimatedTokens: 0 }),
+        compact: async () => ({ ok: true, compacted: false }),
+        maintain,
+      },
+      sessionId: "session-foreground-manager-rewrite",
+      sessionKey: "agent:main:session-foreground-manager-rewrite",
+      sessionFile: "/tmp/session-foreground-manager-rewrite.jsonl",
+      reason: "turn",
+      sessionManager,
+      withSessionManagerRewriteLock: async (operation) => {
+        events.push("lock-start");
+        try {
+          return await operation();
+        } finally {
+          events.push("lock-end");
+        }
+      },
+    });
+
+    expect(events).toEqual(["maintain-start", "lock-start", "rewrite", "lock-end", "maintain-end"]);
+    expect(rewriteTranscriptEntriesInSessionManagerMock).toHaveBeenCalledWith({
+      sessionManager,
+      replacements: [
+        { entryId: "entry-1", message: { role: "user", content: "hi", timestamp: 1 } },
+      ],
+    });
+    expect(rewriteTranscriptEntriesInSessionFileMock).not.toHaveBeenCalled();
+  });
+
   it("defers turn maintenance to a hidden background task when enabled", async () => {
     await withStateDirEnv("openclaw-turn-maintenance-", async () => {
       vi.useFakeTimers();

src/agents/pi-embedded-runner/context-engine-maintenance.ts

Lines changed: 17 additions & 4 deletions
@@ -60,6 +60,8 @@ type DeferredTurnMaintenanceRunState = {

 const activeDeferredTurnMaintenanceRuns = new Map<string, DeferredTurnMaintenanceRunState>();

+type SessionManagerRewriteLock = <T>(operation: () => Promise<T> | T) => Promise<T>;
+
 type DeferredTurnMaintenanceSignal = "SIGINT" | "SIGTERM";
 type DeferredTurnMaintenanceProcessLike = Pick<NodeJS.Process, "on" | "off"> &
   Partial<Pick<NodeJS.Process, "listenerCount" | "kill" | "pid">> & {
@@ -277,6 +279,7 @@ export function buildContextEngineMaintenanceRuntimeContext(params: {
   sessionKey?: string;
   sessionFile: string;
   sessionManager?: Parameters<typeof rewriteTranscriptEntriesInSessionManager>[0]["sessionManager"];
+  withSessionManagerRewriteLock?: SessionManagerRewriteLock;
   runtimeContext?: ContextEngineRuntimeContext;
   agentId?: string;
   allowDeferredCompactionExecution?: boolean;
@@ -297,10 +300,15 @@ export function buildContextEngineMaintenanceRuntimeContext(params: {
     ...(params.allowDeferredCompactionExecution ? { allowDeferredCompactionExecution: true } : {}),
     rewriteTranscriptEntries: async (request) => {
       if (params.sessionManager) {
-        return rewriteTranscriptEntriesInSessionManager({
-          sessionManager: params.sessionManager,
-          replacements: request.replacements,
-        });
+        const sessionManager = params.sessionManager;
+        const rewriteSessionManagerEntries = () =>
+          rewriteTranscriptEntriesInSessionManager({
+            sessionManager,
+            replacements: request.replacements,
+          });
+        return params.withSessionManagerRewriteLock
+          ? await params.withSessionManagerRewriteLock(rewriteSessionManagerEntries)
+          : rewriteSessionManagerEntries();
       }
       const rewriteTranscriptEntriesInFile = async () =>
         await rewriteTranscriptEntriesInSessionFile({
@@ -329,6 +337,7 @@ async function executeContextEngineMaintenance(params: {
   sessionFile: string;
   reason: "bootstrap" | "compaction" | "turn";
   sessionManager?: Parameters<typeof rewriteTranscriptEntriesInSessionManager>[0]["sessionManager"];
+  withSessionManagerRewriteLock?: SessionManagerRewriteLock;
   runtimeContext?: ContextEngineRuntimeContext;
   agentId?: string;
   executionMode: "foreground" | "background";
@@ -346,6 +355,8 @@ async function executeContextEngineMaintenance(params: {
     sessionKey: params.sessionKey,
     sessionFile: params.sessionFile,
     sessionManager: params.executionMode === "background" ? undefined : params.sessionManager,
+    withSessionManagerRewriteLock:
+      params.executionMode === "background" ? undefined : params.withSessionManagerRewriteLock,
     runtimeContext: params.runtimeContext,
     agentId: params.agentId,
     allowDeferredCompactionExecution: params.executionMode === "background",
@@ -636,6 +647,7 @@ export async function runContextEngineMaintenance(params: {
   sessionFile: string;
   reason: "bootstrap" | "compaction" | "turn";
   sessionManager?: Parameters<typeof rewriteTranscriptEntriesInSessionManager>[0]["sessionManager"];
+  withSessionManagerRewriteLock?: SessionManagerRewriteLock;
   runtimeContext?: ContextEngineRuntimeContext;
   agentId?: string;
   executionMode?: "foreground" | "background";
@@ -681,6 +693,7 @@ export async function runContextEngineMaintenance(params: {
     sessionFile: params.sessionFile,
     reason: params.reason,
     sessionManager: params.sessionManager,
+    withSessionManagerRewriteLock: params.withSessionManagerRewriteLock,
     runtimeContext: params.runtimeContext,
     agentId: params.agentId,
     executionMode,
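
For readers wondering what a caller might pass as `withSessionManagerRewriteLock`, the following is one possible sketch. The `SessionManagerRewriteLock` type is copied from the diff above; `createSerialRewriteLock` is a hypothetical helper that serializes operations in-process with a promise chain. It is an assumption for illustration only: the commit's real callers presumably wrap the file-based session write lock, which this sketch does not model.

```typescript
// Type as introduced in the diff above.
type SessionManagerRewriteLock = <T>(operation: () => Promise<T> | T) => Promise<T>;

// Hypothetical helper: runs operations one at a time, in call order.
function createSerialRewriteLock(): SessionManagerRewriteLock {
  // Tail of the queue; each new operation starts after the previous one settles.
  let tail: Promise<void> = Promise.resolve();
  return <T>(operation: () => Promise<T> | T): Promise<T> => {
    const result = tail.then<T>(() => operation());
    // Swallow the outcome on the queue itself so one failure does not block later work.
    tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  };
}
```

A rejected operation still rejects for its own caller; only the internal queue ignores the error so that later rewrites can proceed.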
