Commit 3000217

author
Eva
committed
fix(sessions): bump fallback-guard default to 2mb + add boot-time guard + operator recovery prompt (#76940)
Three follow-ups to the initial guard addition based on operator feedback:

1) Default threshold 1mb → 2mb. A 1MiB jsonl is roughly 250k tokens of message content; 2MiB is roughly 500k tokens. 500k tokens already overflows every shipping context window, and models in the 200-256k effective-window range overflow much sooner. Operators on smaller-context models can still dial down via session.maintenance.contextFallbackGuard.sizeBytes.

2) Boot-time guard (applyContextEngineBootGuard). The on-fallback path only catches "configured engine failed to load." It misses the much more common case: no context engine was ever configured. The legacy engine windows the prompt in-memory at request time but never shrinks the on-disk jsonl, so an unmanaged session grows append-only until the gateway stalls on next start. The boot guard runs once at startup and applies the same policy when slots.contextEngine is unset/legacy or the configured plugin is missing from loadedPluginIds. Both triggers funnel into the same applyContextEngineFallbackGuard implementation; one config knob, one policy, two entry points.

3) Operator-facing message rewrite. The terse single-line warn/archive log is replaced with a structured block that names the file, the engine that failed, the size, the available repair commands (openclaw doctor --fix / sessions archive / config set slots), and a copy-pasteable recovery prompt for the next agent turn. The prompt instructs the agent to read the archived tail (last ~200 non-system messages, grouped into chunks of 1-2k tokens each, stopping at ~40k tokens aggregate), giving the fresh session enough context to continue meaningfully. The budget is sized to use the fresh session's available context window: not so miserly that the user loses their working state, not so generous that we eat the whole window.
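The tail budget in item 3 (last ~200 non-system messages, 1-2k token chunks, ~40k tokens aggregate) can be illustrated with a minimal sketch. This is not the code in this commit: the recovery prompt asks the agent to do this reading itself, and `TranscriptMessage`, `estimateTokens`, and `selectRecoveryTail` are hypothetical names. The 4-characters-per-token estimate is an assumption derived from the "1MiB is roughly 250k tokens" figure above.

```typescript
// Hypothetical message shape; the real transcript schema is not shown in this commit.
interface TranscriptMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Assumption: ~4 characters per token, matching "1MiB is roughly 250k tokens".
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Returns chronologically ordered chunks taken from the end of the transcript.
function selectRecoveryTail(
  messages: TranscriptMessage[],
  opts = { maxMessages: 200, chunkTokens: 2_000, totalTokens: 40_000 },
): TranscriptMessage[][] {
  const tail = messages.filter((m) => m.role !== "system").slice(-opts.maxMessages);
  const chunks: TranscriptMessage[][] = [];
  let current: TranscriptMessage[] = [];
  let currentTokens = 0;
  let total = 0;

  // Walk newest-to-oldest so the budget is spent on the most recent context first.
  for (const msg of tail.slice().reverse()) {
    const cost = estimateTokens(msg.content);
    if (total + cost > opts.totalTokens) break; // aggregate budget reached
    if (current.length > 0 && currentTokens + cost > opts.chunkTokens) {
      chunks.unshift(current); // close the chunk, keeping chronological order
      current = [];
      currentTokens = 0;
    }
    current.unshift(msg);
    currentTokens += cost;
    total += cost;
  }
  if (current.length > 0) chunks.unshift(current);
  return chunks;
}
```

Walking from the newest message backwards means that when the ~40k budget runs out, the oldest part of the tail is dropped rather than the most recent working state.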
Tests:
- 17 unit tests pass (12 original + 5 new for boot-guard / recovery prompt)
- Existing 34 src/context-engine tests unchanged
- Lint clean on changed files

Wiring:
- fallback-guard.ts: bump DEFAULT_FALLBACK_GUARD_SIZE_BYTES, add renderWarnMessage / renderArchiveMessage / renderRecoveryPrompt, add applyContextEngineBootGuard
- server-startup-post-attach.ts: invoke boot guard right after logGatewayStartup; never let guard exceptions stall startup
- CHANGELOG: expanded entry covering both trigger paths and threshold rationale

Refs #76940.
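The "never let guard exceptions stall startup" wiring note can be sketched as a small wrapper. The server-startup-post-attach.ts change is not shown on this page, so this is only an illustration; `runGuardSafely` and the `Logger` shape are hypothetical names.

```typescript
// Hypothetical logger shape; the gateway's real logger type is not shown here.
type Logger = { warn: (message: string) => void };

// Runs a guard callback and converts any exception into a warning,
// so a failing guard can never abort the startup sequence.
function runGuardSafely<T>(guard: () => T, logger: Logger): T | null {
  try {
    return guard();
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    logger.warn(`context-engine boot guard failed (startup continues): ${reason}`);
    return null;
  }
}
```

A caller would pass a closure that invokes the boot guard; a `null` result then means either "guard threw" or "guard returned null", which is acceptable for a best-effort startup check.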
1 parent 10b9df7 commit 3000217

4 files changed

Lines changed: 418 additions & 21 deletions

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ Docs: https://docs.openclaw.ai
 
 ### Fixes
 
-- Sessions/context-engine: add `session.maintenance.contextFallbackGuard` defensive guard that runs when a configured context-engine plugin (e.g. lossless-claw) fails to resolve and the gateway falls back to the default `legacy` engine. Walks the affected agent's transcript directory and, for any session jsonl exceeding `sizeBytes` (default `1mb`), applies the configured `action`: `warn`, `archive`, `block`, or `auto` (archive when an engine sqlite store is present, warn otherwise). Surfaces the real failure mode instead of letting next-load context overflow stall the gateway. Fixes #76940.
+- Sessions/context-engine: add `session.maintenance.contextFallbackGuard` defensive guard that runs when no working context engine is managing on-disk session transcripts. Walks the affected agent's transcript directory and, for any session jsonl exceeding `sizeBytes` (default `2mb` ≈ 500k tokens), applies the configured `action`: `warn`, `archive`, `block`, or `auto` (archive when an engine sqlite store is present, warn otherwise). Two trigger paths: (1) on-fallback inside `resolveContextEngine` when a configured engine fails to load, and (2) at gateway boot when `slots.contextEngine` is unset/`legacy` or the configured plugin is missing from the loaded set. The legacy engine windows the prompt in-memory but never shrinks the on-disk jsonl, so unmanaged sessions otherwise grow until they stall the gateway on next start. The operator-facing log includes a copy-pasteable recovery prompt that instructs the agent to read the archived tail (last ~200 messages, 1-2k token chunks, ~40k aggregate) and reload meaningful context into the fresh session. Fixes #76940.
 - Channels/streaming: normalize whitespace and case for `streaming.progress.label: "auto"` so progress draft labels keep using the built-in label pool instead of rendering a literal `auto` title. Thanks @vincentkoc.
 - Gateway/install: prefer supported system Node over nvm/fnm/volta/asdf/mise when regenerating managed gateway services, so `gateway install --force` no longer recreates service definitions that doctor immediately flags as version-manager-backed. Fixes #76339. Thanks @brokemac79.
 - Gateway/usage: serve `usage.cost` and `sessions.usage` from a durable transcript aggregate cache with lock-safe background refreshes and localized stale-cache status, so large usage views avoid repeated full scans. (#76650) Thanks @Marvinthebored.
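The `action` semantics described in the changelog entry above (`auto` archives when an engine sqlite store is present and warns otherwise) can be sketched as follows. This is a hedged illustration, not the fallback-guard.ts implementation: `resolveGuardAction`, `exceedsThreshold`, and `DEFAULT_SIZE_BYTES` are hypothetical names, and treating "exceeding" as a strict greater-than comparison is an assumption.

```typescript
// Assumption: these names mirror the changelog wording, not the real fallback-guard.ts exports.
type GuardAction = "warn" | "archive" | "block" | "auto";
type ResolvedAction = Exclude<GuardAction, "auto">;

// 2 MiB, the new default stated in the changelog entry.
const DEFAULT_SIZE_BYTES = 2 * 1_048_576;

// "auto" archives only when an engine sqlite store exists; otherwise it just warns.
function resolveGuardAction(action: GuardAction, hasEngineSqliteStore: boolean): ResolvedAction {
  if (action !== "auto") return action;
  return hasEngineSqliteStore ? "archive" : "warn";
}

// Assumption: "exceeding sizeBytes" means strictly greater than the threshold.
function exceedsThreshold(fileSizeBytes: number, sizeBytes: number = DEFAULT_SIZE_BYTES): boolean {
  return fileSizeBytes > sizeBytes;
}
```

Keeping `auto` as a thin mapping onto the three concrete actions means the rest of the guard only ever has to handle `warn`, `archive`, and `block`.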

src/context-engine/fallback-guard.test.ts

Lines changed: 141 additions & 2 deletions
@@ -4,6 +4,7 @@ import path from "node:path";
 import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";
 import type { OpenClawConfig } from "../config/types.openclaw.js";
 import {
+  applyContextEngineBootGuard,
   applyContextEngineFallbackGuard,
   fallbackGuardOutcomeIsBlocking,
   DEFAULT_FALLBACK_GUARD_SIZE_BYTES,
@@ -107,7 +108,7 @@ describe("applyContextEngineFallbackGuard", () => {
     const message = logger.warn.mock.calls[0]?.[0] ?? "";
     expect(message).toContain("session-size guard tripped");
     expect(message).toContain('engine="lossless-claw"');
-    expect(message).toContain("action=warn");
+    expect(message).toContain("Action: warn");
   });
 
   it("archives when action=archive", () => {
@@ -200,7 +201,7 @@ describe("applyContextEngineFallbackGuard", () => {
     expect(outcome.triggered[0]?.path).toMatch(/session-live\.jsonl$/u);
   });
 
-  it("uses default 1MiB threshold when config does not override", () => {
+  it("uses default 2MiB threshold when config does not override", () => {
     writeJsonl(tmpDir, "session-big.jsonl", DEFAULT_FALLBACK_GUARD_SIZE_BYTES + 1024);
     writeJsonl(tmpDir, "session-small.jsonl", 1024);
     const logger = makeLogger();
@@ -214,9 +215,147 @@ describe("applyContextEngineFallbackGuard", () => {
       warnedPaths: new Set(),
     });
     expect(outcome.resolvedSizeBytes).toBe(DEFAULT_FALLBACK_GUARD_SIZE_BYTES);
+    expect(outcome.resolvedSizeBytes).toBe(2 * 1_048_576);
     expect(outcome.triggered).toHaveLength(1);
   });
 
+  it("warn message includes a copy-pasteable recovery prompt for the agent", () => {
+    writeJsonl(tmpDir, "session-big.jsonl", 5 * 1024 * 1024);
+    const logger = makeLogger();
+    applyContextEngineFallbackGuard({
+      config: configWith("warn", "1mb"),
+      failedEngineId: "lossless-claw",
+      fallbackReason: "engine not registered",
+      logger,
+      resolveSessionsDir: () => tmpDir,
+      warnedPaths: new Set(),
+    });
+    const message = logger.warn.mock.calls[0]?.[0] ?? "";
+    expect(message).toContain("Session-size guard");
+    expect(message).toContain("Read the archived transcript at");
+    expect(message).toContain("last ~200 non-system messages");
+    expect(message).toContain("openclaw doctor --fix");
+    expect(message).toContain("openclaw config set plugins.slots.contextEngine");
+  });
+
+  it("archive message includes the archived path inside the recovery prompt", () => {
+    writeJsonl(tmpDir, "session-big.jsonl", 5 * 1024 * 1024);
+    const logger = makeLogger();
+    const outcome = applyContextEngineFallbackGuard({
+      config: configWith("archive", "1mb"),
+      failedEngineId: "lossless-claw",
+      fallbackReason: "engine not registered",
+      logger,
+      resolveSessionsDir: () => tmpDir,
+    });
+    const archived = outcome.triggered[0]?.archivedPath;
+    expect(archived).toBeDefined();
+    const message = logger.warn.mock.calls[0]?.[0] ?? "";
+    expect(message).toContain("Session-size guard archived");
+    expect(message).toContain(archived as string);
+    expect(message).toContain("paste this prompt into the agent");
+  });
+});
+
+describe("applyContextEngineBootGuard", () => {
+  let tmpDir: string;
+
+  beforeEach(() => {
+    tmpDir = createTempSessionsDir();
+  });
+
+  afterEach(() => {
+    if (fs.existsSync(tmpDir)) {
+      fs.rmSync(tmpDir, { recursive: true, force: true });
+    }
+  });
+
+  function bootConfig(slot: string | undefined): OpenClawConfig {
+    return {
+      plugins: { slots: slot === undefined ? {} : { contextEngine: slot } },
+      session: {
+        maintenance: { contextFallbackGuard: { action: "warn", sizeBytes: "1mb" } },
+      },
+    } as OpenClawConfig;
+  }
+
+  it("returns null when configured engine is loaded at boot", () => {
+    writeJsonl(tmpDir, "session-big.jsonl", 5 * 1024 * 1024);
+    const outcome = applyContextEngineBootGuard({
+      config: bootConfig("lossless-claw"),
+      activeContextEngineId: "lossless-claw",
+      loadedPluginIds: new Set(["lossless-claw"]),
+      logger: makeLogger(),
+      resolveSessionsDir: () => tmpDir,
+    });
+    expect(outcome).toBeNull();
+  });
+
+  it("fires when slots.contextEngine is unset (default legacy)", () => {
+    writeJsonl(tmpDir, "session-big.jsonl", 5 * 1024 * 1024);
+    const logger = makeLogger();
+    const outcome = applyContextEngineBootGuard({
+      config: bootConfig(undefined),
+      activeContextEngineId: undefined,
+      loadedPluginIds: new Set(),
+      logger,
+      resolveSessionsDir: () => tmpDir,
+      warnedPaths: new Set(),
+    });
+    expect(outcome).not.toBeNull();
+    expect(outcome?.triggered).toHaveLength(1);
+    const message = logger.warn.mock.calls[0]?.[0] ?? "";
+    expect(message).toContain('engine="(legacy/none)"');
+    expect(message).toContain("no context engine configured");
+  });
+
+  it("fires when slots.contextEngine is 'legacy'", () => {
+    writeJsonl(tmpDir, "session-big.jsonl", 5 * 1024 * 1024);
+    const logger = makeLogger();
+    const outcome = applyContextEngineBootGuard({
+      config: bootConfig("legacy"),
+      activeContextEngineId: "legacy",
+      loadedPluginIds: new Set(),
+      logger,
+      resolveSessionsDir: () => tmpDir,
+      warnedPaths: new Set(),
+    });
+    expect(outcome).not.toBeNull();
+    expect(outcome?.triggered).toHaveLength(1);
+  });
+
+  it("fires when configured engine is set but not loaded at boot", () => {
+    writeJsonl(tmpDir, "session-big.jsonl", 5 * 1024 * 1024);
+    const logger = makeLogger();
+    const outcome = applyContextEngineBootGuard({
+      config: bootConfig("lossless-claw"),
+      activeContextEngineId: "lossless-claw",
+      loadedPluginIds: new Set(["browser", "telegram", "cortex"]),
+      logger,
+      resolveSessionsDir: () => tmpDir,
+      warnedPaths: new Set(),
+    });
+    expect(outcome).not.toBeNull();
+    expect(outcome?.triggered).toHaveLength(1);
+    const message = logger.warn.mock.calls[0]?.[0] ?? "";
+    expect(message).toContain('engine="lossless-claw"');
+    expect(message).toContain("did not load at gateway startup");
+  });
+
+  it("returns null when no transcripts exceed threshold even with no engine", () => {
+    writeJsonl(tmpDir, "session-small.jsonl", 1024);
+    const outcome = applyContextEngineBootGuard({
+      config: bootConfig(undefined),
+      activeContextEngineId: undefined,
+      loadedPluginIds: new Set(),
+      logger: makeLogger(),
+      resolveSessionsDir: () => tmpDir,
+      warnedPaths: new Set(),
+    });
+    expect(outcome).not.toBeNull();
+    expect(outcome?.triggered).toHaveLength(0);
+  });
+
   it("dedups warn messages for the same path within one process", () => {
     writeJsonl(tmpDir, "session-big.jsonl", 2 * 1024 * 1024);
     const logger = makeLogger();
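The boot-guard tests above imply a three-way decision: skip when the configured engine is loaded, fire with an `engine="(legacy/none)"` label when the slot is unset or `legacy`, and fire with the configured engine's id when the plugin is missing from the loaded set. A minimal sketch of that decision follows. It is inferred from the test expectations only: `classifyBootTrigger` and `BootGuardTrigger` are hypothetical names, and the label used for an explicit `legacy` slot is an assumption because the tests do not assert it.

```typescript
// Hypothetical result type; the real applyContextEngineBootGuard returns an outcome object or null.
type BootGuardTrigger =
  | { fire: false }
  | { fire: true; engineLabel: string; reason: string };

function classifyBootTrigger(
  slot: string | undefined,
  loadedPluginIds: ReadonlySet<string>,
): BootGuardTrigger {
  // No engine configured, or the built-in legacy engine: nothing manages the on-disk jsonl.
  // Assumption: an explicit "legacy" slot uses the same label as an unset slot.
  if (slot === undefined || slot === "legacy") {
    return { fire: true, engineLabel: "(legacy/none)", reason: "no context engine configured" };
  }
  // An engine is configured but its plugin is not among the loaded plugins.
  if (!loadedPluginIds.has(slot)) {
    return { fire: true, engineLabel: slot, reason: "did not load at gateway startup" };
  }
  // Configured engine is loaded: the boot guard has nothing to do.
  return { fire: false };
}
```

In the tests, the "skip" branch corresponds to `applyContextEngineBootGuard` returning `null`, while both "fire" branches return an outcome whose `triggered` list may still be empty when no transcript exceeds the threshold.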
