Commit 4bc46cc

fix(gateway): cap compaction reserve floor to context window for small models (#65671)
Fixes #65465. Caps the compaction `reserveTokensFloor` so that at least min(8 000 tokens, 50% of the context window) remains available for prompt content. This prevents the default 20 000-token floor from exceeding the entire context window on small-context local models (e.g. Ollama with a 16K window). The cap is applied only when `contextTokenBudget` is provided, which preserves backward compatibility.
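To make the failure mode concrete, here is a minimal sketch of the arithmetic behind the bug. It assumes the prompt budget is simply the context window minus the reserve tokens; `promptBudgetAfterReserve` is a hypothetical helper for illustration, not a function from this repository.

```typescript
// Hypothetical helper (assumption): prompt budget = context window - reserve tokens.
function promptBudgetAfterReserve(contextWindow: number, reserveTokens: number): number {
  return contextWindow - reserveTokens;
}

const SMALL_CONTEXT_WINDOW = 16_384; // e.g. a 16K local model

// Before the fix: the default 20 000-token floor is larger than the whole window,
// so the remaining prompt budget is negative and every prompt looks like an overflow.
const uncappedBudget = promptBudgetAfterReserve(SMALL_CONTEXT_WINDOW, 20_000);

// After the fix: the floor is capped to 8 384, leaving 8 000 tokens for the prompt.
const cappedBudget = promptBudgetAfterReserve(SMALL_CONTEXT_WINDOW, 8_384);

console.log({ uncappedBudget, cappedBudget });
```

A negative budget is what would turn every request into a compaction attempt, which matches the infinite-loop symptom described in the commit message.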
1 parent 1169dd7 commit 4bc46cc

8 files changed

Lines changed: 221 additions & 3 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ Docs: https://docs.openclaw.ai
 - Matrix/security: normalize sandboxed profile avatar params, preserve `mxc://` avatar URLs, and surface gmail watcher stop failures during reload. (#64701) Thanks @slepybear.
 - Telegram/documents: drop leaked binary caption bytes from inbound Telegram text handling so document uploads like `.mobi` or `.epub` no longer explode prompt token counts. (#66663) Thanks @joelnishanth.
 - Gateway/auth: resolve the active gateway bearer per-request on the HTTP server and the HTTP upgrade handler via `getResolvedAuth()`, mirroring the WebSocket path, so a secret rotated through `secrets.reload` or config hot-reload stops authenticating on `/v1/*`, `/tools/invoke`, plugin HTTP routes, and the canvas upgrade path immediately instead of remaining valid on HTTP until gateway restart. (#66651) Thanks @mmaps.
+- Agents/compaction: cap the compaction reserve-token floor to the model context window so small-context local models (e.g. Ollama with 16K tokens) no longer trigger context-overflow errors or infinite compaction loops on every prompt. (#65671) Thanks @openperf.
 
 ## 2026.4.14

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+/**
+ * Absolute minimum prompt budget in tokens. When the context window is
+ * large enough that `contextTokenBudget * MIN_PROMPT_BUDGET_RATIO` exceeds
+ * this value, this absolute floor takes precedence.
+ */
+export const MIN_PROMPT_BUDGET_TOKENS = 8_000;
+
+/**
+ * Minimum share of the context window that must remain available for prompt
+ * content after reserve tokens are subtracted.
+ */
+export const MIN_PROMPT_BUDGET_RATIO = 0.5;
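The two constants combine into a single minimum prompt budget. The sketch below copies the expression used in `pi-settings.ts` in this commit; the wrapper name `resolveMinPromptBudget` and the sample window sizes are illustrative assumptions. The ratio decides the result for windows under 16 000 tokens, and the absolute value takes over from there.

```typescript
const MIN_PROMPT_BUDGET_TOKENS = 8_000;
const MIN_PROMPT_BUDGET_RATIO = 0.5;

// Hypothetical wrapper around the expression from this commit.
function resolveMinPromptBudget(contextTokenBudget: number): number {
  return Math.min(
    MIN_PROMPT_BUDGET_TOKENS,
    // Never drop below 1 token, even for tiny windows.
    Math.max(1, Math.floor(contextTokenBudget * MIN_PROMPT_BUDGET_RATIO)),
  );
}

// Sample windows chosen for illustration only.
for (const contextTokenBudget of [4_096, 16_000, 16_384, 200_000]) {
  console.log(contextTokenBudget, resolveMinPromptBudget(contextTokenBudget));
}
```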

src/agents/pi-embedded-runner/compact.ts

Lines changed: 1 addition & 0 deletions
@@ -786,6 +786,7 @@ export async function compactEmbeddedPiSessionDirect(
     cwd: effectiveWorkspace,
     agentDir,
     cfg: params.config,
+    contextTokenBudget: ctxInfo.tokens,
   });
   // Sets compaction/pruning runtime state and returns extension factories
   // that must be passed to the resource loader for the safeguard to be active.

src/agents/pi-embedded-runner/run/attempt.ts

Lines changed: 1 addition & 0 deletions
@@ -913,6 +913,7 @@ export async function runEmbeddedAttempt(
     cwd: effectiveWorkspace,
     agentDir,
     cfg: params.config,
+    contextTokenBudget: params.contextTokenBudget,
   });
   applyPiAutoCompactionGuard({
     settingsManager,

src/agents/pi-embedded-runner/run/preemptive-compaction.ts

Lines changed: 4 additions & 2 deletions
@@ -3,14 +3,16 @@ import { estimateTokens } from "@mariozechner/pi-coding-agent";
 import { SAFETY_MARGIN, estimateMessagesTokens } from "../../compaction.js";
 import { estimateToolResultReductionPotential } from "../tool-result-truncation.js";
 import type { PreemptiveCompactionRoute } from "./preemptive-compaction.types.js";
+import {
+  MIN_PROMPT_BUDGET_RATIO,
+  MIN_PROMPT_BUDGET_TOKENS,
+} from "../../pi-compaction-constants.js";
 
 export const PREEMPTIVE_OVERFLOW_ERROR_TEXT =
   "Context overflow: prompt too large for the model (precheck).";
 
 const ESTIMATED_CHARS_PER_TOKEN = 4;
 const TRUNCATION_ROUTE_BUFFER_TOKENS = 512;
-const MIN_PROMPT_BUDGET_TOKENS = 8_000;
-const MIN_PROMPT_BUDGET_RATIO = 0.5;
 
 export type { PreemptiveCompactionRoute } from "./preemptive-compaction.types.js";

src/agents/pi-project-settings.ts

Lines changed: 3 additions & 0 deletions
@@ -187,11 +187,14 @@ export function createPreparedEmbeddedPiSettingsManager(params: {
   cwd: string;
   agentDir: string;
   cfg?: OpenClawConfig;
+  /** Resolved context window budget so reserve-token floor can be capped for small models. */
+  contextTokenBudget?: number;
 }): SettingsManager {
   const settingsManager = createEmbeddedPiSettingsManager(params);
   applyPiCompactionSettingsFromConfig({
     settingsManager,
     cfg: params.cfg,
+    contextTokenBudget: params.contextTokenBudget,
   });
   return settingsManager;
 }
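Because `contextTokenBudget` is optional and passed through unchanged, the callee has to decide when a value is usable. A minimal sketch of that guard follows, mirroring the condition this commit adds in `pi-settings.ts`; the helper name `usableContextBudget` is a hypothetical introduced here for illustration.

```typescript
// Hypothetical helper: returns the budget only when it is a finite, positive number.
// Mirrors the guard `typeof ctxBudget === "number" && Number.isFinite(ctxBudget) && ctxBudget > 0`.
function usableContextBudget(value: number | undefined): number | undefined {
  return typeof value === "number" && Number.isFinite(value) && value > 0 ? value : undefined;
}

// Callers that omit the budget, or pass a degenerate value, keep the uncapped floor.
console.log(usableContextBudget(undefined)); // undefined
console.log(usableContextBudget(Number.NaN)); // undefined
console.log(usableContextBudget(0)); // undefined
console.log(usableContextBudget(16_384)); // 16384
```

Treating invalid values the same as a missing budget is what keeps existing call sites backward compatible.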

src/agents/pi-settings.test.ts

Lines changed: 171 additions & 0 deletions
@@ -1,4 +1,8 @@
 import { describe, expect, it, vi } from "vitest";
+import {
+  MIN_PROMPT_BUDGET_RATIO,
+  MIN_PROMPT_BUDGET_TOKENS,
+} from "./pi-compaction-constants.js";
 import {
   applyPiCompactionSettingsFromConfig,
   DEFAULT_PI_COMPACTION_RESERVE_TOKENS_FLOOR,
@@ -120,6 +124,173 @@ describe("applyPiCompactionSettingsFromConfig", () => {
     expect(result.compaction.keepRecentTokens).toBe(20_000);
     expect(settingsManager.applyOverrides).not.toHaveBeenCalled();
   });
+
+  it("caps floor to context window ratio for small-context models", () => {
+    // Pi SDK default reserveTokens is 16 384. With a 16 384 context window
+    // the default floor (20 000) exceeds the window. The aligned cap
+    // computes: minPromptBudget = min(8_000, floor(16_384 * 0.5)) = 8_000,
+    // maxReserve = 16_384 - 8_000 = 8_384. Since current (16_384) > capped
+    // floor (8_384), no override is needed.
+    const settingsManager = {
+      getCompactionReserveTokens: () => 16_384,
+      getCompactionKeepRecentTokens: () => 20_000,
+      applyOverrides: vi.fn(),
+    };
+
+    const result = applyPiCompactionSettingsFromConfig({
+      settingsManager,
+      contextTokenBudget: 16_384,
+    });
+
+    // Without the cap, reserveTokens would be bumped to 20_000.
+    // With the cap, it stays at 16_384 (the current value).
+    expect(result.compaction.reserveTokens).toBe(16_384);
+    expect(result.compaction.reserveTokens).toBeLessThan(
+      DEFAULT_PI_COMPACTION_RESERVE_TOKENS_FLOOR,
+    );
+    expect(result.didOverride).toBe(false);
+    expect(settingsManager.applyOverrides).not.toHaveBeenCalled();
+  });
+
+  it("applies capped floor over user-configured reserveTokens when default floor exceeds context window", () => {
+    const settingsManager = {
+      getCompactionReserveTokens: () => 16_384,
+      getCompactionKeepRecentTokens: () => 20_000,
+      applyOverrides: vi.fn(),
+    };
+
+    // User sets reserveTokens=2048 but NOT reserveTokensFloor (default 20_000 applies).
+    // Pre-fix: target = max(2048, 20_000) = 20_000 → exceeds 16_384 context → infinite loop.
+    // Post-fix: floor capped to 8_384 → target = max(2048, 8_384) = 8_384 → works.
+    const result = applyPiCompactionSettingsFromConfig({
+      settingsManager,
+      cfg: {
+        agents: {
+          defaults: {
+            compaction: { reserveTokens: 2_048 },
+          },
+        },
+      },
+      contextTokenBudget: 16_384,
+    });
+
+    expect(result.didOverride).toBe(true);
+    expect(result.compaction.reserveTokens).toBe(8_384); // capped floor wins over user's 2_048
+    expect(settingsManager.applyOverrides).toHaveBeenCalledWith({
+      compaction: { reserveTokens: 8_384 },
+    });
+  });
+
+  it("applies capped floor when current reserve is below it on small-context models", () => {
+    // Simulate a Pi SDK default of 4 096 with a 16 384 context window.
+    // minPromptBudget = min(8_000, floor(16_384 * 0.5)) = 8_000.
+    // maxReserve = 16_384 - 8_000 = 8_384.
+    // Capped floor = min(20_000, 8_384) = 8_384.
+    // targetReserveTokens = max(4_096, 8_384) = 8_384 → override applied.
+    const settingsManager = {
+      getCompactionReserveTokens: () => 4_096,
+      getCompactionKeepRecentTokens: () => 20_000,
+      applyOverrides: vi.fn(),
+    };
+
+    const result = applyPiCompactionSettingsFromConfig({
+      settingsManager,
+      contextTokenBudget: 16_384,
+    });
+
+    const minPromptBudget = Math.min(
+      MIN_PROMPT_BUDGET_TOKENS,
+      Math.max(1, Math.floor(16_384 * MIN_PROMPT_BUDGET_RATIO)),
+    );
+    const expectedReserve = Math.max(0, 16_384 - minPromptBudget);
+    expect(result.didOverride).toBe(true);
+    expect(result.compaction.reserveTokens).toBe(expectedReserve);
+    expect(settingsManager.applyOverrides).toHaveBeenCalledWith({
+      compaction: { reserveTokens: expectedReserve },
+    });
+  });
+
+  it("respects user-configured reserveTokens below capped floor for small models", () => {
+    const settingsManager = {
+      getCompactionReserveTokens: () => 16_384,
+      getCompactionKeepRecentTokens: () => 20_000,
+      applyOverrides: vi.fn(),
+    };
+
+    // User explicitly sets reserveTokens=2048 and reserveTokensFloor=0.
+    // With contextTokenBudget=16384, maxReserve = 16_384 - 8_000 = 8_384,
+    // so the capped floor = min(0, 8_384) = 0.
+    // targetReserveTokens = max(2048, 0) = 2048.
+    const result = applyPiCompactionSettingsFromConfig({
+      settingsManager,
+      cfg: {
+        agents: {
+          defaults: {
+            compaction: { reserveTokens: 2_048, reserveTokensFloor: 0 },
+          },
+        },
+      },
+      contextTokenBudget: 16_384,
+    });
+
+    expect(result.compaction.reserveTokens).toBe(2_048);
+    expect(settingsManager.applyOverrides).toHaveBeenCalledWith({
+      compaction: { reserveTokens: 2_048 },
+    });
+  });
+
+  it("does not cap floor for mid-size models when maxReserve exceeds default floor", () => {
+    const settingsManager = {
+      getCompactionReserveTokens: () => 16_384,
+      getCompactionKeepRecentTokens: () => 20_000,
+      applyOverrides: vi.fn(),
+    };
+
+    // 32 768 context window → minPromptBudget = min(8_000, floor(32_768 * 0.5)) = 8_000.
+    // maxReserve = 32_768 - 8_000 = 24_768.
+    // Since 24_768 > 20_000 (DEFAULT_FLOOR), the floor is NOT capped and stays at 20_000.
+    const result = applyPiCompactionSettingsFromConfig({
+      settingsManager,
+      contextTokenBudget: 32_768,
+    });
+
+    expect(result.compaction.reserveTokens).toBe(DEFAULT_PI_COMPACTION_RESERVE_TOKENS_FLOOR);
+    expect(settingsManager.applyOverrides).toHaveBeenCalledWith({
+      compaction: { reserveTokens: DEFAULT_PI_COMPACTION_RESERVE_TOKENS_FLOOR },
+    });
+  });
+
+  it("does not cap floor when context window is large enough", () => {
+    const settingsManager = {
+      getCompactionReserveTokens: () => 16_384,
+      getCompactionKeepRecentTokens: () => 20_000,
+      applyOverrides: vi.fn(),
+    };
+
+    // 200 000 context window → maxReserve = 200_000 - 8_000 = 192_000.
+    // floor (20 000) is well within that cap.
+    const result = applyPiCompactionSettingsFromConfig({
+      settingsManager,
+      contextTokenBudget: 200_000,
+    });
+
+    expect(result.compaction.reserveTokens).toBe(DEFAULT_PI_COMPACTION_RESERVE_TOKENS_FLOOR);
+    expect(settingsManager.applyOverrides).toHaveBeenCalledWith({
+      compaction: { reserveTokens: DEFAULT_PI_COMPACTION_RESERVE_TOKENS_FLOOR },
+    });
+  });
+
+  it("falls back to uncapped floor when contextTokenBudget is not provided", () => {
+    const settingsManager = {
+      getCompactionReserveTokens: () => 16_384,
+      getCompactionKeepRecentTokens: () => 20_000,
+      applyOverrides: vi.fn(),
+    };
+
+    // No contextTokenBudget → backward-compatible behavior, floor = 20 000.
+    const result = applyPiCompactionSettingsFromConfig({ settingsManager });
+
+    expect(result.compaction.reserveTokens).toBe(DEFAULT_PI_COMPACTION_RESERVE_TOKENS_FLOOR);
+  });
 });
 
 describe("resolveCompactionReserveTokensFloor", () => {

src/agents/pi-settings.ts

Lines changed: 28 additions & 1 deletion
@@ -1,5 +1,9 @@
 import type { OpenClawConfig } from "../config/types.openclaw.js";
 import type { ContextEngineInfo } from "../context-engine/types.js";
+import {
+  MIN_PROMPT_BUDGET_RATIO,
+  MIN_PROMPT_BUDGET_TOKENS,
+} from "./pi-compaction-constants.js";
 
 export const DEFAULT_PI_COMPACTION_RESERVE_TOKENS_FLOOR = 20_000;

@@ -15,6 +19,12 @@ type PiSettingsManagerLike = {
   setCompactionEnabled?: (enabled: boolean) => void;
 };
 
+/**
+ * Ensures the compaction reserve tokens are at least the specified minimum.
+ * Note: This function is not context-aware and uses an uncapped floor.
+ * If called for small-context models without threading `contextTokenBudget`,
+ * it may re-introduce context overflow issues.
+ */
 export function ensurePiCompactionReserveTokens(params: {
   settingsManager: PiSettingsManagerLike;
   minReserveTokens?: number;
@@ -58,6 +68,8 @@ function toPositiveInt(value: unknown): number | undefined {
 export function applyPiCompactionSettingsFromConfig(params: {
   settingsManager: PiSettingsManagerLike;
   cfg?: OpenClawConfig;
+  /** When known, the resolved context window budget for the current model. */
+  contextTokenBudget?: number;
 }): {
   didOverride: boolean;
   compaction: { reserveTokens: number; keepRecentTokens: number };
@@ -68,7 +80,22 @@ export function applyPiCompactionSettingsFromConfig(params: {
 
   const configuredReserveTokens = toNonNegativeInt(compactionCfg?.reserveTokens);
   const configuredKeepRecentTokens = toPositiveInt(compactionCfg?.keepRecentTokens);
-  const reserveTokensFloor = resolveCompactionReserveTokensFloor(params.cfg);
+  let reserveTokensFloor = resolveCompactionReserveTokensFloor(params.cfg);
+
+  // Cap the floor to a safe fraction of the context window so that
+  // small-context models (e.g. Ollama with 16 K tokens) are not starved of
+  // prompt budget. Without this cap the default floor of 20 000 can exceed
+  // the entire context window, causing every prompt to be classified as an
+  // overflow and triggering an infinite compaction loop.
+  const ctxBudget = params.contextTokenBudget;
+  if (typeof ctxBudget === "number" && Number.isFinite(ctxBudget) && ctxBudget > 0) {
+    const minPromptBudget = Math.min(
+      MIN_PROMPT_BUDGET_TOKENS,
+      Math.max(1, Math.floor(ctxBudget * MIN_PROMPT_BUDGET_RATIO)),
+    );
+    const maxReserve = Math.max(0, ctxBudget - minPromptBudget);
+    reserveTokensFloor = Math.min(reserveTokensFloor, maxReserve);
+  }
 
   const targetReserveTokens = Math.max(
     configuredReserveTokens ?? currentReserveTokens,
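
The hunk above is truncated, so the following is only a standalone sketch of the capped-floor calculation, not the repository's implementation. It assumes the second argument of the truncated `Math.max` is the (capped) floor, which is what the tests in `pi-settings.test.ts` imply. The names `capReserveFloor`, `resolveTargetReserve`, and `DEFAULT_FLOOR` are hypothetical.

```typescript
const DEFAULT_FLOOR = 20_000; // stands in for DEFAULT_PI_COMPACTION_RESERVE_TOKENS_FLOOR
const MIN_PROMPT_BUDGET_TOKENS = 8_000;
const MIN_PROMPT_BUDGET_RATIO = 0.5;

// Hypothetical helper: caps the floor only when a usable context budget is known.
function capReserveFloor(floor: number, contextTokenBudget?: number): number {
  if (
    typeof contextTokenBudget === "number" &&
    Number.isFinite(contextTokenBudget) &&
    contextTokenBudget > 0
  ) {
    const minPromptBudget = Math.min(
      MIN_PROMPT_BUDGET_TOKENS,
      Math.max(1, Math.floor(contextTokenBudget * MIN_PROMPT_BUDGET_RATIO)),
    );
    const maxReserve = Math.max(0, contextTokenBudget - minPromptBudget);
    return Math.min(floor, maxReserve);
  }
  return floor; // no budget known: keep the uncapped floor
}

// Hypothetical helper: assumes target = max(configured ?? current, capped floor).
function resolveTargetReserve(opts: {
  current: number;
  configured?: number;
  floor?: number;
  contextTokenBudget?: number;
}): number {
  const floor = capReserveFloor(opts.floor ?? DEFAULT_FLOOR, opts.contextTokenBudget);
  return Math.max(opts.configured ?? opts.current, floor);
}

console.log(resolveTargetReserve({ current: 4_096, contextTokenBudget: 16_384 }));
console.log(resolveTargetReserve({ current: 16_384, contextTokenBudget: 32_768 }));
console.log(resolveTargetReserve({ current: 16_384 }));
```

Under these assumptions the three calls reproduce the values the tests expect for a 16K window, a 32K window, and a missing budget.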
