
fix(gateway): cap compaction reserve floor to context window for small models#65671

Merged
openperf merged 2 commits into openclaw:main from openperf:fix/compaction-reserve-context-cap
Apr 14, 2026

Conversation


@openperf openperf commented Apr 13, 2026

Summary

  • Problem: Local models with small context windows (e.g., Ollama with 16K tokens) fail with "Context overflow: prompt too large for the model (precheck)" even for small prompts. The logs show reserveTokens=16384 and promptBudgetBeforeReserve=1. This happens because the default reserveTokensFloor (20,000) silently overrides user-configured reserveTokens (e.g., 2048) and exceeds the entire context window. (See src/agents/pi-settings.ts:73-76)
  • Root Cause: In applyPiCompactionSettingsFromConfig, the reserveTokensFloor is applied blindly without considering the model's actual context window size. For a 16K model, Math.max(configuredReserveTokens, 20_000) forces the reserve to 20,000. While shouldPreemptivelyCompactBeforePrompt has a downstream cap, the Pi SDK's internal auto-compaction reads the inflated value directly from settingsManager.getCompactionReserveTokens(), leading to infinite compaction loops or immediate overflow errors.
  • Fix: Thread the resolved contextTokenBudget into applyPiCompactionSettingsFromConfig and cap the reserveTokensFloor using the exact same formula used by the runtime precheck layer (shouldPreemptivelyCompactBeforePrompt). This ensures the floor protects users without starving small-context models of prompt budget, and respects explicit user configurations that fall below the uncapped floor but above the capped floor.
  • What changed:
    • src/agents/pi-settings.ts: Added contextTokenBudget parameter to applyPiCompactionSettingsFromConfig and implemented the floor cap logic by importing MIN_PROMPT_BUDGET_TOKENS and MIN_PROMPT_BUDGET_RATIO.
    • src/agents/pi-embedded-runner/run/preemptive-compaction.ts: Exported MIN_PROMPT_BUDGET_TOKENS and MIN_PROMPT_BUDGET_RATIO to serve as a single source of truth for the settings layer.
    • src/agents/pi-project-settings.ts: Updated createPreparedEmbeddedPiSettingsManager to accept and pass through contextTokenBudget.
    • src/agents/pi-embedded-runner/run/attempt.ts: Passed params.contextTokenBudget when creating the settings manager.
    • src/agents/pi-embedded-runner/compact.ts: Passed ctxInfo.tokens when creating the settings manager.
    • src/agents/pi-settings.test.ts: Added comprehensive tests for the new capping logic and updated imports.
  • What did NOT change (scope boundary): The downstream shouldPreemptivelyCompactBeforePrompt logic remains unchanged. The default floor value (20_000) remains unchanged for large-context models. No changes were made to config materialization or schema validation.
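As a rough illustration, the cap described above might look like the sketch below. The constant values (8,000 / 0.5 / 20,000) match the figures cited in this PR's discussion, but the standalone helper shape is an assumption for illustration; the real change threads contextTokenBudget through applyPiCompactionSettingsFromConfig rather than using a free function.

```typescript
// Sketch of the floor cap, assuming the constants cited in this PR:
// MIN_PROMPT_BUDGET_TOKENS = 8_000 and MIN_PROMPT_BUDGET_RATIO = 0.5
// (exported from preemptive-compaction.ts as the single source of truth).
const MIN_PROMPT_BUDGET_TOKENS = 8_000;
const MIN_PROMPT_BUDGET_RATIO = 0.5;
const DEFAULT_RESERVE_TOKENS_FLOOR = 20_000;

function capReserveFloor(
  contextTokenBudget: number | undefined,
  floor: number = DEFAULT_RESERVE_TOKENS_FLOOR,
): number {
  // Backward compatible: when no budget is provided, the floor is untouched.
  if (contextTokenBudget === undefined) return floor;
  // Minimum prompt budget the runtime precheck guarantees:
  // min(8_000, 50% of the context window).
  const minPromptBudget = Math.min(
    MIN_PROMPT_BUDGET_TOKENS,
    Math.floor(contextTokenBudget * MIN_PROMPT_BUDGET_RATIO),
  );
  // The cap only ever lowers the floor, never raises it.
  return Math.min(floor, Math.max(0, contextTokenBudget - minPromptBudget));
}
```

Under these assumptions, a 16,384-token Ollama model gets a floor of 8,384 instead of 20,000 (leaving 8,000 tokens of prompt budget), while a 200,000-token model keeps the default 20,000 floor unchanged.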

Reproduction

  1. Configure a local model (e.g., Ollama) with a small context window (e.g., 16384 tokens).
  2. Set agents.defaults.compaction.reserveTokens: 2048 and reserveTokensFloor: 0 in the config.
  3. Start a session and send a prompt.
  4. Observe the "Context overflow" error in the logs, with reserveTokens inflated to the context window size or 20,000.
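A config fragment for step 2 might look like the following; the key path follows the step above, but the surrounding file shape and JSON syntax are assumptions.

```json
{
  "agents": {
    "defaults": {
      "compaction": {
        "reserveTokens": 2048,
        "reserveTokensFloor": 0
      }
    }
  }
}
```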

Risk / Mitigation

  • Risk: The cap might reduce the reserve tokens too much for very small models, leading to less effective compaction.
  • Mitigation: The cap formula now exactly mirrors the runtime precheck layer by importing MIN_PROMPT_BUDGET_TOKENS and MIN_PROMPT_BUDGET_RATIO directly from preemptive-compaction.ts. This single source of truth ensures perfect consistency between the settings layer and the runtime precheck layer, preventing any silent drift. Comprehensive unit tests were added to verify the behavior across different context window sizes (16K, 32K, 200K) and user configurations.

Change Type (select all)

  • Bug fix

Scope (select all touched areas)

  • Gateway
  • Agents

Linked Issue/PR

Fixes #65465

@openclaw-barnacle Bot added the agents (Agent runtime and tooling), size: S, and maintainer (Maintainer-authored PR) labels on Apr 13, 2026

greptile-apps Bot commented Apr 13, 2026

Greptile Summary

Caps the compaction reserveTokensFloor to 50% of the resolved context window budget when contextTokenBudget is provided, preventing the default 20,000-token floor from exceeding the entire context window on small-context local models (e.g. Ollama with 16K tokens) and causing infinite compaction loops or immediate overflow errors.

The fix is minimal and well-scoped: the cap is additive (only reduces the floor, never raises it), fully backwards-compatible when contextTokenBudget is absent, and the new constant MAX_COMPACTION_RESERVE_RATIO mirrors the existing MIN_PROMPT_BUDGET_RATIO in the preemptive-compaction layer. One minor concern: the two 0.5 constants live in separate files with no compile-time link, so a future change to one could silently drift from the other.

Confidence Score: 5/5

Safe to merge; the fix is backwards-compatible, all new branches are covered by tests, and the only finding is a minor style concern about two independently defined constants.

All findings are P2. The core logic is correct, guard conditions are thorough, and the parameter threading is clean. No data-integrity or reliability concerns on the changed path.

No files require special attention.

Inline review comment:
Path: src/agents/pi-settings.ts
Line: 7-11

Comment:
**Unshared constant risks silent drift**

`MAX_COMPACTION_RESERVE_RATIO` is defined here as a standalone `0.5`, while the constant it "mirrors" — `MIN_PROMPT_BUDGET_RATIO` in `preemptive-compaction.ts` — is a separate unexported `0.5`. If either value is later changed independently, the settings-layer cap and the runtime precheck will diverge without any compiler or test signal. Consider exporting `MIN_PROMPT_BUDGET_RATIO` from `preemptive-compaction.ts` and importing it here instead of redeclaring it, or at minimum add a cross-check test that asserts `MAX_COMPACTION_RESERVE_RATIO + MIN_PROMPT_BUDGET_RATIO <= 1`.
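The cross-check suggested in this comment could be sketched as below; the constant values (0.5 each) are stand-ins for the two project constants, which this snippet redeclares locally for illustration rather than importing.

```typescript
// Stand-ins for the two independently defined project constants (0.5 each,
// per the review comment). In the real repo these would be imported from
// pi-settings.ts and preemptive-compaction.ts respectively.
const MAX_COMPACTION_RESERVE_RATIO = 0.5; // settings-layer cap
const MIN_PROMPT_BUDGET_RATIO = 0.5; // runtime precheck minimum

// Invariant from the review: the reserve share and the guaranteed prompt
// share must together fit within a single context window.
function ratiosAreConsistent(): boolean {
  return MAX_COMPACTION_RESERVE_RATIO + MIN_PROMPT_BUDGET_RATIO <= 1;
}
```

A test asserting this invariant would fail loudly if either constant were later raised independently, giving the compile-time link the two files currently lack.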


Reviews (1): Last reviewed commit: "fix(gateway): cap compaction reserve flo..."

Comment thread src/agents/pi-settings.ts Outdated
@openperf openperf force-pushed the fix/compaction-reserve-context-cap branch from e6cbb8d to b1e3206 on April 13, 2026 at 03:20

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b1e3206c78

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/agents/pi-settings.ts Outdated
@openperf openperf force-pushed the fix/compaction-reserve-context-cap branch 2 times, most recently from 7c9f8d8 to 4c6dbec on April 14, 2026 at 09:17
@openperf openperf force-pushed the fix/compaction-reserve-context-cap branch from 4c6dbec to 623b16a on April 14, 2026 at 09:49

@chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 623b16a20f

Comment thread src/agents/pi-settings.ts
@openperf openperf force-pushed the fix/compaction-reserve-context-cap branch 7 times, most recently from a40bc3e to 5bc2c2f on April 14, 2026 at 17:01
@openperf openperf force-pushed the fix/compaction-reserve-context-cap branch from 5bc2c2f to 5fe3e06 on April 14, 2026 at 17:07
@openperf openperf merged commit 4bc46cc into openclaw:main Apr 14, 2026
9 of 10 checks passed
chovizzz pushed a commit to chovizzz/openclaw that referenced this pull request Apr 17, 2026
…l models (openclaw#65671)

Fixes openclaw#65465. Caps the compaction reserveTokensFloor so that at least min(8,000 tokens, 50%) of the context window remains available for prompt content, preventing the default 20,000-token floor from exceeding the entire context window on small-context local models (e.g. Ollama 16K). The cap is only applied when contextTokenBudget is provided, preserving backward compatibility.
The same commit was subsequently pushed with an identical message by kvnkho (kvnkho/openclaw, Apr 17, 2026), lovewanwan (lovewanwan/openclaw, Apr 28, 2026), ogt-redknie (ogt-redknie/OPENX, May 2, 2026), and the github-actions Bot (Desicool/openclaw, May 9, 2026), each referencing this pull request.

Labels

agents (Agent runtime and tooling), maintainer (Maintainer-authored PR), size: M


Development

Successfully merging this pull request may close these issues.

[Bug]: Ollama local model ignores compaction reserve config and still prechecks with reserveTokens=16384

1 participant