fix(gateway): cap compaction reserve floor to context window for small models #65671
Conversation
Greptile Summary

Caps the compaction reserve floor to the model's context window. The fix is minimal and well-scoped: the cap is additive (it only reduces the floor, never raises it) and fully backwards-compatible when `contextTokenBudget` is not provided.

Confidence Score: 5/5 — safe to merge; the fix is backwards-compatible, all new branches are covered by tests, and the only finding is a minor style concern about two independently defined constants. All findings are P2. The core logic is correct, guard conditions are thorough, and the parameter threading is clean. No data-integrity or reliability concerns on the changed path. No files require special attention.

This is a comment left during a code review.
Path: src/agents/pi-settings.ts
Line: 7-11
Comment:
**Unshared constant risks silent drift**
`MAX_COMPACTION_RESERVE_RATIO` is defined here as a standalone `0.5`, while the constant it "mirrors" — `MIN_PROMPT_BUDGET_RATIO` in `preemptive-compaction.ts` — is a separate unexported `0.5`. If either value is later changed independently, the settings-layer cap and the runtime precheck will diverge without any compiler or test signal. Consider exporting `MIN_PROMPT_BUDGET_RATIO` from `preemptive-compaction.ts` and importing it here instead of redeclaring it, or at minimum add a cross-check test that asserts `MAX_COMPACTION_RESERVE_RATIO + MIN_PROMPT_BUDGET_RATIO <= 1`.
How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: "fix(gateway): cap compaction reserve flo..."
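The cross-check the review suggests could look like the sketch below. The constant values (both `0.5`) come from the comment; the constants are redeclared locally here only because the real modules aren't shown — in the actual codebase `MIN_PROMPT_BUDGET_RATIO` would be exported from `preemptive-compaction.ts` and imported.

```typescript
// Stand-ins for the real constants; per the review, both are currently 0.5.
// Ideally MIN_PROMPT_BUDGET_RATIO would be exported from
// preemptive-compaction.ts and imported by pi-settings.ts instead of
// being redeclared there.
const MIN_PROMPT_BUDGET_RATIO = 0.5;      // runtime precheck layer
const MAX_COMPACTION_RESERVE_RATIO = 0.5; // settings layer

// Cross-check: the reserve cap and the minimum prompt budget together
// must not claim more than the whole context window.
function constantsAreConsistent(): boolean {
  return MAX_COMPACTION_RESERVE_RATIO + MIN_PROMPT_BUDGET_RATIO <= 1;
}

console.log(constantsAreConsistent()); // true
```

Dropping this assertion into the existing test suite would make any future independent change to either constant fail loudly instead of drifting silently.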
Force-pushed from e6cbb8d to b1e3206.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b1e3206c78
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Force-pushed from 7c9f8d8 to 4c6dbec, then from 4c6dbec to 623b16a.
💡 Codex Review
Reviewed commit: 623b16a20f
Force-pushed from a40bc3e to 5bc2c2f, then from 5bc2c2f to 5fe3e06.
…l models (openclaw#65671) Fixes openclaw#65465. Caps the compaction `reserveTokensFloor` so that at least min(8,000 tokens, 50% of the context window) remains available for prompt content, preventing the default 20,000-token floor from exceeding the entire context window on small-context local models (e.g. Ollama 16K). The cap is only applied when `contextTokenBudget` is provided, preserving backward compatibility.
Summary
On small-context models, users end up with `reserveTokens=16384` and `promptBudgetBeforeReserve=1`. This happens because the default `reserveTokensFloor` (20,000) silently overrides the user-configured `reserveTokens` (e.g., 2048) and exceeds the entire context window (see `src/agents/pi-settings.ts:73-76`).

In `applyPiCompactionSettingsFromConfig`, the `reserveTokensFloor` is applied blindly, without considering the model's actual context window size. For a 16K model, `Math.max(configuredReserveTokens, 20_000)` forces the reserve to 20,000. While `shouldPreemptivelyCompactBeforePrompt` has a downstream cap, the Pi SDK's internal auto-compaction reads the inflated value directly from `settingsManager.getCompactionReserveTokens()`, leading to infinite compaction loops or immediate overflow errors.

The fix threads `contextTokenBudget` into `applyPiCompactionSettingsFromConfig` and caps the `reserveTokensFloor` using the exact same formula used by the runtime precheck layer (`shouldPreemptivelyCompactBeforePrompt`). This ensures the floor protects users without starving small-context models of prompt budget, and respects explicit user configurations that fall below the uncapped floor but above the capped floor.

Changes:
- `src/agents/pi-settings.ts`: added a `contextTokenBudget` parameter to `applyPiCompactionSettingsFromConfig` and implemented the floor-cap logic by importing `MIN_PROMPT_BUDGET_TOKENS` and `MIN_PROMPT_BUDGET_RATIO`.
- `src/agents/pi-embedded-runner/run/preemptive-compaction.ts`: exported `MIN_PROMPT_BUDGET_TOKENS` and `MIN_PROMPT_BUDGET_RATIO` to serve as a single source of truth for the settings layer.
- `src/agents/pi-project-settings.ts`: updated `createPreparedEmbeddedPiSettingsManager` to accept and pass through `contextTokenBudget`.
- `src/agents/pi-embedded-runner/run/attempt.ts`: passed `params.contextTokenBudget` when creating the settings manager.
- `src/agents/pi-embedded-runner/compact.ts`: passed `ctxInfo.tokens` when creating the settings manager.
- `src/agents/pi-settings.test.ts`: added comprehensive tests for the new capping logic and updated imports.

The `shouldPreemptivelyCompactBeforePrompt` logic remains unchanged.
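The capping formula described above can be sketched as follows. This is a minimal sketch, not the actual implementation: the constant values (`8_000`, `0.5`, `20_000`) are taken from the PR description, and the function name `capReserveFloor` is illustrative.

```typescript
// Assumed values, per the PR description; the real constants are
// exported from preemptive-compaction.ts.
const MIN_PROMPT_BUDGET_TOKENS = 8_000;
const MIN_PROMPT_BUDGET_RATIO = 0.5;
const DEFAULT_RESERVE_FLOOR = 20_000;

// Cap the reserve floor so that at least min(8,000 tokens, 50% of the
// context window) remains available for prompt content.
function capReserveFloor(floor: number, contextTokenBudget?: number): number {
  if (contextTokenBudget === undefined) return floor; // backward compatible
  const minPromptBudget = Math.min(
    MIN_PROMPT_BUDGET_TOKENS,
    Math.floor(contextTokenBudget * MIN_PROMPT_BUDGET_RATIO),
  );
  return Math.min(floor, Math.max(0, contextTokenBudget - minPromptBudget));
}

// 16K model: the floor drops from 20,000 to 16,384 - 8,000 = 8,384.
console.log(capReserveFloor(DEFAULT_RESERVE_FLOOR, 16_384)); // 8384
// 200K model: the default floor is unaffected.
console.log(capReserveFloor(DEFAULT_RESERVE_FLOOR, 200_000)); // 20000
```

Because the cap only ever lowers the floor, large-context models keep the existing 20,000-token default untouched.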
The default floor value (20,000) remains unchanged for large-context models. No changes were made to config materialization or schema validation.

Reproduction
Set `agents.defaults.compaction.reserveTokens: 2048` and `reserveTokensFloor: 0` in the config, then run against a small-context model: `reserveTokens` is inflated to the context window size or 20,000.

Risk / Mitigation
The fix imports `MIN_PROMPT_BUDGET_TOKENS` and `MIN_PROMPT_BUDGET_RATIO` directly from `preemptive-compaction.ts`. This single source of truth ensures consistency between the settings layer and the runtime precheck layer, preventing silent drift. Comprehensive unit tests were added to verify the behavior across different context window sizes (16K, 32K, 200K) and user configurations.
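The behavior those unit tests cover can be sketched end to end, combining the cap with the existing `Math.max` floor step. This is an illustrative reimplementation, not the real code; the constants and the `effectiveReserve` name are assumptions based on the description above.

```typescript
// Assumed values, per the PR description.
const MIN_PROMPT_BUDGET_TOKENS = 8_000;
const MIN_PROMPT_BUDGET_RATIO = 0.5;
const DEFAULT_FLOOR = 20_000;

// Sketch of the effective reserve after the fix: cap the floor to the
// context window first, then apply the floor to the configured reserve.
function effectiveReserve(configured: number, contextTokenBudget?: number): number {
  let floor = DEFAULT_FLOOR;
  if (contextTokenBudget !== undefined) {
    const minPromptBudget = Math.min(
      MIN_PROMPT_BUDGET_TOKENS,
      Math.floor(contextTokenBudget * MIN_PROMPT_BUDGET_RATIO),
    );
    floor = Math.min(floor, Math.max(0, contextTokenBudget - minPromptBudget));
  }
  return Math.max(configured, floor);
}

// 16K window: floor capped to 8,384, leaving 8,000 tokens of prompt budget.
console.log(effectiveReserve(2_048, 16_384)); // 8384
// 32K and 200K windows: the capped floor already fits, so behavior is unchanged.
console.log(effectiveReserve(2_048, 32_768)); // 20000
console.log(effectiveReserve(2_048, 200_000)); // 20000
// No context budget: fully backward compatible with the old Math.max path.
console.log(effectiveReserve(2_048)); // 20000
```

Note that 32K is already large enough that the cap does nothing: only windows below roughly 28K (20,000 + 8,000) see a reduced floor.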
Linked Issue/PR
Fixes #65465