fix: clamp compaction max_tokens to model output limit #54392
jalehman merged 6 commits into openclaw:main
Conversation
Greptile Summary
This PR fixes a bug where compaction fails with a `max_tokens: 240000 > 128000` provider error. Changes: clamp `reserveTokens` in `src/agents/compaction.ts` to `Math.floor(modelMaxTokens / 0.8)` before calling `generateSummary()`, so the downstream `max_tokens` calculation never exceeds the model's output limit.
Confidence Score: 4/5
src/agents/compaction.ts (lines 240-241):

```ts
const modelMaxTokens = params.model.maxTokens ?? 128_000;
const clampedReserveTokens = Math.min(params.reserveTokens, Math.floor(modelMaxTokens / 0.8));
```
Missing test for the new clamping behaviour
The clamping logic is the sole change in this PR and is the direct fix for the reported bug, but no test validates that it works. Both `compaction.identifier-preservation.test.ts` and `compaction.retry.test.ts` already mock `generateSummary` and call into `summarizeInStages` via `summarizeWithFallback`, so the test infrastructure is already in place.
Per `CLAUDE.md`, narrowly-scoped changes should include narrowly-scoped tests that directly validate the touched behaviour.
A good place for this is `compaction.identifier-preservation.test.ts` — pass a model with `maxTokens: 128_000` and `reserveTokens: 300_000`, then assert that the third argument passed to the `mockGenerateSummary` spy is ≤ `Math.floor(128_000 / 0.8)` (160 000). This directly guards the invariant described in the PR and catches future regressions if the `0.8` multiplier inside `generateSummary` ever changes.
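A minimal sketch of such a test, assuming the existing suite exposes a `mockGenerateSummary` spy and passes the reserve-token budget as the third argument — the import path, entry point, and option shape below are illustrative, not the repo's exact API (the real tests reach `summarizeInStages` via `summarizeWithFallback`):

```ts
import { describe, expect, it, vi } from "vitest";

// Hypothetical mock wiring — the real test file already provides an equivalent spy.
const { mockGenerateSummary } = vi.hoisted(() => ({
  mockGenerateSummary: vi.fn().mockResolvedValue("summary"),
}));
vi.mock("@mariozechner/pi-coding-agent", () => ({
  generateSummary: mockGenerateSummary,
}));

import { summarizeInStages } from "../compaction"; // path is illustrative

describe("compaction reserveTokens clamping", () => {
  it("clamps reserveTokens so max_tokens stays within the model output cap", async () => {
    // Model with a 128K output cap but a reserve floor sized for a 1M context window.
    await summarizeInStages({
      model: { maxTokens: 128_000 },
      reserveTokens: 300_000,
      messages: [],
    } as any); // argument shape simplified for the sketch

    // Third argument to generateSummary is the (possibly clamped) reserve-token budget.
    const reserveArg = mockGenerateSummary.mock.calls[0][2];
    expect(reserveArg).toBeLessThanOrEqual(Math.floor(128_000 / 0.8)); // 160_000
  });
});
```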
Added a dedicated test file. Four tests covering:
- clamping when reserveTokens (300K) exceeds the model output cap (128K)
- pass-through when reserveTokens is already within bounds
- fallback to the 128K default when the model has no maxTokens field
- consistent clamping across all chunks in staged summarization

All 4 new tests + 21 existing compaction tests pass, as do the pre-commit hooks. |
Point the stale contract-test mock at the current Codex runtime module and instantiate the provider directly for the one mock-sensitive fallback case. This keeps the fallback covered without changing production refresh behavior or the separate registry contract coverage.

Regeneration-Prompt: Fix MAR-231, the unrelated prep blocker on PR #54392. The failing surface was src/plugins/contracts/runtime.contract.test.ts for the OpenAI Codex refresh fallback case. Keep production behavior unchanged. The contract loader uses Jiti for bundled providers, which bypasses the Vitest mock of the runtime wrapper and caused the test to hit the real OAuth refresh path. Update the stale runtime mock path and make only that fallback assertion use the direct provider module, while leaving registry coverage in src/plugins/contracts/registry.contract.test.ts intact. Validate with the focused Codex runtime contract test and the registry contract test.
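A rough illustration of the pattern described above — the module paths, class name, and method are placeholders, since the actual Codex runtime layout isn't shown in this PR:

```ts
import { describe, expect, it, vi } from "vitest";

// Point the mock at the module the provider actually imports today; the stale
// path no longer matched, so the Jiti-loaded provider hit the real OAuth refresh.
vi.mock("./codex-runtime", () => ({ // placeholder path
  refreshAccessToken: vi.fn().mockRejectedValue(new Error("refresh unavailable")),
}));

// For the one mock-sensitive case, import the provider module directly instead of
// going through the contract loader (which uses Jiti and bypasses Vitest mocks).
import { CodexProvider } from "./codex-provider"; // placeholder module and class

describe("codex refresh fallback (mock-sensitive case)", () => {
  it("falls back cleanly when the token refresh fails", async () => {
    const provider = new CodexProvider(/* minimal test config */);
    await expect(provider.ensureAuth()).resolves.toBeDefined(); // illustrative assertion
  });
});
```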
With 1M context windows, reserveTokensFloor can be 300K+. The generateSummary() function in pi-coding-agent calculates max_tokens as Math.floor(0.8 * reserveTokens), producing 240K — which exceeds Anthropic's per-request output cap of 128K for Sonnet/Opus 4.6. This fix clamps reserveTokens before passing to generateSummary(), ensuring the resulting max_tokens never exceeds the model's maxTokens. The clamp uses model.maxTokens from the provider registry (falls back to 128K if unset). This is forward-compatible — if future models raise their output cap, no code change is needed. Fixes openclaw#54383
Validates that summarizeChunks clamps reserveTokens to Math.floor(model.maxTokens / 0.8) to prevent max_tokens from exceeding the model's output limit. Covers:
- Clamping when reserveTokens (300K) exceeds model output cap (128K)
- Pass-through when reserveTokens is already within bounds
- Fallback to 128K default when model has no maxTokens field
- Consistent clamping across all chunks in staged summarization
Merged via squash.
Thanks @adzendo!
Summary
Fixes #54383 — compaction fails with `max_tokens: 240000 > 128000` when using Anthropic models with 1M context windows.

Root Cause

In `@mariozechner/pi-coding-agent`, `generateSummary()` derives the request's `max_tokens` from the caller-supplied reserve-token budget.
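Per the commit message and the linked issue, that calculation is effectively the following (the surrounding Pi code is paraphrased; only the `Math.floor(0.8 * reserveTokens)` formula is confirmed by this PR):

```ts
// Inside pi-coding-agent's generateSummary() — paraphrased sketch.
// reserveTokens is the budget OpenClaw's compaction passes in.
const maxTokens = Math.floor(0.8 * reserveTokens);
// With reserveTokens = 300_000 this yields 240_000, above Anthropic's 128K output cap.
```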
With `reserveTokensFloor: 300000` (appropriate for 1M context), this produces `max_tokens = 240000` — exceeding Anthropic's per-request output cap of 128K for both Sonnet 4.6 and Opus 4.6.

Fix

Clamp `reserveTokens` in `src/agents/compaction.ts` before passing it to `generateSummary()`.
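The clamp, as it appears in the two changed lines of the diff:

```ts
const modelMaxTokens = params.model.maxTokens ?? 128_000;
const clampedReserveTokens = Math.min(params.reserveTokens, Math.floor(modelMaxTokens / 0.8));
```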
This ensures the downstream `max_tokens` calculation (`0.8 * reserveTokens`) never exceeds the model's actual output limit. The fix uses `model.maxTokens` from the provider registry, so it's forward-compatible — if future models raise their output cap, no code change is needed.

Impact

Only configurations with a large reserve budget are affected (`reserveTokensFloor` > 160K, the point at which `0.8 * reserveTokens` crosses the 128K cap); smaller reserves pass through unchanged.

Real behavior proof
- Without the clamp, compaction requests `max_tokens: 240000`, exceeding the `128000` output-token limit.
- Harness setup: `anthropic/claude-sonnet-4-6`, `1000000` context tokens, `300000` configured reserve tokens, and a local Anthropic-compatible capture server that records real provider request bodies and rejects `max_tokens > 128000`.
- Proof commands:

```sh
pnpm tsx scripts/e2e/compaction-max-tokens-proof.mjs --repo ../.. --expect fail --label main-local --mode summarize --timeout-ms 45000 --output .artifacts/pr-54392/compaction-proof-main-local.json
pnpm tsx scripts/e2e/compaction-max-tokens-proof.mjs --repo . --expect pass --label pr-local --mode summarize --timeout-ms 45000 --output .artifacts/pr-54392/compaction-proof-pr-local.json
```

- This branch captured `max_tokens` as `[128000]` at `0aeb92a3a7c7`; the same harness against current main `2d97dcebb5fc` captured `[240000, 240000, 240000]` and the fake Anthropic endpoint rejected `240000 > 128000`.
- The real wrapper path (`src/agents/compaction.ts` -> `summarizeInStages` -> Pi `generateSummary` -> provider request) sent `max_tokens: 128000`, which stays within the model output limit. The clamp feeds Pi a `160000` reserve-token value so its downstream `Math.floor(0.8 * reserveTokens)` calculation cannot exceed `128000`.

Testing
Real behavior proof was run twice from fresh temporary OpenClaw state directories with a local Anthropic-compatible capture server. The harness drives the real OpenClaw wrapper path (`src/agents/compaction.ts` -> `summarizeInStages` -> Pi `generateSummary` -> provider request) using `anthropic/claude-sonnet-4-6`, `1000000` context tokens, `300000` configured reserve tokens, and a `128000` model output limit.

Temporary proof commands used locally:

```sh
pnpm tsx scripts/e2e/compaction-max-tokens-proof.mjs --repo ../.. --expect fail --label main-local --mode summarize --timeout-ms 45000 --output .artifacts/pr-54392/compaction-proof-main-local.json
pnpm tsx scripts/e2e/compaction-max-tokens-proof.mjs --repo . --expect pass --label pr-local --mode summarize --timeout-ms 45000 --output .artifacts/pr-54392/compaction-proof-pr-local.json
```

| Run | Captured `max_tokens` | Result |
| --- | --- | --- |
| main-local | `[240000, 240000, 240000]` | rejected: `240000 > 128000` |
| pr-local | `[128000]` | within the `128000` output limit |

This proves the regression at the provider request boundary: without the clamp, OpenClaw asks Anthropic for `240000` output tokens; with this PR, the downstream Pi calculation receives a clamped reserve-token value of `160000` and emits `128000` output tokens.