fix: clamp compaction max_tokens to model output limit#1385
Open · BingqingLyu wants to merge 3 commits into `main`
Conversation
With 1M context windows, `reserveTokensFloor` can be 300K+. The `generateSummary()` function in pi-coding-agent calculates `max_tokens` as `Math.floor(0.8 * reserveTokens)`, producing 240K, which exceeds Anthropic's per-request output cap of 128K for Sonnet/Opus 4.6.

This fix clamps `reserveTokens` before passing it to `generateSummary()`, ensuring the resulting `max_tokens` never exceeds the model's `maxTokens`. The clamp uses `model.maxTokens` from the provider registry (falling back to 128K if unset). This is forward-compatible: if future models raise their output cap, no code change is needed.

Fixes openclaw#54383
Validates that `summarizeChunks` clamps `reserveTokens` to `Math.floor(model.maxTokens / 0.8)` to prevent `max_tokens` from exceeding the model's output limit. Covers:

- Clamping when `reserveTokens` (300K) exceeds the model output cap (128K)
- Pass-through when `reserveTokens` is already within bounds
- Fallback to the 128K default when the model has no `maxTokens` field
- Consistent clamping across all chunks in staged summarization
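The coverage above can be sketched as assertions against a stand-in clamp helper. The function name and signature here are hypothetical; in OpenClaw the clamp lives inside `summarizeChunks`, not in a standalone export.

```typescript
// Stand-in for the clamp inside summarizeChunks; illustrative only,
// not the actual OpenClaw API.
function clampReserveTokens(reserveTokens: number, modelMaxTokens?: number): number {
  const outputCap = modelMaxTokens ?? 128_000; // default when the model has no maxTokens field
  return Math.min(reserveTokens, Math.floor(outputCap / 0.8));
}

// Clamping when reserveTokens (300K) exceeds the model output cap (128K):
console.assert(clampReserveTokens(300_000, 128_000) === 160_000);
// Pass-through when reserveTokens is already within bounds:
console.assert(clampReserveTokens(100_000, 128_000) === 100_000);
// Fallback to the 128K default when the model has no maxTokens field:
console.assert(clampReserveTokens(300_000) === 160_000);
```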
Summary
Fixes openclaw#54383: compaction fails with `max_tokens: 240000 > 128000` when using Anthropic models with 1M context windows.

Root Cause
In `@mariozechner/pi-coding-agent`, `generateSummary()` calculates `max_tokens` as `Math.floor(0.8 * reserveTokens)`. With `reserveTokensFloor: 300000` (appropriate for a 1M context), this produces `max_tokens = 240000`, exceeding Anthropic's per-request output cap of 128K for both Sonnet 4.6 and Opus 4.6.

Fix
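The failing arithmetic can be reproduced in isolation. The constants come from this PR; the upstream `generateSummary()` internals are paraphrased, not quoted:

```typescript
// Paraphrase of the upstream calculation described above.
const reserveTokensFloor = 300_000; // floor appropriate for a 1M context window
const maxTokens = Math.floor(0.8 * reserveTokensFloor);

console.log(maxTokens); // 240000, above the 128K Anthropic output cap
console.assert(maxTokens > 128_000); // so the API rejects the request
```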
Clamp `reserveTokens` in `src/agents/compaction.ts` before passing it to `generateSummary()`. This ensures the downstream `max_tokens` calculation (`0.8 * reserveTokens`) never exceeds the model's actual output limit. The fix uses `model.maxTokens` from the provider registry, so it's forward-compatible: if future models raise their output cap, no code change is needed.

Impact
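A minimal sketch of the clamp, assuming a model object with an optional `maxTokens` field as described above (the helper name and `ModelInfo` shape are illustrative, not the actual OpenClaw types):

```typescript
const DEFAULT_OUTPUT_CAP = 128_000; // fallback when model.maxTokens is unset

interface ModelInfo {
  maxTokens?: number; // per-request output cap from the provider registry
}

// Clamp reserveTokens so that floor(0.8 * reserveTokens) can never
// exceed the model's output cap.
function clampReserveTokens(reserveTokens: number, model: ModelInfo): number {
  const outputCap = model.maxTokens ?? DEFAULT_OUTPUT_CAP;
  return Math.min(reserveTokens, Math.floor(outputCap / 0.8));
}

// With a 300K reserve and a 128K cap, the clamp yields 160K,
// so the downstream max_tokens lands exactly on the cap.
console.assert(clampReserveTokens(300_000, { maxTokens: 128_000 }) === 160_000);
```

Because the ceiling is derived from `model.maxTokens` rather than hard-coded, a model with a larger output cap automatically gets a larger ceiling.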
Affects only configurations with `reserveTokensFloor` > 160K; smaller reserves pass through unchanged.

Testing
The fix is in the OpenClaw wrapper layer (`src/agents/compaction.ts`), not in the upstream `pi-coding-agent` package. This is the minimal, safest change; the upstream package could also benefit from the same clamp in `generateSummary()` itself.

Verified that:

- `model.maxTokens` is populated from the provider catalog (128K for Anthropic Vertex models)
- `Math.floor(128000 / 0.8) = 160000`, so `clampedReserveTokens = min(300000, 160000) = 160000`
- `generateSummary` then calculates `Math.floor(0.8 * 160000) = 128000` ✅ (within the model limit)
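The verification chain above, written out step by step with the constants from this PR:

```typescript
const modelMaxTokens = 128_000; // from the provider catalog (Anthropic Vertex)

// Step 1: the safe ceiling for reserveTokens.
const ceiling = Math.floor(modelMaxTokens / 0.8);
console.assert(ceiling === 160_000);

// Step 2: clamp the 300K reserve to that ceiling.
const clampedReserveTokens = Math.min(300_000, ceiling);
console.assert(clampedReserveTokens === 160_000);

// Step 3: generateSummary's downstream calculation now lands
// exactly on the model's output limit.
const maxTokens = Math.floor(0.8 * clampedReserveTokens);
console.assert(maxTokens === 128_000);
```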