
fix: clamp compaction max_tokens to model output limit #1385

Open

BingqingLyu wants to merge 3 commits into main from fork-pr-54392-fix-compaction-max-tokens-cap

Conversation

BingqingLyu (Owner) commented Apr 27, 2026

Summary

Fixes openclaw#54383 — Compaction fails with max_tokens: 240000 > 128000 when using Anthropic models with 1M context windows.

Root Cause

In @mariozechner/pi-coding-agent, generateSummary() calculates:

const maxTokens = Math.floor(0.8 * reserveTokens);

With reserveTokensFloor: 300000 (appropriate for 1M context), this produces max_tokens = 240000 — exceeding Anthropic's per-request output cap of 128K for both Sonnet 4.6 and Opus 4.6.
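
Concretely, the failing arithmetic works out like this (a minimal sketch; the 300K floor and the 128K cap are the values described above):

const reserveTokens = 300_000;                      // reserveTokensFloor for a 1M context window
const anthropicOutputCap = 128_000;                 // per-request output cap for Sonnet 4.6 / Opus 4.6
const maxTokens = Math.floor(0.8 * reserveTokens);  // 240_000
// 240_000 > 128_000, so the request is rejected with "max_tokens: 240000 > 128000"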

Fix

Clamp reserveTokens in src/agents/compaction.ts before passing to generateSummary():

const modelMaxTokens = params.model.maxTokens ?? 128_000;
const clampedReserveTokens = Math.min(params.reserveTokens, Math.floor(modelMaxTokens / 0.8));

This ensures the downstream max_tokens calculation (0.8 * reserveTokens) never exceeds the model's actual output limit. The fix uses model.maxTokens from the provider registry, so it's forward-compatible — if future models raise their output cap, no code change is needed.
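
For context, here is roughly how the clamp sits in the wrapper before the upstream call (the params shape and the generateSummary() call signature are assumptions for illustration, not the exact OpenClaw code):

// src/agents/compaction.ts (illustrative sketch)
const modelMaxTokens = params.model.maxTokens ?? 128_000;  // provider registry value, 128K fallback
const clampedReserveTokens = Math.min(
  params.reserveTokens,
  Math.floor(modelMaxTokens / 0.8),
);
// Upstream, generateSummary() computes Math.floor(0.8 * reserveTokens),
// so the clamped value keeps max_tokens at or below the model's output limit.
const summary = await generateSummary({ ...params, reserveTokens: clampedReserveTokens });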

Impact

  • Before: Compaction broken for all users with Anthropic + 1M context (any reserveTokensFloor > 160K)
  • After: Compaction works correctly, respecting model output limits while preserving the existing summarization quality

Testing

The fix is in the OpenClaw wrapper layer (src/agents/compaction.ts), not in the upstream pi-coding-agent package. This is the minimal, safest change — the upstream package could also benefit from the same clamp in generateSummary() itself.

Verified that:

  • model.maxTokens is populated from the provider catalog (128K for Anthropic Vertex models)
  • Math.floor(128000 / 0.8) = 160000, so clampedReserveTokens = min(300000, 160000) = 160000
  • generateSummary then calculates Math.floor(0.8 * 160000) = 128000 ✅ (within model limit)

FORGE and others added 3 commits on March 25, 2026 at 04:25

With 1M context windows, reserveTokensFloor can be 300K+. The
generateSummary() function in pi-coding-agent calculates max_tokens
as Math.floor(0.8 * reserveTokens), producing 240K — which exceeds
Anthropic's per-request output cap of 128K for Sonnet/Opus 4.6.

This fix clamps reserveTokens before passing to generateSummary(),
ensuring the resulting max_tokens never exceeds the model's maxTokens.

The clamp uses model.maxTokens from the provider registry (falls back
to 128K if unset). This is forward-compatible — if future models raise
their output cap, no code change is needed.

Fixes openclaw#54383

Validates that summarizeChunks clamps reserveTokens to
Math.floor(model.maxTokens / 0.8) to prevent max_tokens from
exceeding the model's output limit.

Covers:
- Clamping when reserveTokens (300K) exceeds model output cap (128K)
- Pass-through when reserveTokens is already within bounds
- Fallback to 128K default when model has no maxTokens field
- Consistent clamping across all chunks in staged summarization
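
A minimal sketch of what the first three cases could look like (the test framework, file name, and the standalone clampReserveTokens helper are assumptions for illustration; the actual tests exercise summarizeChunks directly):

// compaction.test.ts — illustrative only
import { describe, expect, it } from "vitest";

const clampReserveTokens = (reserveTokens: number, modelMaxTokens?: number) =>
  Math.min(reserveTokens, Math.floor((modelMaxTokens ?? 128_000) / 0.8));

describe("reserveTokens clamping", () => {
  it("clamps 300K down to 160K when the model output cap is 128K", () => {
    expect(clampReserveTokens(300_000, 128_000)).toBe(160_000);
  });

  it("passes reserveTokens through when already within bounds", () => {
    expect(clampReserveTokens(100_000, 128_000)).toBe(100_000);
  });

  it("falls back to a 128K cap when model.maxTokens is unset", () => {
    expect(clampReserveTokens(300_000, undefined)).toBe(160_000);
  });
});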
