
fix: clamp compaction max_tokens to model output limit #1385

Open

BingqingLyu wants to merge 3 commits into main from fork-pr-54392-fix-compaction-max-tokens-cap

Conversation

BingqingLyu (Owner) commented Apr 27, 2026

Summary

Fixes openclaw#54383 — Compaction fails with max_tokens: 240000 > 128000 when using Anthropic models with 1M context windows.

Root Cause

In @mariozechner/pi-coding-agent, generateSummary() calculates:

const maxTokens = Math.floor(0.8 * reserveTokens);

With reserveTokensFloor: 300000 (appropriate for 1M context), this produces max_tokens = 240000 — exceeding Anthropic's per-request output cap of 128K for both Sonnet 4.6 and Opus 4.6.
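
Concretely, the failing arithmetic works out like this (a minimal sketch; the 300K floor and the 128K cap are the values described above):

const reserveTokens = 300_000;                      // reserveTokensFloor for a 1M context window
const anthropicOutputCap = 128_000;                 // per-request output cap for Sonnet 4.6 / Opus 4.6
const maxTokens = Math.floor(0.8 * reserveTokens);  // 240_000
// 240_000 > 128_000, so the request is rejected with "max_tokens: 240000 > 128000"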

Fix

Clamp reserveTokens in src/agents/compaction.ts before passing to generateSummary():

const modelMaxTokens = params.model.maxTokens ?? 128_000;
const clampedReserveTokens = Math.min(params.reserveTokens, Math.floor(modelMaxTokens / 0.8));

This ensures the downstream max_tokens calculation (0.8 * reserveTokens) never exceeds the model's actual output limit. The fix uses model.maxTokens from the provider registry, so it's forward-compatible — if future models raise their output cap, no code change is needed.
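
For context, here is roughly how the clamp sits in the wrapper before the upstream call (the params shape and the generateSummary() call signature are assumptions for illustration, not the exact OpenClaw code):

// src/agents/compaction.ts (illustrative sketch)
const modelMaxTokens = params.model.maxTokens ?? 128_000;  // provider registry value, 128K fallback
const clampedReserveTokens = Math.min(
  params.reserveTokens,
  Math.floor(modelMaxTokens / 0.8),
);
// Upstream, generateSummary() computes Math.floor(0.8 * reserveTokens),
// so the clamped value keeps max_tokens at or below the model's output limit.
const summary = await generateSummary({ ...params, reserveTokens: clampedReserveTokens });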

Impact

  • Before: Compaction broken for all users with Anthropic + 1M context (any reserveTokensFloor > 160K)
  • After: Compaction works correctly, respecting model output limits while preserving the existing summarization quality

Testing

The fix is in the OpenClaw wrapper layer (src/agents/compaction.ts), not in the upstream pi-coding-agent package. This is the minimal, safest change — the upstream package could also benefit from the same clamp in generateSummary() itself.

Verified that:

  • model.maxTokens is populated from the provider catalog (128K for Anthropic Vertex models)
  • Math.floor(128000 / 0.8) = 160000, so clampedReserveTokens = min(300000, 160000) = 160000
  • generateSummary then calculates Math.floor(0.8 * 160000) = 128000 ✅ (within model limit)

FORGE and others added 3 commits on March 25, 2026 at 04:25

With 1M context windows, reserveTokensFloor can be 300K+. The
generateSummary() function in pi-coding-agent calculates max_tokens
as Math.floor(0.8 * reserveTokens), producing 240K — which exceeds
Anthropic's per-request output cap of 128K for Sonnet/Opus 4.6.

This fix clamps reserveTokens before passing to generateSummary(),
ensuring the resulting max_tokens never exceeds the model's maxTokens.

The clamp uses model.maxTokens from the provider registry (falls back
to 128K if unset). This is forward-compatible — if future models raise
their output cap, no code change is needed.

Fixes openclaw#54383

Validates that summarizeChunks clamps reserveTokens to
Math.floor(model.maxTokens / 0.8) to prevent max_tokens from
exceeding the model's output limit.

Covers:
- Clamping when reserveTokens (300K) exceeds model output cap (128K)
- Pass-through when reserveTokens is already within bounds
- Fallback to 128K default when model has no maxTokens field
- Consistent clamping across all chunks in staged summarization
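
A minimal sketch of what the first three cases could look like (the test framework, file name, and the standalone clampReserveTokens helper are assumptions for illustration; the actual tests exercise summarizeChunks directly):

// compaction.test.ts — illustrative only
import { describe, expect, it } from "vitest";

const clampReserveTokens = (reserveTokens: number, modelMaxTokens?: number) =>
  Math.min(reserveTokens, Math.floor((modelMaxTokens ?? 128_000) / 0.8));

describe("reserveTokens clamping", () => {
  it("clamps 300K down to 160K when the model output cap is 128K", () => {
    expect(clampReserveTokens(300_000, 128_000)).toBe(160_000);
  });

  it("passes reserveTokens through when already within bounds", () => {
    expect(clampReserveTokens(100_000, 128_000)).toBe(100_000);
  });

  it("falls back to a 128K cap when model.maxTokens is unset", () => {
    expect(clampReserveTokens(300_000, undefined)).toBe(160_000);
  });
});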
