
fix: clamp compaction max_tokens to model output limit#54392

Merged
jalehman merged 6 commits into openclaw:main from adzendo:fix/compaction-max-tokens-cap
May 6, 2026

Conversation

@adzendo
Contributor

@adzendo adzendo commented Mar 25, 2026

Summary

Fixes #54383 — Compaction fails with max_tokens: 240000 > 128000 when using Anthropic models with 1M context windows.

Root Cause

In @mariozechner/pi-coding-agent, generateSummary() calculates:

const maxTokens = Math.floor(0.8 * reserveTokens);

With reserveTokensFloor: 300000 (appropriate for 1M context), this produces max_tokens = 240000 — exceeding Anthropic's per-request output cap of 128K for both Sonnet 4.6 and Opus 4.6.

Fix

Clamp reserveTokens in src/agents/compaction.ts before passing to generateSummary():

const modelMaxTokens = params.model.maxTokens ?? 128_000;
const clampedReserveTokens = Math.min(params.reserveTokens, Math.floor(modelMaxTokens / 0.8));

This ensures the downstream max_tokens calculation (0.8 * reserveTokens) never exceeds the model's actual output limit. The fix uses model.maxTokens from the provider registry, so it's forward-compatible — if future models raise their output cap, no code change is needed.
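The clamp can be illustrated in isolation (a minimal sketch; `clampReserveTokens` is a hypothetical helper name used here for illustration, not the actual inline code in `src/agents/compaction.ts`):

```typescript
// Hypothetical standalone helper illustrating the clamp; the real change
// is written inline in src/agents/compaction.ts.
function clampReserveTokens(reserveTokens: number, modelMaxTokens?: number): number {
  // Fall back to a 128K output cap when the provider registry does not
  // advertise a limit for the model.
  const cap = modelMaxTokens ?? 128_000;
  // Since generateSummary() computes max_tokens = floor(0.8 * reserveTokens),
  // clamping reserveTokens to floor(cap / 0.8) keeps max_tokens <= cap.
  return Math.min(reserveTokens, Math.floor(cap / 0.8));
}

// With a 1M-context reserve floor of 300K and a 128K output cap:
clampReserveTokens(300_000, 128_000); // 160_000, so max_tokens becomes 128_000
// Already within bounds: passes through unchanged.
clampReserveTokens(100_000, 128_000); // 100_000
```

Because 128,000 / 0.8 is exactly 160,000, the downstream `Math.floor(0.8 * 160_000)` lands precisely on the 128K cap rather than below it.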

Impact

  • Before: Compaction broken for all users with Anthropic + 1M context (any reserveTokensFloor > 160K)
  • After: Compaction works correctly, respecting model output limits while preserving the existing summarization quality

Real behavior proof

  • Behavior or issue addressed: Compaction with Anthropic 1M-context models used to send max_tokens: 240000, exceeding the 128000 output-token limit.
  • Real environment tested: Fresh temporary OpenClaw state directories on this checkout, using anthropic/claude-sonnet-4-6, 1000000 context tokens, 300000 configured reserve tokens, and a local Anthropic-compatible capture server that records real provider request bodies and rejects max_tokens > 128000.
  • Exact steps or command run after this patch:
pnpm tsx scripts/e2e/compaction-max-tokens-proof.mjs --repo ../.. --expect fail --label main-local --mode summarize --timeout-ms 45000 --output .artifacts/pr-54392/compaction-proof-main-local.json
pnpm tsx scripts/e2e/compaction-max-tokens-proof.mjs --repo . --expect pass --label pr-local --mode summarize --timeout-ms 45000 --output .artifacts/pr-54392/compaction-proof-pr-local.json
  • Evidence after fix: Terminal output from the PR-head run captured provider max_tokens as [128000] at 0aeb92a3a7c7; the same harness against current main 2d97dcebb5fc captured [240000, 240000, 240000] and the fake Anthropic endpoint rejected 240000 > 128000.
  • Observed result after fix: The real OpenClaw wrapper path (src/agents/compaction.ts -> summarizeInStages -> Pi generateSummary -> provider request) sent max_tokens: 128000, which stays within the model output limit. The clamp feeds Pi a 160000 reserve-token value so its downstream Math.floor(0.8 * reserveTokens) calculation cannot exceed 128000.
  • What was not tested: Full embedded manual compaction mode in CI; the checked proof covers the provider request boundary that regressed.

Testing

Real behavior proof was run twice from fresh temporary OpenClaw state directories with a local Anthropic-compatible capture server. The harness drives the real OpenClaw wrapper path (src/agents/compaction.ts -> summarizeInStages -> Pi generateSummary -> provider request) using anthropic/claude-sonnet-4-6, 1000000 context tokens, 300000 configured reserve tokens, and a 128000 model output limit.

Temporary proof commands used locally:

pnpm tsx scripts/e2e/compaction-max-tokens-proof.mjs --repo ../.. --expect fail --label main-local --mode summarize --timeout-ms 45000 --output .artifacts/pr-54392/compaction-proof-main-local.json
pnpm tsx scripts/e2e/compaction-max-tokens-proof.mjs --repo . --expect pass --label pr-local --mode summarize --timeout-ms 45000 --output .artifacts/pr-54392/compaction-proof-pr-local.json
| Ref | Expected | Captured provider max_tokens | Result |
| --- | --- | --- | --- |
| current main 2d97dce | fail | [240000, 240000, 240000] | fake Anthropic endpoint rejected 240000 > 128000 |
| PR head 0aeb92a3a7c7 | pass | [128000] | provider request stayed within the 128000 output limit |

This proves the regression at the provider request boundary: without the clamp, OpenClaw asks Anthropic for 240000 output tokens; with this PR, the downstream Pi calculation receives a clamped reserve token value of 160000 and emits 128000 output tokens.
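The validation the capture server performs at the provider boundary can be sketched as follows (a minimal illustration; the function name and response shape are assumptions, not the real `scripts/e2e` harness, which additionally stands this logic up behind an HTTP listener):

```typescript
// Sketch of the check the local Anthropic-compatible capture server applies:
// record every requested max_tokens, then reject values above the 128K
// output cap, mimicking the real provider's validation error.
const OUTPUT_CAP = 128_000;
const capturedMaxTokens: number[] = [];

function handleMessagesRequest(body: { max_tokens: number }): { status: number; error?: string } {
  capturedMaxTokens.push(body.max_tokens); // recorded for the proof artifact
  if (body.max_tokens > OUTPUT_CAP) {
    return { status: 400, error: `max_tokens: ${body.max_tokens} > ${OUTPUT_CAP}` };
  }
  return { status: 200 };
}

// Unpatched main asks for 240K and is rejected; the PR head asks for 128K.
handleMessagesRequest({ max_tokens: 240_000 }); // status 400
handleMessagesRequest({ max_tokens: 128_000 }); // status 200
```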

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: XS labels Mar 25, 2026
@greptile-apps
Contributor

greptile-apps Bot commented Mar 25, 2026

Greptile Summary

This PR fixes a bug where compaction fails with a max_tokens exceeds model limit error when using Anthropic models with 1M-token context windows. The root cause is that generateSummary() derives max_tokens as Math.floor(0.8 * reserveTokens), and with a reserveTokensFloor of 300,000 the result (240,000) exceeds Anthropic's per-request output cap of 128K.

Changes:

  • Adds a clamp in summarizeChunks before passing reserveTokens to generateSummary: Math.min(reserveTokens, Math.floor(model.maxTokens / 0.8)). The math is correct and the ?? 128_000 fallback is appropriate for models that do not advertise their cap.
  • The fix is applied at the one callsite where generateSummary is invoked, so summarizeWithFallback and summarizeInStages both benefit automatically without changes.
  • No test is added for the new clamping path. The existing mock-based test files (compaction.identifier-preservation.test.ts) provide exactly the right infrastructure to verify this behaviour, and the codebase guidelines (CLAUDE.md) ask for narrowly-scoped tests for narrowly-scoped changes.

Confidence Score: 4/5

  • Safe to merge; the fix is correct and well-reasoned, with a single non-blocking follow-up (adding a scoped test) remaining.
  • The arithmetic is sound, the fallback is appropriate, and the fix is applied at the right abstraction level. The only gap is missing test coverage for the new clamping path, which the codebase guidelines explicitly require for narrowly-scoped changes. This is a P2 style item and does not affect correctness.
  • No files require special attention beyond adding a test in src/agents/compaction.identifier-preservation.test.ts.


Comment thread src/agents/compaction.ts
Comment on lines +240 to +241
const modelMaxTokens = params.model.maxTokens ?? 128_000;
const clampedReserveTokens = Math.min(params.reserveTokens, Math.floor(modelMaxTokens / 0.8));
Contributor


P2 Missing test for the new clamping behaviour

The clamping logic is the sole change in this PR and is the direct fix for the reported bug, but no test validates that it works. Both compaction.identifier-preservation.test.ts and compaction.retry.test.ts already mock generateSummary and call into summarizeInStages via summarizeWithFallback, so the test infrastructure is already in place.

Per CLAUDE.md, narrowly-scoped changes should include narrowly-scoped tests that directly validate the touched behaviour.

A good place for this is compaction.identifier-preservation.test.ts — pass a model with maxTokens: 128_000 and reserveTokens: 300_000, then assert that the third argument passed to the mockGenerateSummary spy is ≤ Math.floor(128_000 / 0.8) (160 000). This directly guards the invariant described in the PR and catches future regressions if the 0.8 multiplier inside generateSummary ever changes.
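The suggested assertion can be sketched without the vitest harness (a hand-rolled spy standing in for the existing `mockGenerateSummary` infrastructure; the shape of `summarizeChunks` below is an assumption for illustration, with the clamp inlined):

```typescript
// Minimal spy illustrating the suggested test; the real test file would
// use the existing vitest mocks in compaction.identifier-preservation.test.ts.
type SummaryArgs = { reserveTokens: number };
const calls: number[] = [];

function mockGenerateSummary(args: SummaryArgs): void {
  calls.push(args.reserveTokens); // record what the clamp passed through
}

// Assumed shape of the code under test, with the PR's clamp inlined.
function summarizeChunks(reserveTokens: number, modelMaxTokens?: number): void {
  const cap = modelMaxTokens ?? 128_000;
  mockGenerateSummary({ reserveTokens: Math.min(reserveTokens, Math.floor(cap / 0.8)) });
}

// Core invariant: a 300K reserve with a 128K model cap must clamp to 160K,
// so generateSummary's floor(0.8 * reserveTokens) cannot exceed 128K.
summarizeChunks(300_000, 128_000);
```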


@adzendo
Contributor Author

adzendo commented Mar 25, 2026

Added a dedicated test file src/agents/compaction.reserve-tokens-clamping.test.ts addressing the review feedback (e6e57c6).

Four tests covering:

  • Core invariant: reserveTokens=300K with maxTokens=128K clamps to exactly Math.floor(128_000 / 0.8) = 160_000
  • Pass-through: reserveTokens already within bounds passes unchanged
  • Default fallback: model without maxTokens field falls back to 128_000 cap
  • Staged summarization: clamping is consistent across all chunks in split+merge paths

All 4 new tests + 21 existing compaction tests pass. Pre-commit hooks (pnpm check — format, lint, type-aware, 26 boundary checks) all clean.

@jalehman jalehman force-pushed the fix/compaction-max-tokens-cap branch from e6e57c6 to 32fe559 on March 26, 2026 22:12
@jalehman jalehman force-pushed the fix/compaction-max-tokens-cap branch from 32fe559 to a494a03 on March 26, 2026 22:16
@jalehman jalehman force-pushed the fix/compaction-max-tokens-cap branch from a494a03 to 463704b on March 26, 2026 23:55
@jalehman jalehman force-pushed the fix/compaction-max-tokens-cap branch from 463704b to 573bb94 on March 27, 2026 00:04
@jalehman jalehman force-pushed the fix/compaction-max-tokens-cap branch from 573bb94 to cc271e0 on March 27, 2026 00:09
@jalehman jalehman force-pushed the fix/compaction-max-tokens-cap branch from cc271e0 to 7dc046d on March 27, 2026 01:21
@jalehman jalehman force-pushed the fix/compaction-max-tokens-cap branch from 7dc046d to 5d9f719 on March 27, 2026 01:24
jalehman added a commit that referenced this pull request Mar 27, 2026
Point the stale contract-test mock at the current Codex runtime module and instantiate the provider directly for the one mock-sensitive fallback case. This keeps the fallback covered without changing production refresh behavior or the separate registry contract coverage.

Regeneration-Prompt: |
  Fix MAR-231, the unrelated prep blocker on PR #54392. The failing surface was src/plugins/contracts/runtime.contract.test.ts for the OpenAI Codex refresh fallback case. Keep production behavior unchanged. The contract loader uses Jiti for bundled providers, which bypasses the Vitest mock of the runtime wrapper and caused the test to hit the real OAuth refresh path. Update the stale runtime mock path and make only that fallback assertion use the direct provider module, while leaving registry coverage in src/plugins/contracts/registry.contract.test.ts intact. Validate with the focused Codex runtime contract test and the registry contract test.
@jalehman jalehman force-pushed the fix/compaction-max-tokens-cap branch from 5d9f719 to 5f1e51c on March 29, 2026 00:00
jalehman added a commit to adzendo/openclaw that referenced this pull request Mar 29, 2026
jalehman added a commit to adzendo/openclaw that referenced this pull request Mar 30, 2026
@jalehman jalehman force-pushed the fix/compaction-max-tokens-cap branch from 5f1e51c to e6df16c on March 30, 2026 22:51
@jalehman jalehman force-pushed the fix/compaction-max-tokens-cap branch from f745183 to 6e3098c on May 2, 2026 01:26
@jalehman jalehman self-assigned this May 6, 2026
@jalehman jalehman force-pushed the fix/compaction-max-tokens-cap branch from 6e3098c to 0aeb92a on May 6, 2026 18:20
@openclaw-barnacle openclaw-barnacle Bot added the triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. label May 6, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 6, 2026
@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed proof: sufficient ClawSweeper judged the real behavior proof convincing. triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 6, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 6, 2026
@jalehman jalehman force-pushed the fix/compaction-max-tokens-cap branch from 0aeb92a to 1bc3f8e on May 6, 2026 19:41
jalehman added a commit to adzendo/openclaw that referenced this pull request May 6, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 6, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 6, 2026
FORGE and others added 6 commits May 6, 2026 14:40
With 1M context windows, reserveTokensFloor can be 300K+. The
generateSummary() function in pi-coding-agent calculates max_tokens
as Math.floor(0.8 * reserveTokens), producing 240K — which exceeds
Anthropic's per-request output cap of 128K for Sonnet/Opus 4.6.

This fix clamps reserveTokens before passing to generateSummary(),
ensuring the resulting max_tokens never exceeds the model's maxTokens.

The clamp uses model.maxTokens from the provider registry (falls back
to 128K if unset). This is forward-compatible — if future models raise
their output cap, no code change is needed.

Fixes openclaw#54383
Validates that summarizeChunks clamps reserveTokens to
Math.floor(model.maxTokens / 0.8) to prevent max_tokens from
exceeding the model's output limit.

Covers:
- Clamping when reserveTokens (300K) exceeds model output cap (128K)
- Pass-through when reserveTokens is already within bounds
- Fallback to 128K default when model has no maxTokens field
- Consistent clamping across all chunks in staged summarization
@jalehman jalehman force-pushed the fix/compaction-max-tokens-cap branch from 1bc3f8e to 8a88821 on May 6, 2026 21:41
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 6, 2026
@jalehman jalehman merged commit ac43135 into openclaw:main May 6, 2026
86 of 87 checks passed
@jalehman
Contributor

jalehman commented May 6, 2026

Merged via squash.

Thanks @adzendo!

rogerdigital pushed a commit to rogerdigital/openclaw that referenced this pull request May 7, 2026
Merged via squash.

Prepared head SHA: 8a88821
Co-authored-by: adzendo <246828680+adzendo@users.noreply.github.com>
Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com>
Reviewed-by: @jalehman
steipete pushed a commit that referenced this pull request May 7, 2026
Merged via squash.

Prepared head SHA: 8a88821
Co-authored-by: adzendo <246828680+adzendo@users.noreply.github.com>
Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com>
Reviewed-by: @jalehman

(cherry picked from commit ac43135)
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
Merged via squash.

Prepared head SHA: 8a88821
Co-authored-by: adzendo <246828680+adzendo@users.noreply.github.com>
Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com>
Reviewed-by: @jalehman
rogerdigital pushed a commit to rogerdigital/openclaw that referenced this pull request May 9, 2026
Merged via squash.

Prepared head SHA: 8a88821
Co-authored-by: adzendo <246828680+adzendo@users.noreply.github.com>
Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com>
Reviewed-by: @jalehman

Labels

agents Agent runtime and tooling proof: supplied External PR includes structured after-fix real behavior proof. size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Compaction fails with 1M context: max_tokens 240000 > 128000 for Anthropic models

2 participants