fix(compressor): use text char sum for multimodal token estimation in _find_tail_cut_by_tokens#16369
Conversation
… _find_tail_cut_by_tokens
_find_tail_cut_by_tokens called len(content) to estimate message tokens.
When content is a list of blocks (multimodal: text + image_url), len()
returns block count (e.g. 2) rather than character count, so a message
with 500 chars of text was counted as ~10 tokens instead of ~135.
This caused the backward walk to exhaust all messages before hitting the
budget ceiling; the head_end safeguard then forced cut = n - min_tail,
shrinking the protected tail to the bare minimum and preventing effective
compression of long multimodal conversations.
Fix mirrors the existing pattern in _prune_old_tool_results (line 487):
sum(len(p.get("text", "")) for p in raw_content)
if isinstance(raw_content, list) else len(raw_content)
Tests: 3 new cases in TestTokenBudgetTailProtection — regression guard
(confirms the test fails with the bug), plain-string regression guard,
and image-only block edge case.
Fixes #16087.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t list
raw_content from message["content"] can be a list that contains bare
strings, not only dicts. The previous `p.get("text", "")` call raised
AttributeError on string items, crashing context compression for any
session that had a message with mixed content.
Guard with isinstance checks: dict → .get("text"), str → len(p),
fallback → len(str(p)). Adds a regression test covering the bare-string
case that would have AttributeError'd on the pre-fix code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The bare-string isinstance guard added in 80ae262 covered _find_tail_cut_by_tokens (line 1084) but missed the identical pattern in _calculate_protect_tail_boundary (line 487, the protect-tail scan loop). Both loops call .get("text", "") on every list item in message["content"]; both crash with AttributeError when that list contains a bare string. Apply the same dict/str/fallback isinstance guard to the protect-tail path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Post-merge clarification on the actual trigger: The reporter's stated repro (
So Where the miscount actually bites: list-shaped The fix is still correct — list-shaped content arithmetic was wrong regardless of origin — but the user-attached-image scenario from the issue is theoretical, not observable. |
Multimodal messages are now sized by text-char count, not block count — so tail protection stops at the token budget instead of exhausting the message list.
Salvages #16113 by @briandevans onto current main. Closes #16087 (reported by @SimbaKingjoe). Also closes #16117 (same bug, narrower fix).
Root cause
_find_tail_cut_by_tokenscalledlen(content) // _CHARS_PER_TOKEN + 10. Whencontentis a list of blocks ([{"type": "text", "text": "..."}, {"type": "image_url", ...}]),len()returned block count (~2), so every multimodal message scored ~10 tokens regardless of size. The backward walk exhausted the list before hitting the budget ceiling, the head_end safeguard forcedcut = n - min_tail, and compression had nothing left to summarize.Changes
agent/context_compressor.py— sum text-char lengths across list content blocks in both_find_tail_cut_by_tokensand_prune_old_tool_results, with isinstance guards for dict / str / fallback so bare-string items don't AttributeError.tests/agent/test_context_compressor.py— 4 regression tests: multimodal char-sum, plain-string unchanged, image-only block, mixed list with bare strings.Validation
tests/agent/test_context_compressor.py_find_tail_cut_by_tokensw/ real ContextCompressor