fix(compressor): use text char sum for multimodal token estimation in _find_tail_cut_by_tokens#16113
Conversation
… _find_tail_cut_by_tokens
_find_tail_cut_by_tokens called len(content) to estimate message tokens.
When content is a list of blocks (multimodal: text + image_url), len()
returns block count (e.g. 2) rather than character count, so a message
with 500 chars of text was counted as ~10 tokens instead of ~135.
This caused the backward walk to exhaust all messages before hitting the
budget ceiling; the head_end safeguard then forced cut = n - min_tail,
shrinking the protected tail to the bare minimum and preventing effective
compression of long multimodal conversations.
Fix mirrors the existing pattern in _prune_old_tool_results (line 487):
sum(len(p.get("text", "")) for p in raw_content)
if isinstance(raw_content, list) else len(raw_content)
Tests: 3 new cases in TestTokenBudgetTailProtection — regression guard
(confirms the test fails with the bug), plain-string regression guard,
and image-only block edge case.
Fixes NousResearch#16087.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Fixes multimodal token estimation in the context compressor’s tail-protection logic so that multimodal content lists contribute token estimates based on summed text character length (instead of list length), preventing overly-large protected tails and enabling effective compression for vision/multimodal chats.
Changes:
- Update
_find_tail_cut_by_tokensto estimate tokens from summed text chars for multimodalcontentlists. - Add regression tests covering multimodal content, plain-string parity, and image-only blocks.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
agent/context_compressor.py |
Corrects tail token estimation for multimodal messages by summing text block lengths. |
tests/agent/test_context_compressor.py |
Adds regression tests to ensure multimodal tail protection behaves as intended. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| msg_tokens = len(content) // _CHARS_PER_TOKEN + 10 # +10 for role/metadata | ||
| raw_content = msg.get("content") or "" | ||
| content_len = ( | ||
| sum(len(p.get("text", "")) for p in raw_content) |
There was a problem hiding this comment.
raw_content can be a list that includes non-dict parts (e.g., bare strings are supported/normalized elsewhere). Iterating with p.get(...) will raise AttributeError for string items and would crash context compression. Consider making the length calculation resilient by checking part types (dict vs str) and treating unknown parts as 0 or len(str(part)).
| sum(len(p.get("text", "")) for p in raw_content) | |
| sum( | |
| len(p.get("text", "")) | |
| if isinstance(p, dict) | |
| else len(p) | |
| if isinstance(p, str) | |
| else len(str(p)) | |
| for p in raw_content | |
| ) |
…t list
raw_content from message["content"] can be a list that contains bare
strings, not only dicts. The previous `p.get("text", "")` call raised
AttributeError on string items, crashing context compression for any
session that had a message with mixed content.
Guard with isinstance checks: dict → .get("text"), str → len(p),
fallback → len(str(p)). Adds a regression test covering the bare-string
case that would have AttributeError'd on the pre-fix code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Thanks @copilot — valid catch. Addressed in Bare-string items in multimodal content list (line 1087): You're right that
Regression guard: running the old code against |
|
Note re @alt-glitch's cross-reference to #16117:
The bare-string item case is real: the Anthropic wire format allows a list with a single bare string alongside dict blocks, and the OpenAI format allows a list of strings as content. #16117's fix handles the common case but silently under-counts non-dict list items. Our fix mirrors the exact idiom in Happy to defer to whichever fix the maintainers prefer. |
The bare-string isinstance guard added in 80ae262 covered _find_tail_cut_by_tokens (line 1084) but missed the identical pattern in _calculate_protect_tail_boundary (line 487, the protect-tail scan loop). Both loops call .get("text", "") on every list item in message["content"]; both crash with AttributeError when that list contains a bare string. Apply the same dict/str/fallback isinstance guard to the protect-tail path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@copilot Valid catch — the bare-string guard was applied to |
|
Salvaged via PR #16369 — your commits were cherry-picked onto current main with authorship preserved via rebase-merge. Thanks @briandevans! |
Problem
_find_tail_cut_by_tokenscallslen(content)to estimate message tokens. Whencontentis a list of blocks (multimodal:text+image_url),len()returns the block count (e.g. 2) rather than character count.A message with 500 chars of text + 1 image block was counted as ~10 tokens instead of ~135. This caused the backward walk to exhaust the entire message list before reaching the budget ceiling. The
head_endsafeguard then forcedcut = n - min_tail, shrinking the protected tail to the bare 3-message minimum and preventing effective compression of multimodal conversations.Closes #16087.
Root cause
Fix
Mirror the existing pattern already used in
_prune_old_tool_results(line 487):Tests
Three new tests added to
TestTokenBudgetTailProtection:test_multimodal_message_accumulates_text_chars_not_block_counttest_plain_string_content_unchangedtest_image_only_block_contributes_zero_text_charstextkey) contribute 0 chars + base overhead; no crashAll 55 tests in
tests/agent/test_context_compressor.pypass.