Skip to content

fix(compressor): use text char sum for multimodal token estimation in _find_tail_cut_by_tokens#16113

Closed
briandevans wants to merge 3 commits into
NousResearch:mainfrom
briandevans:fix/multimodal-tail-token-estimation-16087
Closed

fix(compressor): use text char sum for multimodal token estimation in _find_tail_cut_by_tokens#16113
briandevans wants to merge 3 commits into
NousResearch:mainfrom
briandevans:fix/multimodal-tail-token-estimation-16087

Conversation

@briandevans

Copy link
Copy Markdown
Contributor

Problem

_find_tail_cut_by_tokens calls len(content) to estimate message tokens. When content is a list of blocks (multimodal: text + image_url), len() returns the block count (e.g. 2) rather than character count.

A message with 500 chars of text + 1 image block was counted as ~10 tokens instead of ~135. This caused the backward walk to exhaust the entire message list before reaching the budget ceiling. The head_end safeguard then forced cut = n - min_tail, shrinking the protected tail to the bare 3-message minimum and preventing effective compression of multimodal conversations.

Closes #16087.

Root cause

# Before — wrong for multimodal content:
content = msg.get("content") or ""
msg_tokens = len(content) // _CHARS_PER_TOKEN + 10
# len([text_block, image_block]) == 2, not ~500

Fix

Mirror the existing pattern already used in _prune_old_tool_results (line 487):

raw_content = msg.get("content") or ""
content_len = (
    sum(len(p.get("text", "")) for p in raw_content)
    if isinstance(raw_content, list)
    else len(raw_content)
)
msg_tokens = content_len // _CHARS_PER_TOKEN + 10

Tests

Three new tests added to TestTokenBudgetTailProtection:

Test Purpose
test_multimodal_message_accumulates_text_chars_not_block_count Regression guard: fails with old code (tail has 3 msgs), passes with fix (4 msgs)
test_plain_string_content_unchanged Plain string estimation unchanged after fix
test_image_only_block_contributes_zero_text_chars Image-only blocks (no text key) contribute 0 chars + base overhead; no crash

All 55 tests in tests/agent/test_context_compressor.py pass.

… _find_tail_cut_by_tokens

_find_tail_cut_by_tokens called len(content) to estimate message tokens.
When content is a list of blocks (multimodal: text + image_url), len()
returns block count (e.g. 2) rather than character count, so a message
with 500 chars of text was counted as ~10 tokens instead of ~135.

This caused the backward walk to exhaust all messages before hitting the
budget ceiling; the head_end safeguard then forced cut = n - min_tail,
shrinking the protected tail to the bare minimum and preventing effective
compression of long multimodal conversations.

Fix mirrors the existing pattern in _prune_old_tool_results (line 487):
  sum(len(p.get("text", "")) for p in raw_content)
  if isinstance(raw_content, list) else len(raw_content)

Tests: 3 new cases in TestTokenBudgetTailProtection — regression guard
(confirms the test fails with the bug), plain-string regression guard,
and image-only block edge case.

Fixes NousResearch#16087.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 26, 2026 15:38

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes multimodal token estimation in the context compressor’s tail-protection logic so that multimodal content lists contribute token estimates based on summed text character length (instead of list length), preventing overly-large protected tails and enabling effective compression for vision/multimodal chats.

Changes:

  • Update _find_tail_cut_by_tokens to estimate tokens from summed text chars for multimodal content lists.
  • Add regression tests covering multimodal content, plain-string parity, and image-only blocks.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
agent/context_compressor.py Corrects tail token estimation for multimodal messages by summing text block lengths.
tests/agent/test_context_compressor.py Adds regression tests to ensure multimodal tail protection behaves as intended.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread agent/context_compressor.py Outdated
msg_tokens = len(content) // _CHARS_PER_TOKEN + 10 # +10 for role/metadata
raw_content = msg.get("content") or ""
content_len = (
sum(len(p.get("text", "")) for p in raw_content)

Copilot AI Apr 26, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raw_content can be a list that includes non-dict parts (e.g., bare strings are supported/normalized elsewhere). Iterating with p.get(...) will raise AttributeError for string items and would crash context compression. Consider making the length calculation resilient by checking part types (dict vs str) and treating unknown parts as 0 or len(str(part)).

Suggested change
sum(len(p.get("text", "")) for p in raw_content)
sum(
len(p.get("text", ""))
if isinstance(p, dict)
else len(p)
if isinstance(p, str)
else len(str(p))
for p in raw_content
)

Copilot uses AI. Check for mistakes.
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder labels Apr 26, 2026
@alt-glitch alt-glitch mentioned this pull request Apr 26, 2026
2 tasks
…t list

raw_content from message["content"] can be a list that contains bare
strings, not only dicts.  The previous `p.get("text", "")` call raised
AttributeError on string items, crashing context compression for any
session that had a message with mixed content.

Guard with isinstance checks: dict → .get("text"), str → len(p),
fallback → len(str(p)).  Adds a regression test covering the bare-string
case that would have AttributeError'd on the pre-fix code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@briandevans

Copy link
Copy Markdown
Contributor Author

Thanks @copilot — valid catch. Addressed in 80ae2621.

Bare-string items in multimodal content list (line 1087): You're right that raw_content can be a list with non-dict items. The old p.get("text", "") call raises AttributeError on any bare string item in the list. Fixed by guarding with isinstance checks:

  • dict.get("text", "")
  • strlen(p) directly
  • anything else → len(str(p))

Regression guard: running the old code against ["Hello, world!", {"type": "text", "text": "extra"}] raises AttributeError: 'str' object has no attribute 'get'; new code returns 23 (correct char sum). Added test_mixed_list_with_bare_strings_does_not_crash covering this case.

@briandevans

Copy link
Copy Markdown
Contributor Author

Note re @alt-glitch's cross-reference to #16117:

This PR (#16113) #16117 (SimbaKingjoe)
Multimodal dict items () .get("text", "") ✅ same
Bare string items in list (valid OpenAI wire format) len(p) ❌ excluded by isinstance(p, dict) → contributes 0
Non-dict, non-string items len(str(p)) fallback ❌ excluded → contributes 0
Crash risk none none
Tests 3 targeted tests + 1 regression guard (bare-string case) 1 test (dict blocks only)

The bare-string item case is real: the Anthropic wire format allows a list with a single bare string alongside dict blocks, and the OpenAI format allows a list of strings as content. #16117's fix handles the common case but silently under-counts non-dict list items. Our fix mirrors the exact idiom in _prune_old_tool_results and extends it with the bare-string guard that Copilot caught.

Happy to defer to whichever fix the maintainers prefer.

The bare-string isinstance guard added in 80ae262 covered _find_tail_cut_by_tokens
(line 1084) but missed the identical pattern in _calculate_protect_tail_boundary
(line 487, the protect-tail scan loop).  Both loops call .get("text", "") on every
list item in message["content"]; both crash with AttributeError when that list
contains a bare string.

Apply the same dict/str/fallback isinstance guard to the protect-tail path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@briandevans

Copy link
Copy Markdown
Contributor Author

@copilot Valid catch — the bare-string guard was applied to _find_tail_cut_by_tokens in 80ae262 but missed the identical pattern in the protect-tail boundary scan loop (line 487). Fixed in 05cd158: same isinstance(p, dict) / isinstance(p, str) / len(str(p)) guard applied to both loops. All 56 tests/agent/test_context_compressor.py tests pass.

@teknium1

Copy link
Copy Markdown
Contributor

Salvaged via PR #16369 — your commits were cherry-picked onto current main with authorship preserved via rebase-merge. Thanks @briandevans!

@teknium1 teknium1 closed this Apr 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: underestimates token count for multimodal messages, causing oversized tail protection and ineffective context compression

4 participants