fix(compressor): use text char sum for multimodal token estimation in _find_tail_cut_by_tokens#16369

Merged
teknium1 merged 3 commits into main from hermes/hermes-435518c4
Apr 27, 2026
Conversation

@teknium1

Contributor

Multimodal messages are now sized by text-char count rather than block count, so tail protection stops at the token budget instead of exhausting the message list.

Salvages #16113 by @briandevans onto current main. Closes #16087 (reported by @SimbaKingjoe). Also closes #16117 (same bug, narrower fix).

Root cause

_find_tail_cut_by_tokens called len(content) // _CHARS_PER_TOKEN + 10. When content is a list of blocks ([{"type": "text", "text": "..."}, {"type": "image_url", ...}]), len() returned block count (~2), so every multimodal message scored ~10 tokens regardless of size. The backward walk exhausted the list before hitting the budget ceiling, the head_end safeguard forced cut = n - min_tail, and compression had nothing left to summarize.
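
The arithmetic can be reproduced in isolation. The sketch below is not the code in agent/context_compressor.py: both function names are hypothetical, and `_CHARS_PER_TOKEN = 4` is an assumption inferred from the 500-char to 135-token figure quoted in this PR.

```python
_CHARS_PER_TOKEN = 4  # assumed value; consistent with 500 chars -> 135 tokens


def estimate_tokens_buggy(content):
    # Pre-fix arithmetic: len() of a list is the block count, not the text length.
    return len(content) // _CHARS_PER_TOKEN + 10


def estimate_tokens_fixed(content):
    # Post-fix arithmetic: sum text lengths when content is a list of blocks.
    if isinstance(content, list):
        chars = sum(len(p.get("text", "")) for p in content)
    else:
        chars = len(content)
    return chars // _CHARS_PER_TOKEN + 10


multimodal = [
    {"type": "text", "text": "x" * 500},
    {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
]

print(estimate_tokens_buggy(multimodal))  # 2 // 4 + 10 = 10
print(estimate_tokens_fixed(multimodal))  # 500 // 4 + 10 = 135
```

A plain-string message with the same 500 characters yields 135 under both versions, which is why only list-shaped content was affected.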

Changes

  • agent/context_compressor.py — sum text-char lengths across list content blocks in both _find_tail_cut_by_tokens and _prune_old_tool_results, with isinstance guards for dict / str / fallback so bare-string items don't AttributeError.
  • tests/agent/test_context_compressor.py — 4 regression tests: multimodal char-sum, plain-string unchanged, image-only block, mixed list with bare strings.
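
A minimal sketch of the dict / str / fallback guard described above, written as a standalone helper. The name `text_char_count` is hypothetical, and treating `None` content as empty is an assumption not stated in the PR; the real change is inlined in the two compressor functions.

```python
def text_char_count(raw_content):
    """Count text characters in a message's content (hypothetical helper)."""
    if not isinstance(raw_content, list):
        # Plain-string content keeps the old behavior.
        # Assumption: None is treated as empty rather than raising.
        return len(raw_content or "")

    total = 0
    for part in raw_content:
        if isinstance(part, dict):
            # Text blocks contribute their text; image-only blocks contribute 0.
            total += len(part.get("text", "") or "")
        elif isinstance(part, str):
            # Bare strings would have raised AttributeError on .get().
            total += len(part)
        else:
            # Fallback for anything else: size of its string form.
            total += len(str(part))
    return total


mixed = [
    {"type": "text", "text": "abc"},
    "de",
    {"type": "image_url", "image_url": {"url": "https://example.com/a.png"}},
]
print(text_char_count(mixed))
```

The four cases in the test list map onto the branches: multimodal char-sum and image-only blocks hit the dict branch, plain strings skip the loop, and the mixed list exercises the bare-string branch.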

Validation

| | Before | After |
|---|---|---|
| Multimodal msg (500 chars + image) token estimate | 10 | 135 |
| 10 multimodal msgs w/ budget=1000 | tail=10 (all protected) | tail=3 (budget respected) |
| tests/agent/test_context_compressor.py | 52 | 56 (all pass) |
| E2E _find_tail_cut_by_tokens w/ real ContextCompressor | tail=10 | tail=3 |
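
To illustrate why an undercount makes the backward walk run off the front of the list, here is a deliberately simplified walk. It is a sketch under assumptions, not the real `_find_tail_cut_by_tokens`: the function name, the stopping rule, and the numbers are all hypothetical, and the head_end safeguard is not modeled, so the tail sizes are not meant to match the validation figures.

```python
def find_tail_cut(token_estimates, budget, min_tail):
    """Walk backward from the newest message; return the index where the
    protected tail starts (simplified, hypothetical version)."""
    n = len(token_estimates)
    used = 0
    cut = n
    for i in range(n - 1, -1, -1):
        protected_so_far = n - i - 1
        # Stop once the next message would exceed the budget,
        # provided the minimum tail is already protected.
        if used + token_estimates[i] > budget and protected_so_far >= min_tail:
            break
        used += token_estimates[i]
        cut = i
    return cut


n = 6
buggy = [10] * n    # every list-shaped message scored ~10 tokens
fixed = [110] * n   # e.g. 400 chars of text: 400 // 4 + 10

print(n - find_tail_cut(buggy, budget=300, min_tail=2))  # tail size with undercount
print(n - find_tail_cut(fixed, budget=300, min_tail=2))  # tail size with char sum
```

With the undercounted estimates the budget is never reached, so the walk protects all six messages; with char-sum estimates it stops after two.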

briandevans and others added 3 commits April 26, 2026 21:44
… _find_tail_cut_by_tokens

_find_tail_cut_by_tokens called len(content) to estimate message tokens.
When content is a list of blocks (multimodal: text + image_url), len()
returns block count (e.g. 2) rather than character count, so a message
with 500 chars of text was counted as ~10 tokens instead of ~135.

This caused the backward walk to exhaust all messages before hitting the
budget ceiling; the head_end safeguard then forced cut = n - min_tail,
shrinking the protected tail to the bare minimum and preventing effective
compression of long multimodal conversations.

Fix mirrors the existing pattern in _prune_old_tool_results (line 487):
  sum(len(p.get("text", "")) for p in raw_content)
  if isinstance(raw_content, list) else len(raw_content)

Tests: 3 new cases in TestTokenBudgetTailProtection — regression guard
(confirms the test fails with the bug), plain-string regression guard,
and image-only block edge case.

Fixes #16087.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t list

raw_content from message["content"] can be a list that contains bare
strings, not only dicts.  The previous `p.get("text", "")` call raised
AttributeError on string items, crashing context compression for any
session that had a message with mixed content.

Guard with isinstance checks: dict → .get("text"), str → len(p),
fallback → len(str(p)).  Adds a regression test covering the bare-string
case that would have AttributeError'd on the pre-fix code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The bare-string isinstance guard added in 80ae262 covered _find_tail_cut_by_tokens
(line 1084) but missed the identical pattern in _calculate_protect_tail_boundary
(line 487, the protect-tail scan loop).  Both loops call .get("text", "") on every
list item in message["content"]; both crash with AttributeError when that list
contains a bare string.

Apply the same dict/str/fallback isinstance guard to the protect-tail path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@teknium1 merged commit bda2dbc into main on Apr 27, 2026
11 of 12 checks passed
@teknium1 deleted the hermes/hermes-435518c4 branch on April 27, 2026 04:48
@alt-glitch added the type/bug (Something isn't working), P1 (High: major feature broken, no workaround), and comp/agent (Core agent loop, run_agent.py, prompt builder) labels on Apr 27, 2026
@teknium1

Contributor Author

Post-merge clarification on the actual trigger:

The reporter's stated repro (send vision-capable model messages containing both text and images → compression misbehaves) does not reach this code path on current main. User-attached images are preprocessed out of history before they hit the session message list:

  • CLI: cli.py::_preprocess_images_with_vision runs every attachment through the auxiliary vision model and replaces it with [The user attached an image. Here's what it contains: <description>] plain-text.
  • Gateway: gateway/run.py around line 8410 does the same via vision_analyze_tool.
  • API server: gateway/platforms/api_server.py:102 silently strips image_url / non-text parts on ingest.

So {"type": "image_url"} content blocks never appear in the persisted message history that _find_tail_cut_by_tokens walks.
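
The effect of that preprocessing can be sketched as follows. This is an illustration only, not the code in cli.py or gateway/run.py: `flatten_user_content` and the `describe_image` callback are hypothetical stand-ins for the auxiliary vision call, and only the placeholder wording is taken from the description above.

```python
def flatten_user_content(parts, describe_image):
    """Collapse list-shaped user content into one plain string
    (hypothetical helper; describe_image stands in for the vision model)."""
    out = []
    for part in parts:
        if isinstance(part, str):
            out.append(part)
        elif isinstance(part, dict) and part.get("type") == "text":
            out.append(part.get("text", ""))
        elif isinstance(part, dict) and part.get("type") == "image_url":
            description = describe_image(part)
            out.append(
                "[The user attached an image. "
                f"Here's what it contains: {description}]"
            )
        # Any other part type is dropped in this sketch.
    return "\n".join(out)


parts = [
    {"type": "text", "text": "What is this?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
]
flat = flatten_user_content(parts, describe_image=lambda part: "a cat")
print(flat)
```

Because the result is a plain string, `len()` on it already measured characters before the fix, which is why the user-attached-image path never hit the miscount.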

Where the miscount actually bites: list-shaped content on assistant messages from the Anthropic adapter roundtrip, i.e. thinking + text + tool_use blocks (agent/anthropic_adapter.py:1346/1383/1466/1475/1498). Before this fix, an assistant turn with N content blocks (long thinking + text) was estimated at N // _CHARS_PER_TOKEN + 10, roughly 10 tokens, instead of thousands, so the tail walk still under-protected those turns.

The fix is still correct — list-shaped content arithmetic was wrong regardless of origin — but the user-attached-image scenario from the issue is theoretical, not observable.

Development

Successfully merging this pull request may close these issues.

[Bug]: underestimates token count for multimodal messages, causing oversized tail protection and ineffective context compression