fix(compressor): use text char sum for multimodal token estimation in _find_tail_cut_by_tokens by teknium1 · Pull Request #16369 · NousResearch/hermes-agent

teknium1 · 2026-04-27T04:45:47Z

Multimodal messages are now sized by text-char count, not block count — so tail protection stops at the token budget instead of exhausting the message list.

Salvages #16113 by @briandevans onto current main. Closes #16087 (reported by @SimbaKingjoe). Also closes #16117 (same bug, narrower fix).

Root cause

_find_tail_cut_by_tokens called len(content) // _CHARS_PER_TOKEN + 10. When content is a list of blocks ([{"type": "text", "text": "..."}, {"type": "image_url", ...}]), len() returned block count (~2), so every multimodal message scored ~10 tokens regardless of size. The backward walk exhausted the list before hitting the budget ceiling, the head_end safeguard forced cut = n - min_tail, and compression had nothing left to summarize.
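A minimal reproduction of the miscount, as a sketch rather than the project's code. `_CHARS_PER_TOKEN = 4` is an assumption inferred from the 500-chars-to-135-tokens figure in the Validation table, and `buggy_estimate` is a hypothetical name for the pre-fix formula.

```python
# Assumption: 500 chars -> 135 tokens implies 4 chars per token (500 // 4 + 10).
_CHARS_PER_TOKEN = 4


def buggy_estimate(content):
    """Pre-fix formula: len() of whatever `content` happens to be."""
    return len(content) // _CHARS_PER_TOKEN + 10


text = "x" * 500
multimodal = [
    {"type": "text", "text": text},
    # Placeholder URL, for illustration only.
    {"type": "image_url", "image_url": {"url": "https://example.invalid/cat.png"}},
]

print(buggy_estimate(text))        # 135: len() counts characters
print(buggy_estimate(multimodal))  # 10: len() counts the 2 blocks, 2 // 4 == 0
```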

Changes

agent/context_compressor.py — sum text-char lengths across list content blocks in both _find_tail_cut_by_tokens and _prune_old_tool_results, with isinstance guards for dict / str / fallback so bare-string items don't AttributeError.
tests/agent/test_context_compressor.py — 4 regression tests: multimodal char-sum, plain-string unchanged, image-only block, mixed list with bare strings.
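The dict / str / fallback guard described above could look roughly like the following. This is a sketch under assumptions, not the merged code: `text_char_count` and `estimate_tokens` are hypothetical helper names, `_CHARS_PER_TOKEN = 4` is inferred from the Validation numbers, and the `None` handling is an addition of this sketch.

```python
_CHARS_PER_TOKEN = 4  # assumed value, see lead-in


def text_char_count(content):
    """Sum text characters for string or list-shaped message content."""
    if content is None:  # sketch-only choice: treat missing content as empty
        return 0
    if isinstance(content, str):
        return len(content)
    if isinstance(content, list):
        total = 0
        for part in content:
            if isinstance(part, dict):
                # Blocks without a "text" key (e.g. image_url) add nothing.
                total += len(str(part.get("text") or ""))
            elif isinstance(part, str):
                total += len(part)
            else:
                total += len(str(part))
        return total
    return len(str(content))


def estimate_tokens(content):
    return text_char_count(content) // _CHARS_PER_TOKEN + 10


multimodal = [{"type": "text", "text": "x" * 500}, {"type": "image_url", "image_url": {}}]
image_only = [{"type": "image_url", "image_url": {}}]
mixed = ["hello", {"type": "text", "text": "world"}]

print(estimate_tokens(multimodal))  # 135
print(estimate_tokens("x" * 500))   # 135
print(estimate_tokens(image_only))  # 10
print(estimate_tokens(mixed))       # 12
```

The four calls correspond loosely to the four regression cases listed above; the real tests may assert different things.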

Validation

| Scenario | Before | After |
| --- | --- | --- |
| Multimodal msg (500 chars + image) token estimate | 10 | 135 |
| 10 multimodal msgs w/ budget=1000 | tail=10 (all protected) | tail=3 (budget respected) |
| `tests/agent/test_context_compressor.py` test count | 52 | 56 (all pass) |
| E2E `_find_tail_cut_by_tokens` w/ real ContextCompressor | tail=10 | tail=3 |
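To illustrate why the estimate changes the tail size, here is a deliberately simplified backward walk. `find_tail_size` is a hypothetical stand-in: it ignores `min_tail`, the head_end safeguard, and any other rules in the real ContextCompressor, so it shows the mechanism only and does not reproduce the tail=3 figure in the table.

```python
def find_tail_size(token_estimates, budget):
    """Walk newest-to-oldest and stop before the budget would be exceeded."""
    used = 0
    tail = 0
    for tokens in reversed(token_estimates):
        if used + tokens > budget:
            break
        used += tokens
        tail += 1
    return tail


buggy = [10] * 10   # every multimodal message scored ~10 tokens
fixed = [135] * 10  # 500 chars of text per message after the fix

print(find_tail_size(buggy, 1000))  # 10: the walk exhausts the list
print(find_tail_size(fixed, 1000))  # 7: the walk stops at the budget
```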

… _find_tail_cut_by_tokens

_find_tail_cut_by_tokens called len(content) to estimate message tokens. When content is a list of blocks (multimodal: text + image_url), len() returns block count (e.g. 2) rather than character count, so a message with 500 chars of text was counted as ~10 tokens instead of ~135.

This caused the backward walk to exhaust all messages before hitting the budget ceiling; the head_end safeguard then forced cut = n - min_tail, shrinking the protected tail to the bare minimum and preventing effective compression of long multimodal conversations.

Fix mirrors the existing pattern in _prune_old_tool_results (line 487):

    sum(len(p.get("text", "")) for p in raw_content) if isinstance(raw_content, list) else len(raw_content)

Tests: 3 new cases in TestTokenBudgetTailProtection (regression guard that confirms the test fails with the bug, plain-string regression guard, and image-only block edge case).

Fixes #16087.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…t list

raw_content from message["content"] can be a list that contains bare strings, not only dicts. The previous `p.get("text", "")` call raised AttributeError on string items, crashing context compression for any session that had a message with mixed content.

Guard with isinstance checks: dict → .get("text"), str → len(p), fallback → len(str(p)).

Adds a regression test covering the bare-string case that would have AttributeError'd on the pre-fix code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
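The crash described in this commit can be reproduced in isolation. The expression below is the one quoted in the first commit message; wrapping it in a function named `unguarded_char_sum` is a choice made for this sketch only.

```python
def unguarded_char_sum(raw_content):
    """Char sum without isinstance guards on the list items."""
    return (
        sum(len(p.get("text", "")) for p in raw_content)
        if isinstance(raw_content, list)
        else len(raw_content)
    )


dict_only = [{"type": "text", "text": "hello"}]
mixed = ["bare string", {"type": "text", "text": "hello"}]

print(unguarded_char_sum(dict_only))  # 5

raised = False
try:
    unguarded_char_sum(mixed)
except AttributeError:
    # str has no .get(), so the first list item fails.
    raised = True

print(raised)  # True
```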

The bare-string isinstance guard added in 80ae262 covered _find_tail_cut_by_tokens (line 1084) but missed the identical pattern in _calculate_protect_tail_boundary (line 487, the protect-tail scan loop). Both loops call .get("text", "") on every list item in message["content"]; both crash with AttributeError when that list contains a bare string.

Apply the same dict/str/fallback isinstance guard to the protect-tail path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

teknium1 · 2026-04-27T04:55:09Z

Post-merge clarification on the actual trigger:

The reporter's stated repro (send vision-capable model messages containing both text and images → compression misbehaves) does not reach this code path on current main. User-attached images are preprocessed out of history before they hit the session message list:

CLI: cli.py::_preprocess_images_with_vision runs every attachment through the auxiliary vision model and replaces it with [The user attached an image. Here's what it contains: <description>] plain-text.
Gateway: gateway/run.py around line 8410 does the same via vision_analyze_tool.
API server: gateway/platforms/api_server.py:102 silently strips image_url / non-text parts on ingest.

So {"type": "image_url"} content blocks never appear in the persisted message history that _find_tail_cut_by_tokens walks.

Where the miscount actually bites: list-shaped content on assistant messages from the Anthropic adapter roundtrip — thinking + text + tool_use blocks (agent/anthropic_adapter.py:1346/1383/1466/1475/1498). Before this fix, an assistant turn with N content blocks (long thinking + text) counted as N tokens instead of thousands, so the tail walk still under-protected those turns.
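A rough illustration of that assistant-turn case. The block shapes follow the general layout of Anthropic-style content blocks (thinking / text / tool_use); how agent/anthropic_adapter.py actually stores them is not shown in this PR, so treat the dict keys, sizes, and `_CHARS_PER_TOKEN = 4` as assumptions.

```python
_CHARS_PER_TOKEN = 4  # assumed value

# Hypothetical assistant turn: long thinking, some text, one tool call.
assistant_content = [
    {"type": "thinking", "thinking": "t" * 8000},
    {"type": "text", "text": "a" * 2000},
    {"type": "tool_use", "id": "toolu_example", "name": "search", "input": {"q": "x"}},
]

# Pre-fix: len() sees 3 blocks, so the turn scores the flat +10 only.
block_count_estimate = len(assistant_content) // _CHARS_PER_TOKEN + 10

# Text-char sum: only values under the "text" key are counted here.
text_chars = sum(
    len(p.get("text", "")) for p in assistant_content if isinstance(p, dict)
)
text_sum_estimate = text_chars // _CHARS_PER_TOKEN + 10

print(block_count_estimate)  # 10
print(text_sum_estimate)     # 510
```

In this sketch the "thinking" and tool_use payloads contribute nothing to the text-char sum, because only the "text" key is read. Whether the merged code counts those fields is not stated here.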

The fix is still correct — list-shaped content arithmetic was wrong regardless of origin — but the user-attached-image scenario from the issue is theoretical, not observable.

briandevans and others added 3 commits April 26, 2026 21:44

teknium1 merged commit bda2dbc into main Apr 27, 2026
11 of 12 checks passed

teknium1 deleted the hermes/hermes-435518c4 branch April 27, 2026 04:48

teknium1 mentioned this pull request Apr 27, 2026

fix(compressor): use text char sum for multimodal token estimation in _find_tail_cut_by_tokens #16113

Closed

alt-glitch added labels Apr 27, 2026: type/bug (Something isn't working), P1 (High: major feature broken, no workaround), comp/agent (Core agent loop, run_agent.py, prompt builder)

github-actions Bot mentioned this pull request May 1, 2026

chore: bump NousResearch/hermes-agent version from v2026.4.23 to v2026.4.30 Docker-Hub-sirmark/docker-hermes-agent#4

Merged

alt-glitch mentioned this pull request May 18, 2026

[Bug]: _find_tail_cut_by_tokens underestimates assistant message tokens by 2-15x — tail protection overshoots and compression becomes ineffective #28053

Open

briandevans mentioned this pull request May 18, 2026

fix(compressor): count tool_call envelope in tail-budget token estimate (#28053) #28074

Closed

19 tasks

3 participants
