fix(context_compressor): correct multimodal content handling in token estimation and secret redaction by sontianye · Pull Request #14395 · NousResearch/hermes-agent

sontianye · 2026-04-23T05:40:03Z

Summary

Two silent bugs in agent/context_compressor.py affect all users sending multimodal (text + image) messages:

Bug 1 — Wrong token estimation (_find_tail_cut_by_tokens, ~line 1080)

len(content) on a list returns the number of list elements (1–3), not the character count. A message with 4000 chars of text was estimated at ~10 tokens instead of ~1000. Result: the tail-protection boundary is placed far too early for vision-heavy conversations, discarding far more context than intended.

Bug 2 — Secret redaction bypassed (_serialize_for_summary, ~line 603)

redact_sensitive_text(msg.get("content") or "") — a non-empty list is truthy, so or "" is skipped and redact_sensitive_text receives a list instead of a str. API keys or tokens in multimodal text parts are forwarded unredacted to the auxiliary summarisation model and persisted in the compressed summary.

Fix

Both fixes apply the same pattern already used correctly at line 478 (_prune_old_tool_results):

# Token estimation — Bug 1
raw_content = msg.get("content") or ""
content_len = (
    sum(len(p.get("text", "")) if isinstance(p, dict) else len(str(p))
        for p in raw_content)
    if isinstance(raw_content, list)
    else len(raw_content)
)
msg_tokens = content_len // _CHARS_PER_TOKEN + 10

# Redaction — Bug 2
raw_content = msg.get("content") or ""
if isinstance(raw_content, list):
    content = " ".join(
        redact_sensitive_text(p.get("text", ""))
        for p in raw_content
        if isinstance(p, dict) and p.get("type") == "text"
    )
else:
    content = redact_sensitive_text(raw_content)

Tests

Five new regression tests in TestMultimodalContent:

Token budget is exhausted by multimodal text content (not list element count)
Image-only content does not crash and costs ~zero tokens
Secrets in multimodal text parts are redacted before summary serialisation
Image-only content serialises gracefully to empty contribution
Plain string content regression guard (existing path unchanged)

Test plan

uv run pytest tests/agent/test_context_compressor.py -q — 55 passed

🤖 Generated with Claude Code

… estimation and secret redaction ## Problem Two related bugs affected users sending multimodal messages (text + images): **Bug 1 — Wrong token estimation (`_find_tail_cut_by_tokens`, line ~1080)** When a message's `content` field is a list (multimodal format), `len(content)` returns the number of list elements (typically 1–3), not the character count. A message with 4000 chars of text was estimated at ~10 tokens instead of ~1000, causing the context compression boundary to be placed far too early — retaining much less tail context than intended for vision-heavy conversations. **Bug 2 — Secret redaction bypassed (`_serialize_for_summary`, line ~603)** `redact_sensitive_text(msg.get("content") or "")` relies on a non-empty list being truthy: a list is truthy so the `or ""` fallback is skipped and `redact_sensitive_text` receives a `list` instead of a `str`. This means any API keys or tokens embedded in the text parts of multimodal messages are forwarded unredacted to the auxiliary summarisation model and persisted in the compressed summary — a silent security regression for users of vision features. ## Fix Both fixes follow the pattern already used correctly at line 478 (`_prune_old_tool_results`): ```python content_len = ( sum(len(p.get("text", "")) if isinstance(p, dict) else len(str(p)) for p in raw_content) if isinstance(raw_content, list) else len(raw_content) ) ``` For redaction, text parts are individually redacted and joined; image parts (no `"text"` key) contribute nothing to the summary text — consistent with the existing serialisation behaviour. ## Tests Five new tests in `TestMultimodalContent` cover: - Token budget is exhausted by a large multimodal text part (not the list length) - Image-only content does not crash and costs near-zero tokens - Secrets in multimodal text parts are redacted before summary serialisation - Image-only content serialises to an empty contribution without error - Plain string content regression guard Co-Authored-By: Tianye Song <songtianye1997@gmail.com>

sontianye · 2026-04-23T05:43:50Z

Closing this PR after closer analysis. Bug 1 (token estimation) is real but the impact is limited in practice. Bug 2 was overstated — redact_sensitive_text has an internal str() fallback so secrets are still redacted even when passed a list. The PR description exaggerated the security risk. Will revisit with a more carefully scoped fix if needed.

sontianye closed this Apr 23, 2026

alt-glitch added type/security Security vulnerability or hardening P1 High — major feature broken, no workaround comp/agent Core agent loop, run_agent.py, prompt builder labels Apr 23, 2026

alt-glitch mentioned this pull request Apr 26, 2026

[Bug]: underestimates token count for multimodal messages, causing oversized tail protection and ineffective context compression #16087

Closed

1 task

SimbaKingjoe mentioned this pull request Apr 26, 2026

Fix/underestimates token count #16117

Closed

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(context_compressor): correct multimodal content handling in token estimation and secret redaction#14395

fix(context_compressor): correct multimodal content handling in token estimation and secret redaction#14395
sontianye wants to merge 1 commit into
NousResearch:mainfrom
sontianye:fix/context-compressor-multimodal-content

sontianye commented Apr 23, 2026

Uh oh!

sontianye commented Apr 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

sontianye commented Apr 23, 2026

Summary

Fix

Tests

Test plan

Uh oh!

sontianye commented Apr 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants