fix(compressor): count tool_call envelope in tail-budget token estimate (#28053) by briandevans · Pull Request #28074 · NousResearch/hermes-agent

briandevans · 2026-05-18T15:54:56Z

What does this PR do?

Fixes the compression tail-budget walk to include the full tool_call envelope (id, type, function.name, plus JSON syntax overhead) when estimating per-message tokens. The previous estimate of content_length + len(arguments) dropped every other field on each tool_call, which for assistant turns that fan out into 3-5 parallel tool calls is the bulk of the real token cost — the issue measures 2.27× divergence on assistant messages overall and 10-15× on individual turns (4 tool_calls: 73 vs 1,090 tokens). The walks in _find_tail_cut_by_tokens and _prune_old_tool_results over-protect the tail (137 msgs kept where the budget targets ~25), and compression saves ~40% instead of ~80%. This PR introduces a shared _estimate_msg_tokens_for_budget(msg) helper, uses it at both sites, and folds in len(str(tc)) per tool_call to capture the envelope for both dict-shaped and SDK-object tool calls.

Related Issue

Fixes #28053

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
🔒 Security fix
📝 Documentation update
✅ Tests (adding or improving test coverage)
♻️ Refactor (no behavior change)
🎯 New skill (bundled or hub)

Changes Made

agent/context_compressor.py — extract _estimate_msg_tokens_for_budget(msg) helper. Preserves _content_length_for_budget (so the multimodal image_url accounting from [Bug]: underestimates token count for multimodal messages, causing oversized tail protection and ineffective context compression #16087 still works) and adds len(str(tc)) // _CHARS_PER_TOKEN per tool_call. _find_tail_cut_by_tokens and _prune_old_tool_results now call the shared helper.
tests/agent/test_context_compressor.py — new regression TestTokenBudgetTailProtection::test_tool_calls_envelope_counted_in_token_estimate (40-msg transcript: buggy walk protects 30 tail msgs, fixed walk protects <20) + helper unit tests (test_estimate_msg_tokens_envelope_counted, test_estimate_msg_tokens_handles_sdk_tool_calls).

How to Test

uv run --with pytest --with pytest-xdist --with pytest-asyncio python3 -m pytest tests/agent/test_context_compressor.py -v — 15/15 in TestTokenBudgetTailProtection pass.
Regression guard: temporarily revert the production change → new test fails with assert 30 < 20. Restore → passes.

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix/feature (no unrelated commits)
I've run focused tests for the touched code and all pass
I've added tests for my changes (required for bug fixes, strongly encouraged for features)
I've tested on my platform: macOS 15.x

Documentation & Housekeeping

I've updated relevant documentation (README, docs/, docstrings) — or N/A
I've updated cli-config.yaml.example if I added/changed config keys — or N/A
I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Sibling Audit

The issue names _find_tail_cut_by_tokens only, but the same inline pattern lived in _prune_old_tool_results (line 681-686). Both call sites are fixed by the same helper so the invariant — token-budget walks must account for tool_call envelope — holds across the compressor. No other compressor call sites compute a per-message budget rough. No widening needed.

Builds on fix(compressor): use text char sum for multimodal token estimation in _find_tail_cut_by_tokens #16369 (multimodal len(content) fix) which patched a different root cause in the same function; this PR keeps that fix intact via _content_length_for_budget.
Issue notes [Bug]: Large tool results consume entire tail token budget — conversation messages lost to summary on compression #13164 and [Bug]: underestimates token count for multimodal messages, causing oversized tail protection and ineffective context compression #16087 as related; the helper covers both since image accounting routes through the existing _content_length_for_budget path.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds more accurate token estimation for tail-budget pruning by including tool call “envelope” overhead (id/type/function.name), and introduces regression tests to prevent undercounting.

Changes:

Introduce _estimate_msg_tokens_for_budget to estimate per-message token usage including tool call metadata.
Refactor tail-pruning walks to use the shared estimator instead of counting only tool arguments.
Add tests covering envelope accounting and SDK-like tool_call objects.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File	Description
tests/agent/test_context_compressor.py	Adds regression/unit tests asserting tool_call envelopes contribute to token estimates.
agent/context_compressor.py	Adds a shared token estimator and wires it into tail-token budget walks.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+    raw_content = msg.get("content") or ""
+    msg_tokens = _content_length_for_budget(raw_content) // _CHARS_PER_TOKEN + 10
+    for tc in msg.get("tool_calls") or []:
+        # ``str(tc)`` captures the full envelope for both dict-shaped and
+        # SDK-object tool_calls. ``arguments`` alone misses the ~30-50
+        # chars of metadata per tc, which is the dominant blind spot when
+        # an assistant turn fans out into multiple parallel tool calls.
+        msg_tokens += len(str(tc)) // _CHARS_PER_TOKEN
+    return msg_tokens


+        Setup: 40 messages alternating ``assistant`` (3 short tool_calls each,
+        ``arguments="{}"``) and ``user`` (short text). Budget=200,
+        soft_ceiling=300.


+        messages = [
+            {"role": "user", "content": "head1"},
+            {"role": "assistant", "content": "head2"},
+        ]


+        # The added envelope is ~80 chars of JSON metadata → ≥ 15 tokens.
+        # The old args-only path would have produced diff == 0.
+        assert diff >= 15, (


BoardJames-Bot · 2026-05-18T16:04:27Z

BoardJames triage: no branch-local failure found so far. Current checks at 2026-05-18T16:04Z: Docker Build and Publish / build-arm64, build-amd64, lint, nix, supply-chain, attribution, history, and Tests / e2e are green; only Tests / test is still in progress on head 49657c79d (run 26044643860). The earlier snapshot's arm64 concern has cleared.

Changed files are limited to agent/context_compressor.py plus regression tests in tests/agent/test_context_compressor.py. Focused local validation passed in .triage/wt-28074: python -m pytest tests/agent/test_context_compressor.py::TestTokenBudgetTailProtection -q --tb=short => 15 passed. Root cause/status: currently just the shared long full-suite test job still running; no safe branch-local fix indicated unless that job later emits a concrete traceback.

`_find_tail_cut_by_tokens` and `_prune_old_tool_results` both rolled their own token estimate that counted `content_length + len(arguments)` and dropped the rest of every `tool_call`: `id`, `type`, `function.name`, and the JSON-envelope syntax around them. For an assistant turn that fans out into 3-5 short tool calls — common during data analysis, file ops, or web search — that is the bulk of the real token cost. Effect on the budget walks: - `_find_tail_cut_by_tokens` underestimates the tail, walks too far back, and protects ~2-3x more messages than the soft_ceiling intends. - `_prune_old_tool_results` does the same — older tool results stay protected and compression saves ~40% instead of the ~80% target. Issue measurement (370-message session): assistant role 14,407 vs 32,715 simple→real (2.27x), individual assistant turns 10-15x off. Compression ratio drops from ~80% expected to ~40%. Fix introduces `_estimate_msg_tokens_for_budget(msg)` that keeps the existing `_content_length_for_budget` accounting (so multimodal image parts still count via `_IMAGE_TOKEN_ESTIMATE`) and folds in `len(str(tc))` for each tool_call — capturing the full envelope for both dict-shaped and SDK-object tool calls. Both call sites switch to the new helper. Tests - new `test_tool_calls_envelope_counted_in_token_estimate` — 40-message alternating assistant/user transcript; buggy walk protects 30 tail messages, fixed walk protects < 20. - new `test_estimate_msg_tokens_envelope_counted` — direct unit test asserting id/type/function.name add ≥ 15 tokens of overhead. - new `test_estimate_msg_tokens_handles_sdk_tool_calls` — guards the non-dict (openai SDK) tool_call path the old `isinstance(tc, dict)` branch silently skipped. Existing TestTokenBudgetTailProtection (12 tests, including NousResearch#16087 multimodal regression) all pass — the fix preserves the prior char-sum behaviour for content and only adds the envelope contribution. Fixes NousResearch#28053. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

briandevans · 2026-06-01T09:13:52Z

Closing to focus the queue on security/file-safety work where civilian merges are landing. Happy to reopen if maintainers want this picked up.

Copilot AI review requested due to automatic review settings May 18, 2026 15:54

Copilot AI reviewed May 18, 2026

View reviewed changes

alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/agent Core agent loop, run_agent.py, prompt builder labels May 18, 2026

BoardJames-Bot mentioned this pull request May 18, 2026

feat(web): mobile dashboard UX polish #28127

Merged

19 tasks

briandevans force-pushed the fix/compressor-tool-calls-tokens-28053 branch 4 times, most recently from e242974 to 6cbc358 Compare May 27, 2026 10:13

briandevans force-pushed the fix/compressor-tool-calls-tokens-28053 branch from 6cbc358 to 290bf80 Compare May 29, 2026 11:13

briandevans force-pushed the fix/compressor-tool-calls-tokens-28053 branch from 290bf80 to ed11b00 Compare May 30, 2026 11:12

briandevans closed this Jun 1, 2026

basilalshukaili mentioned this pull request Jun 10, 2026

fix(compressor): count tool_call envelope in tail-budget token estimate (#28053) #43293

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(compressor): count tool_call envelope in tail-budget token estimate (#28053)#28074

fix(compressor): count tool_call envelope in tail-budget token estimate (#28053)#28074
briandevans wants to merge 1 commit into
NousResearch:mainfrom
briandevans:fix/compressor-tool-calls-tokens-28053

briandevans commented May 18, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

BoardJames-Bot commented May 18, 2026

Uh oh!

briandevans commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

briandevans commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Related Issue

Type of Change

Changes Made

How to Test

Checklist

Code

Documentation & Housekeeping

Sibling Audit

Related

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

BoardJames-Bot commented May 18, 2026

Uh oh!

briandevans commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

briandevans commented May 18, 2026 •

edited

Loading