Suggested title:
/compress can report higher token counts after successful compaction because the banner uses a rough transcript-only estimate
Summary
The gateway/manual /compress path can successfully compact a session and still report a larger token count afterward.
Observed example:
🗜️ Compressed: 27 → 13 messages
~4,462 → ~4,727 tokens
The compaction did happen: the message count dropped sharply and the compressed transcript was rewritten. But the token figure increased because the banner is not reporting real request-token pressure. It is reporting a rough estimate of only the surviving user/assistant transcript content, and the compactor can replace many short turns with one dense handoff summary.
This makes the /compress banner misleading even when compression succeeds.
Actual behavior
A manual /compress can produce output where:
- the message count decreases substantially
- the compressed transcript materially changes
- but the reported token estimate increases
Example observed output:
🗜️ Compressed: 27 → 13 messages
~4,462 → ~4,727 tokens
In the affected session, the compressed transcript's handoff summary accounted for most of the remaining rough token count.
Expected behavior
The /compress response should do one of the following:
- Report a metric that actually reflects post-compaction request pressure, or
- Clearly label the current metric as a rough transcript-only estimate, or
- Warn when the summary became denser even though the session compacted successfully
It should not implicitly suggest that compaction “made things bigger” without explaining that the displayed number is a limited heuristic.
Root cause
This is caused by a mismatch between what the banner measures and what compression is optimizing.
1. The /compress banner measures only filtered transcript content
The manual gateway handler filters history down to only user and assistant messages with content, then computes approx_tokens and new_tokens with estimate_messages_tokens_rough():
gateway/run.py around lines 4996-5002
gateway/run.py around lines 5034-5039
agent/model_metadata.py around lines 975-978
That estimator is just:
total_chars = sum(len(str(msg)) for msg in messages)
return total_chars // 4
So the banner does not reflect:
- the full request payload
- tool messages
- system prompt size
- tool schema size
- provider-reported prompt tokens
It is only a rough char-count heuristic over a filtered transcript.
2. The compressor can replace many short turns with one dense summary
The context compressor is explicitly allowed to generate large structured handoff summaries:
agent/context_compressor.py around lines 191-200
agent/context_compressor.py around lines 272-285
The summary budget has:
- a floor of 2000 tokens
- a ceiling of 12000 tokens
The compressor also performs iterative summary updates, preserving the previous summary and adding new progress on later compressions.
That means compression can turn many small conversational turns into one information-dense summary containing:
- decisions
- file paths
- commands run
- errors
- next steps
- critical context
3. With current defaults, 27 → 13 is a valid compression outcome
Current compression behavior uses:
protect_first_n = 3
compression.protect_last_n = 10 in my local config
The compressor can also merge the summary into the first kept tail message when role alternation would otherwise break.
That makes a result like 27 → 13 entirely plausible:
- 3 protected head messages
- 10 protected tail messages
- summary merged into the first kept tail message
So the observed message-count reduction is consistent with successful compaction.
Minimal reproduction
I reproduced this locally by forcing a dense compaction summary into a 27-message alternating user/assistant history.
Observed local result:
- before messages: 27
- after messages: 13
- before rough tokens: 1681
- after rough tokens: 4565
In other words: fewer messages, larger rough token estimate.
I also inspected the real compressed session lineage and found that the surviving handoff summary accounted for most of the remaining rough token count in the compressed child session.
Impact
Not a duplicate of nearby issues
I checked these adjacent issues:
-
#6202 — /compress can report success even when the transcript is unchanged
- Different bug: false-positive success on a no-op
- This new issue is about a real compaction whose displayed token estimate still becomes misleading
-
#2599 — GLM gateway sessions can undercount request size and post-compaction reporting can describe transcript-only size instead of full request size
-
#499 — compaction quality overhaul
- Enhancement request, not a bug report about misleading post-compaction metrics
Suggested fix direction
Minimum viable fix:
- Change the
/compress banner text to explicitly label the number as a rough transcript-only estimate.
- If
new_tokens > approx_tokens while new_count < original_count, include an explanation such as:
Compression succeeded, but the handoff summary is denser than the removed turns, so the rough transcript estimate increased.
Stronger fix:
- Report both:
- transcript-only rough estimate
- full next-request rough estimate via
estimate_request_tokens_rough() or equivalent
- Prefer provider-reported post-compaction prompt usage when available instead of the current transcript-only heuristic.
Environment
- Repo:
NousResearch/hermes-agent
- Branch observed:
main
- Commit observed locally:
ff6a86cb529a372198b4b80d5e022e32a4a3f2cc
- Hermes version:
Hermes Agent v0.8.0 (2026.4.8)
- Python:
3.11.14
- OS:
Linux 6.17.0-14-generic #14-Ubuntu SMP PREEMPT_DYNAMIC Fri Jan 9 17:01:16 UTC 2026 x86_64 GNU/Linux
- Reproduced on the gateway/manual
/compress path and via direct local reproduction against the compressor logic
Relevant code locations
gateway/run.py — _handle_compress_command()
agent/model_metadata.py — estimate_messages_tokens_rough()
agent/context_compressor.py — summary budget, iterative summary updates, and summary merge behavior
run_agent.py — AIAgent compressor defaults
Suggested title:
/compresscan report higher token counts after successful compaction because the banner uses a rough transcript-only estimateSummary
The gateway/manual
/compresspath can successfully compact a session and still report a larger token count afterward.Observed example:
🗜️ Compressed: 27 → 13 messages~4,462 → ~4,727 tokensThe compaction did happen: the message count dropped sharply and the compressed transcript was rewritten. But the token figure increased because the banner is not reporting real request-token pressure. It is reporting a rough estimate of only the surviving
user/assistanttranscript content, and the compactor can replace many short turns with one dense handoff summary.This makes the
/compressbanner misleading even when compression succeeds.Actual behavior
A manual
/compresscan produce output where:Example observed output:
🗜️ Compressed: 27 → 13 messages~4,462 → ~4,727 tokensIn the affected session, the compressed transcript's handoff summary accounted for most of the remaining rough token count.
Expected behavior
The
/compressresponse should do one of the following:It should not implicitly suggest that compaction “made things bigger” without explaining that the displayed number is a limited heuristic.
Root cause
This is caused by a mismatch between what the banner measures and what compression is optimizing.
1. The
/compressbanner measures only filtered transcript contentThe manual gateway handler filters history down to only
userandassistantmessages with content, then computesapprox_tokensandnew_tokenswithestimate_messages_tokens_rough():gateway/run.pyaround lines 4996-5002gateway/run.pyaround lines 5034-5039agent/model_metadata.pyaround lines 975-978That estimator is just:
So the banner does not reflect:
It is only a rough char-count heuristic over a filtered transcript.
2. The compressor can replace many short turns with one dense summary
The context compressor is explicitly allowed to generate large structured handoff summaries:
agent/context_compressor.pyaround lines 191-200agent/context_compressor.pyaround lines 272-285The summary budget has:
The compressor also performs iterative summary updates, preserving the previous summary and adding new progress on later compressions.
That means compression can turn many small conversational turns into one information-dense summary containing:
3. With current defaults,
27 → 13is a valid compression outcomeCurrent compression behavior uses:
protect_first_n = 3compression.protect_last_n = 10in my local configThe compressor can also merge the summary into the first kept tail message when role alternation would otherwise break.
That makes a result like
27 → 13entirely plausible:So the observed message-count reduction is consistent with successful compaction.
Minimal reproduction
I reproduced this locally by forcing a dense compaction summary into a 27-message alternating
user/assistanthistory.Observed local result:
In other words: fewer messages, larger rough token estimate.
I also inspected the real compressed session lineage and found that the surviving handoff summary accounted for most of the remaining rough token count in the compressed child session.
Impact
/compressis not a no-op, the success banner can still mislead in a different wayNot a duplicate of nearby issues
I checked these adjacent issues:
#6202—/compresscan report success even when the transcript is unchanged#2599— GLM gateway sessions can undercount request size and post-compaction reporting can describe transcript-only size instead of full request size/compressUX in normal operation when dense summaries make the transcript-only heuristic increase after successful compaction#499— compaction quality overhaulSuggested fix direction
Minimum viable fix:
/compressbanner text to explicitly label the number as a rough transcript-only estimate.new_tokens > approx_tokenswhilenew_count < original_count, include an explanation such as:Compression succeeded, but the handoff summary is denser than the removed turns, so the rough transcript estimate increased.Stronger fix:
estimate_request_tokens_rough()or equivalentEnvironment
NousResearch/hermes-agentmainff6a86cb529a372198b4b80d5e022e32a4a3f2ccHermes Agent v0.8.0 (2026.4.8)3.11.14Linux 6.17.0-14-generic #14-Ubuntu SMP PREEMPT_DYNAMIC Fri Jan 9 17:01:16 UTC 2026 x86_64 GNU/Linux/compresspath and via direct local reproduction against the compressor logicRelevant code locations
gateway/run.py—_handle_compress_command()agent/model_metadata.py—estimate_messages_tokens_rough()agent/context_compressor.py— summary budget, iterative summary updates, and summary merge behaviorrun_agent.py—AIAgentcompressor defaults