Skip to content

/compress can report higher token counts after successful compaction because the banner uses a rough transcript-only estimate #6217

@Jackten

Description

@Jackten

Suggested title:
/compress can report higher token counts after successful compaction because the banner uses a rough transcript-only estimate

Summary

The gateway/manual /compress path can successfully compact a session and still report a larger token count afterward.

Observed example:

🗜️ Compressed: 27 → 13 messages
~4,462 → ~4,727 tokens

The compaction did happen: the message count dropped sharply and the compressed transcript was rewritten. But the token figure increased because the banner is not reporting real request-token pressure. It is reporting a rough estimate of only the surviving user/assistant transcript content, and the compactor can replace many short turns with one dense handoff summary.

This makes the /compress banner misleading even when compression succeeds.

Actual behavior

A manual /compress can produce output where:

  • the message count decreases substantially
  • the compressed transcript materially changes
  • but the reported token estimate increases

Example observed output:

🗜️ Compressed: 27 → 13 messages
~4,462 → ~4,727 tokens

In the affected session, the compressed transcript's handoff summary accounted for most of the remaining rough token count.

Expected behavior

The /compress response should do one of the following:

  1. Report a metric that actually reflects post-compaction request pressure, or
  2. Clearly label the current metric as a rough transcript-only estimate, or
  3. Warn when the summary became denser even though the session compacted successfully

It should not implicitly suggest that compaction “made things bigger” without explaining that the displayed number is a limited heuristic.

Root cause

This is caused by a mismatch between what the banner measures and what compression is optimizing.

1. The /compress banner measures only filtered transcript content

The manual gateway handler filters history down to only user and assistant messages with content, then computes approx_tokens and new_tokens with estimate_messages_tokens_rough():

  • gateway/run.py around lines 4996-5002
  • gateway/run.py around lines 5034-5039
  • agent/model_metadata.py around lines 975-978

That estimator is just:

total_chars = sum(len(str(msg)) for msg in messages)
return total_chars // 4

So the banner does not reflect:

  • the full request payload
  • tool messages
  • system prompt size
  • tool schema size
  • provider-reported prompt tokens

It is only a rough char-count heuristic over a filtered transcript.

2. The compressor can replace many short turns with one dense summary

The context compressor is explicitly allowed to generate large structured handoff summaries:

  • agent/context_compressor.py around lines 191-200
  • agent/context_compressor.py around lines 272-285

The summary budget has:

  • a floor of 2000 tokens
  • a ceiling of 12000 tokens

The compressor also performs iterative summary updates, preserving the previous summary and adding new progress on later compressions.

That means compression can turn many small conversational turns into one information-dense summary containing:

  • decisions
  • file paths
  • commands run
  • errors
  • next steps
  • critical context

3. With current defaults, 27 → 13 is a valid compression outcome

Current compression behavior uses:

  • protect_first_n = 3
  • compression.protect_last_n = 10 in my local config

The compressor can also merge the summary into the first kept tail message when role alternation would otherwise break.

That makes a result like 27 → 13 entirely plausible:

  • 3 protected head messages
  • 10 protected tail messages
  • summary merged into the first kept tail message

So the observed message-count reduction is consistent with successful compaction.

Minimal reproduction

I reproduced this locally by forcing a dense compaction summary into a 27-message alternating user/assistant history.

Observed local result:

  • before messages: 27
  • after messages: 13
  • before rough tokens: 1681
  • after rough tokens: 4565

In other words: fewer messages, larger rough token estimate.

I also inspected the real compressed session lineage and found that the surviving handoff summary accounted for most of the remaining rough token count in the compressed child session.

Impact

Not a duplicate of nearby issues

I checked these adjacent issues:

  • #6202/compress can report success even when the transcript is unchanged

    • Different bug: false-positive success on a no-op
    • This new issue is about a real compaction whose displayed token estimate still becomes misleading
  • #2599 — GLM gateway sessions can undercount request size and post-compaction reporting can describe transcript-only size instead of full request size

  • #499 — compaction quality overhaul

    • Enhancement request, not a bug report about misleading post-compaction metrics

Suggested fix direction

Minimum viable fix:

  1. Change the /compress banner text to explicitly label the number as a rough transcript-only estimate.
  2. If new_tokens > approx_tokens while new_count < original_count, include an explanation such as:
    • Compression succeeded, but the handoff summary is denser than the removed turns, so the rough transcript estimate increased.

Stronger fix:

  1. Report both:
    • transcript-only rough estimate
    • full next-request rough estimate via estimate_request_tokens_rough() or equivalent
  2. Prefer provider-reported post-compaction prompt usage when available instead of the current transcript-only heuristic.

Environment

  • Repo: NousResearch/hermes-agent
  • Branch observed: main
  • Commit observed locally: ff6a86cb529a372198b4b80d5e022e32a4a3f2cc
  • Hermes version: Hermes Agent v0.8.0 (2026.4.8)
  • Python: 3.11.14
  • OS: Linux 6.17.0-14-generic #14-Ubuntu SMP PREEMPT_DYNAMIC Fri Jan 9 17:01:16 UTC 2026 x86_64 GNU/Linux
  • Reproduced on the gateway/manual /compress path and via direct local reproduction against the compressor logic

Relevant code locations

  • gateway/run.py_handle_compress_command()
  • agent/model_metadata.pyestimate_messages_tokens_rough()
  • agent/context_compressor.py — summary budget, iterative summary updates, and summary merge behavior
  • run_agent.pyAIAgent compressor defaults

Metadata

Metadata

Assignees

No one assigned

    Labels

    P3Low — cosmetic, nice to havecomp/gatewayGateway runner, session dispatch, deliverytype/bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions