/compress can report higher token counts after successful compaction because the banner uses a rough transcript-only estimate

Suggested title:
`/compress` can report higher token counts after successful compaction because the banner uses a rough transcript-only estimate

## Summary

The gateway/manual `/compress` path can successfully compact a session and still report a larger token count afterward.

Observed example:

`🗜️ Compressed: 27 → 13 messages`
`~4,462 → ~4,727 tokens`

The compaction did happen: the message count dropped sharply and the compressed transcript was rewritten. But the token figure increased because the banner is not reporting real request-token pressure. It is reporting a rough estimate of only the surviving `user`/`assistant` transcript content, and the compactor can replace many short turns with one dense handoff summary.

This makes the `/compress` banner misleading even when compression succeeds.

## Actual behavior

A manual `/compress` can produce output where:

- the message count decreases substantially
- the compressed transcript materially changes
- but the reported token estimate increases

Example observed output:

`🗜️ Compressed: 27 → 13 messages`
`~4,462 → ~4,727 tokens`

In the affected session, the compressed transcript's handoff summary accounted for most of the remaining rough token count.

## Expected behavior

The `/compress` response should do one of the following:

1. Report a metric that actually reflects post-compaction request pressure, or
2. Clearly label the current metric as a rough transcript-only estimate, or
3. Warn when the summary became denser even though the session compacted successfully

It should not implicitly suggest that compaction “made things bigger” without explaining that the displayed number is a limited heuristic.

## Root cause

This is caused by a mismatch between what the banner measures and what compression is optimizing.

### 1. The `/compress` banner measures only filtered transcript content

The manual gateway handler filters history down to only `user` and `assistant` messages with content, then computes `approx_tokens` and `new_tokens` with `estimate_messages_tokens_rough()`:

- `gateway/run.py` around lines 4996-5002
- `gateway/run.py` around lines 5034-5039
- `agent/model_metadata.py` around lines 975-978

That estimator is just:

```python
total_chars = sum(len(str(msg)) for msg in messages)
return total_chars // 4
```

So the banner does **not** reflect:

- the full request payload
- tool messages
- system prompt size
- tool schema size
- provider-reported prompt tokens

It is only a rough char-count heuristic over a filtered transcript.

### 2. The compressor can replace many short turns with one dense summary

The context compressor is explicitly allowed to generate large structured handoff summaries:

- `agent/context_compressor.py` around lines 191-200
- `agent/context_compressor.py` around lines 272-285

The summary budget has:

- a floor of 2000 tokens
- a ceiling of 12000 tokens

The compressor also performs iterative summary updates, preserving the previous summary and adding new progress on later compressions.

That means compression can turn many small conversational turns into one information-dense summary containing:

- decisions
- file paths
- commands run
- errors
- next steps
- critical context

### 3. With current defaults, `27 → 13` is a valid compression outcome

Current compression behavior uses:

- `protect_first_n = 3`
- `compression.protect_last_n = 10` in my local config

The compressor can also merge the summary into the first kept tail message when role alternation would otherwise break.

That makes a result like `27 → 13` entirely plausible:

- 3 protected head messages
- 10 protected tail messages
- summary merged into the first kept tail message

So the observed message-count reduction is consistent with successful compaction.

## Minimal reproduction

I reproduced this locally by forcing a dense compaction summary into a 27-message alternating `user`/`assistant` history.

Observed local result:

- before messages: 27
- after messages: 13
- before rough tokens: 1681
- after rough tokens: 4565

In other words: fewer messages, larger rough token estimate.

I also inspected the real compressed session lineage and found that the surviving handoff summary accounted for most of the remaining rough token count in the compressed child session.

## Impact

- Misleading UX: the user sees “Compressed” but the token number goes up
- Operator confusion: it looks like compaction failed or made the context worse
- Harder debugging: the banner mixes a valid compaction result with a misleading metric
- Adjacent to #6202: even when `/compress` is not a no-op, the success banner can still mislead in a different way

## Not a duplicate of nearby issues

I checked these adjacent issues:

- `#6202` — `/compress` can report success even when the transcript is unchanged
  - Different bug: false-positive success on a no-op
  - This new issue is about a **real** compaction whose displayed token estimate still becomes misleading

- `#2599` — GLM gateway sessions can undercount request size and post-compaction reporting can describe transcript-only size instead of full request size
  - Adjacent but different scope
  - #2599 is about overflow protection and full-request estimation in GLM sessions
  - This new issue is about `/compress` UX in normal operation when dense summaries make the transcript-only heuristic increase after successful compaction

- `#499` — compaction quality overhaul
  - Enhancement request, not a bug report about misleading post-compaction metrics

## Suggested fix direction

Minimum viable fix:

1. Change the `/compress` banner text to explicitly label the number as a rough transcript-only estimate.
2. If `new_tokens > approx_tokens` while `new_count < original_count`, include an explanation such as:
   - `Compression succeeded, but the handoff summary is denser than the removed turns, so the rough transcript estimate increased.`

Stronger fix:

3. Report both:
   - transcript-only rough estimate
   - full next-request rough estimate via `estimate_request_tokens_rough()` or equivalent
4. Prefer provider-reported post-compaction prompt usage when available instead of the current transcript-only heuristic.

## Environment

- Repo: `NousResearch/hermes-agent`
- Branch observed: `main`
- Commit observed locally: `ff6a86cb529a372198b4b80d5e022e32a4a3f2cc`
- Hermes version: `Hermes Agent v0.8.0 (2026.4.8)`
- Python: `3.11.14`
- OS: `Linux 6.17.0-14-generic #14-Ubuntu SMP PREEMPT_DYNAMIC Fri Jan 9 17:01:16 UTC 2026 x86_64 GNU/Linux`
- Reproduced on the gateway/manual `/compress` path and via direct local reproduction against the compressor logic

## Relevant code locations

- `gateway/run.py` — `_handle_compress_command()`
- `agent/model_metadata.py` — `estimate_messages_tokens_rough()`
- `agent/context_compressor.py` — summary budget, iterative summary updates, and summary merge behavior
- `run_agent.py` — `AIAgent` compressor defaults


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

/compress can report higher token counts after successful compaction because the banner uses a rough transcript-only estimate #6217

Summary

Actual behavior

Expected behavior

Root cause

1. The `/compress` banner measures only filtered transcript content

2. The compressor can replace many short turns with one dense summary

3. With current defaults, `27 → 13` is a valid compression outcome

Minimal reproduction

Impact

Not a duplicate of nearby issues

Suggested fix direction

Environment

Relevant code locations

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

/compress can report higher token counts after successful compaction because the banner uses a rough transcript-only estimate #6217

Description

Summary

Actual behavior

Expected behavior

Root cause

1. The /compress banner measures only filtered transcript content

2. The compressor can replace many short turns with one dense summary

3. With current defaults, 27 → 13 is a valid compression outcome

Minimal reproduction

Impact

Not a duplicate of nearby issues

Suggested fix direction

Environment

Relevant code locations

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

1. The `/compress` banner measures only filtered transcript content

3. With current defaults, `27 → 13` is a valid compression outcome