Skip to content

/compress can report a successful compression even when the transcript is unchanged #6202

@Jackten

Description

@Jackten

Bug report draft: /compress reports success even when no compaction occurs

Suggested title:
/compress can report a successful compression even when the transcript is unchanged

Summary

The Telegram/manual gateway /compress path can return a success banner like:

🗜️ Compressed: 19 → 19 messages
~11,726 → ~11,726 tokens

…even when no compaction actually happened.

This is a false positive. The underlying compressor can return the original message list unchanged, but the gateway still rewrites the transcript and formats a success banner as if compression succeeded.

Actual behavior

Running /compress in a gateway session can produce identical before/after message counts and token estimates, with no visible summary inserted and no history trimmed.

Example observed output:

🗜️ Compressed: 19 → 19 messages
~11,726 → ~11,726 tokens

Expected behavior

One of these should happen:

  1. Actual compaction occurs:

    • message and/or token counts decrease, or
    • a summary/handoff message is inserted and older turns are pruned.
  2. If compaction is not possible, the command should return an explicit no-op message, e.g.:

    • Nothing to compress yet.
    • No compression performed: only 19 compressible messages; current settings require more history.

It should not claim success when the transcript is unchanged.

Root cause

This appears to be a combination of two behaviors:

  1. ContextCompressor.compress() intentionally returns the input unchanged for small histories.
  2. The manual gateway /compress handler does not check whether compression actually changed anything before returning the success banner.

Why this happens

The gateway manual /compress path first filters the transcript down to only user and assistant messages:

  • gateway/run.py around lines 4996-5000
msgs = [
    {"role": m.get("role"), "content": m.get("content")}
    for m in history
    if m.get("role") in ("user", "assistant") and m.get("content")
]

Then it creates a temporary AIAgent and calls _compress_context(...) with quiet_mode=True:

  • gateway/run.py around lines 5004-5018
tmp_agent = AIAgent(
    ...,
    quiet_mode=True,
    ...,
)
compressed, _ = await loop.run_in_executor(
    None,
    lambda: tmp_agent._compress_context(msgs, "", approx_tokens=approx_tokens)
)

The compressor itself has an early return for small histories:

  • agent/context_compressor.py around lines 578-586
if n_messages <= self.protect_first_n + self.protect_last_n + 1:
    ...
    return messages

With the current defaults, that threshold is effectively:

  • protect_first_n = 3
  • protect_last_n = 20
  • so compression requires > 24 messages to do any work.

Those defaults come from run_agent.py:

  • run_agent.py around lines 1093-1097 and 1132-1136
compression_protect_last = int(_compression_cfg.get("protect_last_n", 20))
...
self.context_compressor = ContextCompressor(
    ...,
    protect_first_n=3,
    protect_last_n=compression_protect_last,
)

So a 19-message user/assistant history is not compressible by design and is returned unchanged.

However, the gateway handler unconditionally formats a success banner afterward:

  • gateway/run.py around lines 5037-5040
return (
    f"🗜️ Compressed: {original_count}{new_count} messages\n"
    f"~{approx_tokens:,} → ~{new_tokens:,} tokens"
)

There is no guard for:

  • compressed == msgs
  • new_count == original_count and new_tokens == approx_tokens
  • a no-op reason such as too_few_messages

Because quiet_mode=True, the internal warning about not being able to compress is also suppressed, so the failure mode is silent.

Minimal reproduction

This can be reproduced locally without Telegram by calling the same compression path with a 19-message alternating user/assistant history.

Example reproduction logic:

from run_agent import AIAgent
from agent.model_metadata import estimate_messages_tokens_rough

msgs = []
for i in range(19):
    role = 'user' if i % 2 == 0 else 'assistant'
    msgs.append({'role': role, 'content': f'message {i}'})

orig_tokens = estimate_messages_tokens_rough(msgs)
agent = AIAgent(
    model='openai-codex/gpt-5.4',
    quiet_mode=True,
    max_iterations=1,
    enabled_toolsets=['memory'],
    session_id='compress_bug_repro',
)
compressed, _ = agent._compress_context(msgs, '', approx_tokens=orig_tokens)

print(len(msgs), len(compressed), compressed == msgs)

Observed result:

  • before messages: 19
  • after messages: 19
  • transcript identical: True

In a real gateway /compress flow, that unchanged result still gets presented as a successful compression.

Impact

  • Misleading UX: users are told compression succeeded when nothing changed.
  • Harder debugging: no-op cases look like silent failures or “fake compression.”
  • Confusing telemetry: operators may assume compaction is working because the banner says it is.
  • Manual /compress is especially misleading because it filters to only user/assistant turns, so visible conversation length may look “long enough” while the compressible count is still below threshold.

Suggested fix direction

A good fix would be to return structured outcome metadata from compression, not just the output message list.

For example, manual /compress should be able to distinguish:

  • compressed
  • unchanged_too_few_messages
  • unchanged_boundary_collapse
  • unchanged_summary_unavailable

Minimum viable fix:

  1. After _compress_context(...), detect whether the transcript materially changed.
  2. Only show the 🗜️ Compressed: banner if the transcript changed.
  3. Otherwise return a no-op explanation with the reason and current threshold.

Optional improvement:

  • Make the manual /compress response explicitly say it is counting only compressible user/assistant messages, or reconsider whether manual compression should evaluate the full transcript for eligibility.
  • Revisit whether the default compression.protect_last_n: 20 is too conservative for manual messaging sessions. Lowering it to 10 would reduce the minimum compressible size from >24 messages to >14 messages (given the hardcoded protect_first_n=3).
  • Important nuance: lowering protect_last_n is only a tuning/workaround. It does not fix the false-success bug by itself; the command still needs explicit no-op detection.

Not a duplicate of nearby issues

I checked a few adjacent issues before drafting this:

  • #499 — open enhancement about compaction quality and prompt design, not false-success reporting on no-op manual /compress
  • #2153 — closed bug about compression failing to trigger after API disconnects, different failure mode
  • #2771 — open bug about silent memory write failures in gateway sessions, similar "not surfaced to user" pattern but unrelated subsystem

Environment

  • Repo: NousResearch/hermes-agent
  • Branch observed: main
  • Commit observed locally: ff6a86cb529a372198b4b80d5e022e32a4a3f2cc
  • Hermes version: Hermes Agent v0.8.0 (2026.4.8)
  • Python: 3.11.14
  • OS: Linux 6.17.0-14-generic #14-Ubuntu SMP PREEMPT_DYNAMIC Fri Jan 9 17:01:16 UTC 2026 x86_64 GNU/Linux
  • Reproduced on the gateway/manual /compress path and in a direct local Python reproduction

Relevant code locations

  • gateway/run.py_handle_compress_command()
  • run_agent.pyAIAgent compressor defaults and _compress_context()
  • agent/context_compressor.pyContextCompressor.compress() early no-op return

Metadata

Metadata

Assignees

No one assigned

    Labels

    P3Low — cosmetic, nice to havecomp/gatewayGateway runner, session dispatch, deliverysweeper:implemented-on-mainSweeper: behavior already present on current maintype/bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions