Description
Context compression summary messages are injected as regular assistant messages at the end of the visible conversation. This means the user sees walls of compressed historical summaries directly after their latest reply, making the conversation hard to read and creating the impression that the agent is hallucinating old context into the active chat.
Root cause: The context_compressor.py produces a summary string prefixed with [CONTEXT COMPACTION — REFERENCE ONLY] and inserts it as an ordinary assistant message into the message list. There is no metadata flag or role distinction that would allow a frontend (or any consumer) to distinguish a compression summary from a real assistant response.
Why Claude Code doesn't have this problem: Claude Code maintains a strict separation between the LLM context window (invisible to the user) and the rendered conversation view. Compression is a backend-only concern — the user never sees it.
Expected Behavior
Compression summaries should be:
- Invisible to the user in the rendered conversation — they are an internal mechanism for staying within the context window.
- Flagged with a metadata field (e.g.
is_compressed_summary: true or a distinct role: "compression_summary") so that frontends can filter or render them appropriately.
- Never appended after the latest user message — they should be injected before the user's current turn, or at least carry a clear visual distinction in the streaming output.
Suggested Fixes
Option A: Add metadata flag (recommended)
In agent/context_compressor.py, when inserting the summary into the message list, set a metadata field like:
message_metadata = {
"is_compressed_summary": True,
"compressed_at": timestamp,
"original_message_count": n
}
This allows any consumer (CLI, Desktop, gateway) to skip rendering it.
Option B: Use a separate role
Add a compression_summary role type to the message schema. Frontends can then filter by role.
Option C: Tool-call-style delivery
Deliver the compression summary as a tool result or system instruction rather than an assistant message, keeping it entirely out of the visible conversation flow.
Additional Context
Users of Hermes Desktop (https://github.com/fathah/hermes-desktop) are reporting the same issue from the frontend side (see fathah/hermes-desktop#537). However, the fix should be at the backend level — the agent should not emit compression summaries as ordinary assistant messages.
Description
Context compression summary messages are injected as regular assistant messages at the end of the visible conversation. This means the user sees walls of compressed historical summaries directly after their latest reply, making the conversation hard to read and creating the impression that the agent is hallucinating old context into the active chat.
Root cause: The
context_compressor.pyproduces a summary string prefixed with[CONTEXT COMPACTION — REFERENCE ONLY]and inserts it as an ordinary assistant message into the message list. There is no metadata flag or role distinction that would allow a frontend (or any consumer) to distinguish a compression summary from a real assistant response.Why Claude Code doesn't have this problem: Claude Code maintains a strict separation between the LLM context window (invisible to the user) and the rendered conversation view. Compression is a backend-only concern — the user never sees it.
Expected Behavior
Compression summaries should be:
is_compressed_summary: trueor a distinctrole: "compression_summary") so that frontends can filter or render them appropriately.Suggested Fixes
Option A: Add metadata flag (recommended)
In
agent/context_compressor.py, when inserting the summary into the message list, set a metadata field like:This allows any consumer (CLI, Desktop, gateway) to skip rendering it.
Option B: Use a separate role
Add a
compression_summaryrole type to the message schema. Frontends can then filter by role.Option C: Tool-call-style delivery
Deliver the compression summary as a tool result or system instruction rather than an assistant message, keeping it entirely out of the visible conversation flow.
Additional Context
Users of Hermes Desktop (https://github.com/fathah/hermes-desktop) are reporting the same issue from the frontend side (see fathah/hermes-desktop#537). However, the fix should be at the backend level — the agent should not emit compression summaries as ordinary assistant messages.