Description
Context compression summaries are injected as ordinary assistant messages at the end of the visible conversation. When compression triggers, users see a wall of compressed historical summaries directly after the latest assistant reply, making the conversation confusing and hard to follow.
Root cause: In agent/context_compressor.py, the compressed summary is prepended with [CONTEXT COMPACTION — REFERENCE ONLY] prefix and inserted as a regular assistant message into the message list. There is no metadata flag, role distinction, or message-level marker that would allow a consumer (CLI, Desktop, API client) to distinguish a compression summary from an actual assistant response.
The predecessor reference (Claude Code) handles this correctly by keeping compression entirely invisible to the user — it is a backend-only mechanism that never enters the visible conversation stream.
Expected Behavior
Compression summaries should not be emitted as a visible assistant message. They are an internal memory-management mechanism.
Options for the agent to handle this:
Option A: Keep the summary in the context window but do not emit it as a new message — it should be a silent context manipulation, not part of the visible message sequence.
Option B: Add a metadata flag (e.g. role: "compression_summary" or is_compressed_summary: true) so frontends can filter them out. This is the minimal fix.
Option C: Deliver the summary via the system prompt or a hidden context field rather than as an assistant message.
Actual Behavior
After a long conversation, the agent begins emitting messages like:
[CONTEXT COMPACTION — REFERENCE ONLY] Earlier turns were compacted into the summary below...
These appear as the latest messages in the chat, interspersed with the user's current conversation, creating a broken reading order.
Environment
- Hermes Agent version: Latest main
- Config: Default compression settings (threshold: 0.5, target_ratio: 0.2, protect_last_n: 20)
- Platform: Windows (also reproducible on other platforms)
Additional Note
This is a backend issue, not a frontend issue. The agent should not emit compression artifacts as visible assistant messages, regardless of what frontend renders them.
Description
Context compression summaries are injected as ordinary assistant messages at the end of the visible conversation. When compression triggers, users see a wall of compressed historical summaries directly after the latest assistant reply, making the conversation confusing and hard to follow.
Root cause: In
agent/context_compressor.py, the compressed summary is prepended with[CONTEXT COMPACTION — REFERENCE ONLY]prefix and inserted as a regular assistant message into the message list. There is no metadata flag, role distinction, or message-level marker that would allow a consumer (CLI, Desktop, API client) to distinguish a compression summary from an actual assistant response.The predecessor reference (Claude Code) handles this correctly by keeping compression entirely invisible to the user — it is a backend-only mechanism that never enters the visible conversation stream.
Expected Behavior
Compression summaries should not be emitted as a visible assistant message. They are an internal memory-management mechanism.
Options for the agent to handle this:
Option A: Keep the summary in the context window but do not emit it as a new message — it should be a silent context manipulation, not part of the visible message sequence.
Option B: Add a metadata flag (e.g.
role: "compression_summary"oris_compressed_summary: true) so frontends can filter them out. This is the minimal fix.Option C: Deliver the summary via the system prompt or a hidden context field rather than as an assistant message.
Actual Behavior
After a long conversation, the agent begins emitting messages like:
These appear as the latest messages in the chat, interspersed with the user's current conversation, creating a broken reading order.
Environment
Additional Note
This is a backend issue, not a frontend issue. The agent should not emit compression artifacts as visible assistant messages, regardless of what frontend renders them.