Context compression summaries injected as regular assistant messages, polluting visible conversation

## Description

Context compression summary messages are injected **as regular assistant messages at the end of the visible conversation**. This means the user sees walls of compressed historical summaries directly after their latest reply, making the conversation hard to read and creating the impression that the agent is hallucinating old context into the active chat.

**Root cause**: The `context_compressor.py` produces a summary string prefixed with `[CONTEXT COMPACTION — REFERENCE ONLY]` and inserts it as an ordinary assistant message into the message list. There is no metadata flag or role distinction that would allow a frontend (or any consumer) to distinguish a compression summary from a real assistant response.

**Why Claude Code doesn't have this problem**: Claude Code maintains a strict separation between the LLM context window (invisible to the user) and the rendered conversation view. Compression is a backend-only concern — the user never sees it.

## Expected Behavior

Compression summaries should be:

1. **Invisible to the user** in the rendered conversation — they are an internal mechanism for staying within the context window.
2. **Flagged** with a metadata field (e.g. `is_compressed_summary: true` or a distinct `role: "compression_summary"`) so that frontends can filter or render them appropriately.
3. **Never appended after the latest user message** — they should be injected before the user's current turn, or at least carry a clear visual distinction in the streaming output.

## Suggested Fixes

### Option A: Add metadata flag (recommended)

In `agent/context_compressor.py`, when inserting the summary into the message list, set a metadata field like:

```python
message_metadata = {
    "is_compressed_summary": True,
    "compressed_at": timestamp,
    "original_message_count": n
}
```

This allows any consumer (CLI, Desktop, gateway) to skip rendering it.

### Option B: Use a separate role

Add a `compression_summary` role type to the message schema. Frontends can then filter by role.

### Option C: Tool-call-style delivery

Deliver the compression summary as a tool result or system instruction rather than an assistant message, keeping it entirely out of the visible conversation flow.

## Additional Context

Users of Hermes Desktop (https://github.com/fathah/hermes-desktop) are reporting the same issue from the frontend side (see https://github.com/fathah/hermes-desktop/issues/537). However, the fix should be at the backend level — the agent should not emit compression summaries as ordinary assistant messages.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Context compression summaries injected as regular assistant messages, polluting visible conversation #38389

Description

Expected Behavior

Suggested Fixes

Option A: Add metadata flag (recommended)

Option B: Use a separate role

Option C: Tool-call-style delivery

Additional Context

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Context compression summaries injected as regular assistant messages, polluting visible conversation #38389

Description

Description

Expected Behavior

Suggested Fixes

Option A: Add metadata flag (recommended)

Option B: Use a separate role

Option C: Tool-call-style delivery

Additional Context

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions