Context compression summaries injected as regular assistant messages, polluting visible conversation

## Description

Context compression summaries are injected as **ordinary assistant messages** at the end of the visible conversation. When compression triggers, users see a wall of compressed historical summaries directly after the latest assistant reply, making the conversation confusing and hard to follow.

**Root cause**: In `agent/context_compressor.py`, the compressed summary is prepended with `[CONTEXT COMPACTION — REFERENCE ONLY]` prefix and inserted as a regular assistant message into the message list. There is no metadata flag, role distinction, or message-level marker that would allow a consumer (CLI, Desktop, API client) to distinguish a compression summary from an actual assistant response.

The predecessor reference (Claude Code) handles this correctly by keeping compression entirely invisible to the user — it is a backend-only mechanism that never enters the visible conversation stream.

## Expected Behavior

Compression summaries should **not** be emitted as a visible assistant message. They are an internal memory-management mechanism.

Options for the agent to handle this:

**Option A**: Keep the summary in the context window but **do not emit it as a new message** — it should be a silent context manipulation, not part of the visible message sequence.

**Option B**: Add a metadata flag (e.g. `role: "compression_summary"` or `is_compressed_summary: true`) so frontends can filter them out. This is the minimal fix.

**Option C**: Deliver the summary via the system prompt or a hidden context field rather than as an assistant message.

## Actual Behavior

After a long conversation, the agent begins emitting messages like:

```
[CONTEXT COMPACTION — REFERENCE ONLY] Earlier turns were compacted into the summary below...
```

These appear as the latest messages in the chat, interspersed with the user's current conversation, creating a broken reading order.

## Environment

- **Hermes Agent version**: Latest main
- **Config**: Default compression settings (threshold: 0.5, target_ratio: 0.2, protect_last_n: 20)
- **Platform**: Windows (also reproducible on other platforms)

## Additional Note

This is a backend issue, not a frontend issue. The agent should not emit compression artifacts as visible assistant messages, regardless of what frontend renders them.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Context compression summaries injected as regular assistant messages, polluting visible conversation #38392

Description

Expected Behavior

Actual Behavior

Environment

Additional Note

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Context compression summaries injected as regular assistant messages, polluting visible conversation #38392

Description

Description

Expected Behavior

Actual Behavior

Environment

Additional Note

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions