Skip to content

Context compression summaries injected as regular assistant messages, polluting visible conversation #38389

@redfireblade

Description

@redfireblade

Description

Context compression summary messages are injected as regular assistant messages at the end of the visible conversation. This means the user sees walls of compressed historical summaries directly after their latest reply, making the conversation hard to read and creating the impression that the agent is hallucinating old context into the active chat.

Root cause: The context_compressor.py produces a summary string prefixed with [CONTEXT COMPACTION — REFERENCE ONLY] and inserts it as an ordinary assistant message into the message list. There is no metadata flag or role distinction that would allow a frontend (or any consumer) to distinguish a compression summary from a real assistant response.

Why Claude Code doesn't have this problem: Claude Code maintains a strict separation between the LLM context window (invisible to the user) and the rendered conversation view. Compression is a backend-only concern — the user never sees it.

Expected Behavior

Compression summaries should be:

  1. Invisible to the user in the rendered conversation — they are an internal mechanism for staying within the context window.
  2. Flagged with a metadata field (e.g. is_compressed_summary: true or a distinct role: "compression_summary") so that frontends can filter or render them appropriately.
  3. Never appended after the latest user message — they should be injected before the user's current turn, or at least carry a clear visual distinction in the streaming output.

Suggested Fixes

Option A: Add metadata flag (recommended)

In agent/context_compressor.py, when inserting the summary into the message list, set a metadata field like:

message_metadata = {
    "is_compressed_summary": True,
    "compressed_at": timestamp,
    "original_message_count": n
}

This allows any consumer (CLI, Desktop, gateway) to skip rendering it.

Option B: Use a separate role

Add a compression_summary role type to the message schema. Frontends can then filter by role.

Option C: Tool-call-style delivery

Deliver the compression summary as a tool result or system instruction rather than an assistant message, keeping it entirely out of the visible conversation flow.

Additional Context

Users of Hermes Desktop (https://github.com/fathah/hermes-desktop) are reporting the same issue from the frontend side (see fathah/hermes-desktop#537). However, the fix should be at the backend level — the agent should not emit compression summaries as ordinary assistant messages.

Metadata

Metadata

Assignees

No one assigned

    Labels

    P1High — major feature broken, no workaroundcomp/agentCore agent loop, run_agent.py, prompt buildertool/memoryMemory tool and memory providerstype/bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions