Context compression summary leaks into user-visible chat output

Environment:
- OS: Windows 10
- Hermes version: latest (May 2026)
- Mode: API Server (Gateway)
- Providers affected: google-gemini-cli, OpenAI Codex (occurs on BOTH)
- Context compression: enabled (default)

Steps to reproduce:
1. Start a conversation with Hermes
2. Continue long enough for context compression to trigger
3. Observe: the assistant's response includes the compressed conversation summary

Expected:
Context compression should be invisible to the user — it should only be injected into the model's context window, never displayed in chat output.

Actual:
After compression, the user sees:
  [new response]                       ← at top
  [CONTEXT COMPACTION - REFERENCE ONLY]
  User: old question 1
  Assistant: old answer 1
  ...                                 ← all old history below

Workaround: hermes config set compression.enabled false then /reset

Suspected root cause:
The SUMMARY_PREFIX in agent/context_compressor.py lines 37-51 is injected as a regular message. Either a role alternation violation or the model is outputting the compressed summary text. Since it affects multiple providers, it's not model-specific hallucination.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Context compression summary leaks into user-visible chat output #33256

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Context compression summary leaks into user-visible chat output #33256

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions