Environment:
- OS: Windows 10
- Hermes version: latest (May 2026)
- Mode: API Server (Gateway)
- Providers affected: google-gemini-cli, OpenAI Codex (occurs on BOTH)
- Context compression: enabled (default)
Steps to reproduce:
- Start a conversation with Hermes
- Continue long enough for context compression to trigger
- Observe: the assistant's response includes the compressed conversation summary
Expected:
Context compression should be invisible to the user — it should only be injected into the model's context window, never displayed in chat output.
Actual:
After compression, the user sees:
[new response] ← at top
[CONTEXT COMPACTION - REFERENCE ONLY]
User: old question 1
Assistant: old answer 1
... ← all old history below
Workaround: hermes config set compression.enabled false then /reset
Suspected root cause:
The SUMMARY_PREFIX in agent/context_compressor.py lines 37-51 is injected as a regular message. Either a role alternation violation or the model is outputting the compressed summary text. Since it affects multiple providers, it's not model-specific hallucination.
Environment:
Steps to reproduce:
Expected:
Context compression should be invisible to the user — it should only be injected into the model's context window, never displayed in chat output.
Actual:
After compression, the user sees:
[new response] ← at top
[CONTEXT COMPACTION - REFERENCE ONLY]
User: old question 1
Assistant: old answer 1
... ← all old history below
Workaround: hermes config set compression.enabled false then /reset
Suspected root cause:
The SUMMARY_PREFIX in agent/context_compressor.py lines 37-51 is injected as a regular message. Either a role alternation violation or the model is outputting the compressed summary text. Since it affects multiple providers, it's not model-specific hallucination.