Title
[Bug]: protect_first_n causes head message fossilization across compressions — old user messages become immortal
Labels
bug, compression, P2
Body
Bug Description
protect_first_n (default 3) in context_compressor.py blindly preserves the first N messages of the session unconditionally, without checking whether their content is already covered by the compression summary or relevant to the current conversation.
Every time context compression fires:
- First N messages are copied verbatim as the new "head"
- Middle messages get summarized
- Tail messages are preserved
- A new session is created with:
[head] + [summary] + [tail]
On the next compression, the same head messages are again preserved as the head of the new session — even if they were asked hours ago, already answered, and included in the summary.
This creates "immortal messages" that persist across an entire session lineage, surviving every compression pass. In long Telegram sessions with multiple ReadTimeout-triggered compressions, the model may see a user question from 6+ hours ago as its most recent instruction.
Reproduction
- Start a Telegram session with Hermes Agent
- Send a message (e.g. "what's new in v2026.4.16?")
- Continue the conversation until context compression triggers naturally (or trigger via ReadTimeout retry)
- Observe: the first message is now part of the compressed head
- Keep chatting → another compression fires → same first message is STILL the head
- Repeat: the original message appears in every compressed session
Evidence
Gateway log confirms the original message was received once at 07:18:28:
2026-04-18 07:18:28,968 INFO gateway.run: inbound message: platform=telegram user=ryanchao chat=1092516733 msg='hermes agent v2026.4.16 版本更新與修正了什麼?'
SessionDB shows this same message appearing as the first user message across 6+ compressed sessions:
071828 (07:18) — original
111714 (11:17) — 1st compression
114823 (11:48) — 2nd compression
121142 (12:11) — 3rd compression
133124 (13:31) — 4th compression
135043 (13:50) — 5th compression
Root Cause
In agent/context_compressor.py, the compress() method:
# Line ~1040
compress_start = self.protect_first_n # default 3
The head messages (messages[:compress_start]) are copied verbatim into the compressed output without any deduplication check against the summary. Since compression creates a new session, and the gateway reloads from the transcript on the next turn, these head messages become the permanent first messages of every future session.
Why existing mitigations don't catch this
Suggested Fix
Options (not mutually exclusive):
A. Dedup head against summary content
Before preserving head messages, check if their content or semantic meaning already exists in the generated summary (e.g. matching against "Resolved Questions"). Skip or truncate fossilized messages.
B. Add staleness check
Only preserve head messages if they were sent within a recent time window (e.g. within the last N minutes or within the current session's creation time). If the session has been compressed multiple times and the head predates the oldest summary, drop it.
C. Make protect_first_n = 0 safer
Currently, setting protect_first_n: 0 means the system prompt is the only preserved context from the beginning. This is actually reasonable since the summary covers everything else. Consider lowering the default or documenting this as a recommended workaround.
D. Head content-aware preservation
Instead of preserving the first N messages by position, preserve messages that are:
- System prompts (always)
- The most recent context-switch markers (e.g.
/new, /topic)
- Messages not already covered by the summary
Environment
- Hermes Agent: v2026.4.16 (v0.10.0)
- Platform: Telegram gateway
- Trigger: ReadTimeout retries → compression
protect_first_n: 3 (default)
protect_last_n: 4 (default)
Title
[Bug]:
protect_first_ncauses head message fossilization across compressions — old user messages become immortalLabels
bug, compression, P2
Body
Bug Description
protect_first_n(default 3) incontext_compressor.pyblindly preserves the first N messages of the session unconditionally, without checking whether their content is already covered by the compression summary or relevant to the current conversation.Every time context compression fires:
[head] + [summary] + [tail]On the next compression, the same head messages are again preserved as the head of the new session — even if they were asked hours ago, already answered, and included in the summary.
This creates "immortal messages" that persist across an entire session lineage, surviving every compression pass. In long Telegram sessions with multiple ReadTimeout-triggered compressions, the model may see a user question from 6+ hours ago as its most recent instruction.
Reproduction
Evidence
Gateway log confirms the original message was received once at
07:18:28:SessionDB shows this same message appearing as the first user message across 6+ compressed sessions:
071828(07:18) — original111714(11:17) — 1st compression114823(11:48) — 2nd compression121142(12:11) — 3rd compression133124(13:31) — 4th compression135043(13:50) — 5th compressionRoot Cause
In
agent/context_compressor.py, thecompress()method:The head messages (
messages[:compress_start]) are copied verbatim into the compressed output without any deduplication check against the summary. Since compression creates a new session, and the gateway reloads from the transcript on the next turn, these head messages become the permanent first messages of every future session.Why existing mitigations don't catch this
SUMMARY_PREFIXrewrite) — fixes stale-question answering from the summary body, but the fossilized head messages are outside the summary entirely (they're real messages before the summary block)Suggested Fix
Options (not mutually exclusive):
A. Dedup head against summary content
Before preserving head messages, check if their content or semantic meaning already exists in the generated summary (e.g. matching against "Resolved Questions"). Skip or truncate fossilized messages.
B. Add staleness check
Only preserve head messages if they were sent within a recent time window (e.g. within the last N minutes or within the current session's creation time). If the session has been compressed multiple times and the head predates the oldest summary, drop it.
C. Make
protect_first_n = 0saferCurrently, setting
protect_first_n: 0means the system prompt is the only preserved context from the beginning. This is actually reasonable since the summary covers everything else. Consider lowering the default or documenting this as a recommended workaround.D. Head content-aware preservation
Instead of preserving the first N messages by position, preserve messages that are:
/new,/topic)Environment
protect_first_n: 3 (default)protect_last_n: 4 (default)