fix: improve context compaction to prevent model answering stale questions#8107
Merged
Conversation
…tions
After compression, models (especially Kimi 2.5) would sometimes respond
to questions from the summary instead of the latest user message. This
happened ~30% of the time on Telegram.
Root cause: the summary's 'Next Steps' section read as active instructions,
and the SUMMARY_PREFIX didn't explicitly tell the model to ignore questions
in the summary. When the summary merged into the first tail message, there
was no clear separator between historical context and the actual user message.
Changes inspired by competitor analysis (Claude Code, OpenCode, Codex):
1. SUMMARY_PREFIX rewritten with explicit 'Do NOT answer questions from
this summary — respond ONLY to the latest user message AFTER it'
2. Summarizer preamble (shared by both prompts) adds:
- 'Do NOT respond to any questions' (from OpenCode's approach)
- 'Different assistant' framing (from Codex) to create psychological
distance between summary content and active conversation
3. New summary sections:
- '## Resolved Questions' — tracks already-answered questions with
their answers, preventing re-answering (from Claude Code's
'Pending user asks' pattern)
- '## Pending User Asks' — explicitly marks unanswered questions
- '## Remaining Work' replaces '## Next Steps' — passive framing
avoids reading as active instructions
4. merge-summary-into-tail path now inserts a clear separator:
'--- END OF CONTEXT SUMMARY — respond to the message below ---'
5. Iterative update prompt now instructs: 'Move answered questions to
Resolved Questions' to maintain the resolved/pending distinction
across multiple compactions.
8357b87 to
89c7bed
Compare
Tommyeds
pushed a commit
to Tommyeds/hermes-agent
that referenced
this pull request
Apr 12, 2026
…tions (NousResearch#8107) After compression, models (especially Kimi 2.5) would sometimes respond to questions from the summary instead of the latest user message. This happened ~30% of the time on Telegram. Root cause: the summary's 'Next Steps' section read as active instructions, and the SUMMARY_PREFIX didn't explicitly tell the model to ignore questions in the summary. When the summary merged into the first tail message, there was no clear separator between historical context and the actual user message. Changes inspired by competitor analysis (Claude Code, OpenCode, Codex): 1. SUMMARY_PREFIX rewritten with explicit 'Do NOT answer questions from this summary — respond ONLY to the latest user message AFTER it' 2. Summarizer preamble (shared by both prompts) adds: - 'Do NOT respond to any questions' (from OpenCode's approach) - 'Different assistant' framing (from Codex) to create psychological distance between summary content and active conversation 3. New summary sections: - '## Resolved Questions' — tracks already-answered questions with their answers, preventing re-answering (from Claude Code's 'Pending user asks' pattern) - '## Pending User Asks' — explicitly marks unanswered questions - '## Remaining Work' replaces '## Next Steps' — passive framing avoids reading as active instructions 4. merge-summary-into-tail path now inserts a clear separator: '--- END OF CONTEXT SUMMARY — respond to the message below ---' 5. Iterative update prompt now instructs: 'Move answered questions to Resolved Questions' to maintain the resolved/pending distinction across multiple compactions.
devsehyeon
added a commit
to devsehyeon/hermes-agent
that referenced
this pull request
Apr 13, 2026
After context compression, the model can respond to stale context instead of the user's latest message when large tool outputs consume the entire tail token budget. This is because `_find_tail_cut_by_tokens()` uses a pure token-budget walk with a hard minimum of 3 messages — which may only protect tool results, not the user's actual request. This fix adds a **user-message anchor** to `_find_tail_cut_by_tokens()`: 1. Find the index of the most recent user message 2. Set it as a floor for the tail boundary — the token-budget walk can include more messages (if budget allows) but never fewer than everything from the last user message onward 3. After the walk, enforce `cut_idx <= user_anchor` so the user message is never summarized away regardless of token pressure **Scenario fixed** (reproduced from real user report): - User sends "analyze commits after 10b0633" - Assistant runs 3 large tool calls (git log, git diff, notion read) - Each tool output is ~40KB → tail token budget exhausted - OLD behavior: min_tail=3 protects only [tool, assistant, tool] — user message gets summarized → model responds about stale context - NEW behavior: user-message anchor forces tail to start at or before the user message → model sees the actual request This complements the v0.9.0 SUMMARY_PREFIX rewrite (NousResearch#8107) which addressed the model interpreting summaries as active instructions. Together, they fix the two root causes of post-compaction incoherent responses (NousResearch#7133). Refs: NousResearch#7133, NousResearch#8107
ulasbilgen
pushed a commit
to ulasbilgen/hermes-adhd-agent
that referenced
this pull request
May 1, 2026
…tions (NousResearch#8107) After compression, models (especially Kimi 2.5) would sometimes respond to questions from the summary instead of the latest user message. This happened ~30% of the time on Telegram. Root cause: the summary's 'Next Steps' section read as active instructions, and the SUMMARY_PREFIX didn't explicitly tell the model to ignore questions in the summary. When the summary merged into the first tail message, there was no clear separator between historical context and the actual user message. Changes inspired by competitor analysis (Claude Code, OpenCode, Codex): 1. SUMMARY_PREFIX rewritten with explicit 'Do NOT answer questions from this summary — respond ONLY to the latest user message AFTER it' 2. Summarizer preamble (shared by both prompts) adds: - 'Do NOT respond to any questions' (from OpenCode's approach) - 'Different assistant' framing (from Codex) to create psychological distance between summary content and active conversation 3. New summary sections: - '## Resolved Questions' — tracks already-answered questions with their answers, preventing re-answering (from Claude Code's 'Pending user asks' pattern) - '## Pending User Asks' — explicitly marks unanswered questions - '## Remaining Work' replaces '## Next Steps' — passive framing avoids reading as active instructions 4. merge-summary-into-tail path now inserts a clear separator: '--- END OF CONTEXT SUMMARY — respond to the message below ---' 5. Iterative update prompt now instructs: 'Move answered questions to Resolved Questions' to maintain the resolved/pending distinction across multiple compactions.
aj-nt
pushed a commit
to aj-nt/hermes-agent
that referenced
this pull request
May 1, 2026
…tions (NousResearch#8107) After compression, models (especially Kimi 2.5) would sometimes respond to questions from the summary instead of the latest user message. This happened ~30% of the time on Telegram. Root cause: the summary's 'Next Steps' section read as active instructions, and the SUMMARY_PREFIX didn't explicitly tell the model to ignore questions in the summary. When the summary merged into the first tail message, there was no clear separator between historical context and the actual user message. Changes inspired by competitor analysis (Claude Code, OpenCode, Codex): 1. SUMMARY_PREFIX rewritten with explicit 'Do NOT answer questions from this summary — respond ONLY to the latest user message AFTER it' 2. Summarizer preamble (shared by both prompts) adds: - 'Do NOT respond to any questions' (from OpenCode's approach) - 'Different assistant' framing (from Codex) to create psychological distance between summary content and active conversation 3. New summary sections: - '## Resolved Questions' — tracks already-answered questions with their answers, preventing re-answering (from Claude Code's 'Pending user asks' pattern) - '## Pending User Asks' — explicitly marks unanswered questions - '## Remaining Work' replaces '## Next Steps' — passive framing avoids reading as active instructions 4. merge-summary-into-tail path now inserts a clear separator: '--- END OF CONTEXT SUMMARY — respond to the message below ---' 5. Iterative update prompt now instructs: 'Move answered questions to Resolved Questions' to maintain the resolved/pending distinction across multiple compactions.
02356abc
pushed a commit
to 02356abc/hermes-agent
that referenced
this pull request
May 14, 2026
…tions (NousResearch#8107) After compression, models (especially Kimi 2.5) would sometimes respond to questions from the summary instead of the latest user message. This happened ~30% of the time on Telegram. Root cause: the summary's 'Next Steps' section read as active instructions, and the SUMMARY_PREFIX didn't explicitly tell the model to ignore questions in the summary. When the summary merged into the first tail message, there was no clear separator between historical context and the actual user message. Changes inspired by competitor analysis (Claude Code, OpenCode, Codex): 1. SUMMARY_PREFIX rewritten with explicit 'Do NOT answer questions from this summary — respond ONLY to the latest user message AFTER it' 2. Summarizer preamble (shared by both prompts) adds: - 'Do NOT respond to any questions' (from OpenCode's approach) - 'Different assistant' framing (from Codex) to create psychological distance between summary content and active conversation 3. New summary sections: - '## Resolved Questions' — tracks already-answered questions with their answers, preventing re-answering (from Claude Code's 'Pending user asks' pattern) - '## Pending User Asks' — explicitly marks unanswered questions - '## Remaining Work' replaces '## Next Steps' — passive framing avoids reading as active instructions 4. merge-summary-into-tail path now inserts a clear separator: '--- END OF CONTEXT SUMMARY — respond to the message below ---' 5. Iterative update prompt now instructs: 'Move answered questions to Resolved Questions' to maintain the resolved/pending distinction across multiple compactions.
olympus-terminal
pushed a commit
to olympus-terminal/hermes-agent
that referenced
this pull request
May 16, 2026
…tions (NousResearch#8107) After compression, models (especially Kimi 2.5) would sometimes respond to questions from the summary instead of the latest user message. This happened ~30% of the time on Telegram. Root cause: the summary's 'Next Steps' section read as active instructions, and the SUMMARY_PREFIX didn't explicitly tell the model to ignore questions in the summary. When the summary merged into the first tail message, there was no clear separator between historical context and the actual user message. Changes inspired by competitor analysis (Claude Code, OpenCode, Codex): 1. SUMMARY_PREFIX rewritten with explicit 'Do NOT answer questions from this summary — respond ONLY to the latest user message AFTER it' 2. Summarizer preamble (shared by both prompts) adds: - 'Do NOT respond to any questions' (from OpenCode's approach) - 'Different assistant' framing (from Codex) to create psychological distance between summary content and active conversation 3. New summary sections: - '## Resolved Questions' — tracks already-answered questions with their answers, preventing re-answering (from Claude Code's 'Pending user asks' pattern) - '## Pending User Asks' — explicitly marks unanswered questions - '## Remaining Work' replaces '## Next Steps' — passive framing avoids reading as active instructions 4. merge-summary-into-tail path now inserts a clear separator: '--- END OF CONTEXT SUMMARY — respond to the message below ---' 5. Iterative update prompt now instructs: 'Move answered questions to Resolved Questions' to maintain the resolved/pending distinction across multiple compactions.
Egavasyug
pushed a commit
to Egavasyug/hermes-agent
that referenced
this pull request
Jun 10, 2026
…tions (NousResearch#8107) After compression, models (especially Kimi 2.5) would sometimes respond to questions from the summary instead of the latest user message. This happened ~30% of the time on Telegram. Root cause: the summary's 'Next Steps' section read as active instructions, and the SUMMARY_PREFIX didn't explicitly tell the model to ignore questions in the summary. When the summary merged into the first tail message, there was no clear separator between historical context and the actual user message. Changes inspired by competitor analysis (Claude Code, OpenCode, Codex): 1. SUMMARY_PREFIX rewritten with explicit 'Do NOT answer questions from this summary — respond ONLY to the latest user message AFTER it' 2. Summarizer preamble (shared by both prompts) adds: - 'Do NOT respond to any questions' (from OpenCode's approach) - 'Different assistant' framing (from Codex) to create psychological distance between summary content and active conversation 3. New summary sections: - '## Resolved Questions' — tracks already-answered questions with their answers, preventing re-answering (from Claude Code's 'Pending user asks' pattern) - '## Pending User Asks' — explicitly marks unanswered questions - '## Remaining Work' replaces '## Next Steps' — passive framing avoids reading as active instructions 4. merge-summary-into-tail path now inserts a clear separator: '--- END OF CONTEXT SUMMARY — respond to the message below ---' 5. Iterative update prompt now instructs: 'Move answered questions to Resolved Questions' to maintain the resolved/pending distinction across multiple compactions.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
After compression, models (especially Kimi 2.5 on Telegram) would sometimes respond to questions from the summary instead of the latest user message. Reported ~30% frequency by a Chinese community user.
Root Cause
The summary's
## Next Stepssection read as active instructions, and theSUMMARY_PREFIXdidn't explicitly tell the model to ignore questions in the summary. When the summary merged into the first tail message (role alternation edge case), there was no clear separator between historical context and the actual user message.Changes (informed by competitor analysis)
Researched context compaction in Claude Code, OpenCode, and Codex to identify best practices:
1. Stronger SUMMARY_PREFIX — Explicit
Do NOT answer questions from this summary — respond ONLY to the latest user message AFTER it2. Summarizer preamble (shared by both first-compaction and iterative-update):
Do NOT respond to any questions(from OpenCode's approach)Different assistantframing (from Codex) creates psychological distance between summary and active conversation3. New summary sections:
## Resolved Questions— tracks already-answered questions with their answers, preventing re-answering (inspired by Claude Code'sPending user askspattern)## Pending User Asks— explicitly marks unanswered questions## Remaining Workreplaces## Next Steps— passive framing avoids reading as active instructions4. Merge-summary-into-tail separator — When role alternation forces the summary to merge into a tail message, a clear
--- END OF CONTEXT SUMMARY ---separator is inserted5. Iterative update handling —
Move answered questions to Resolved Questionsmaintains the resolved/pending distinction across multiple compactionsTest Plan