fix: improve context compaction to prevent model answering stale questions by teknium1 · Pull Request #8107 · NousResearch/hermes-agent

teknium1 · 2026-04-12T02:40:34Z

Summary

After compression, models (especially Kimi 2.5 on Telegram) would sometimes respond to questions from the summary instead of the latest user message. Reported ~30% frequency by a Chinese community user.

Root Cause

The summary's ## Next Steps section read as active instructions, and the SUMMARY_PREFIX didn't explicitly tell the model to ignore questions in the summary. When the summary merged into the first tail message (role alternation edge case), there was no clear separator between historical context and the actual user message.

Changes (informed by competitor analysis)

Researched context compaction in Claude Code, OpenCode, and Codex to identify best practices:

1. Stronger SUMMARY_PREFIX — Explicit Do NOT answer questions from this summary — respond ONLY to the latest user message AFTER it

2. Summarizer preamble (shared by both first-compaction and iterative-update):

Do NOT respond to any questions (from OpenCode's approach)
Different assistant framing (from Codex) creates psychological distance between summary and active conversation

3. New summary sections:

## Resolved Questions — tracks already-answered questions with their answers, preventing re-answering (inspired by Claude Code's Pending user asks pattern)
## Pending User Asks — explicitly marks unanswered questions
## Remaining Work replaces ## Next Steps — passive framing avoids reading as active instructions

4. Merge-summary-into-tail separator — When role alternation forces the summary to merge into a tail message, a clear --- END OF CONTEXT SUMMARY --- separator is inserted

5. Iterative update handling — Move answered questions to Resolved Questions maintains the resolved/pending distinction across multiple compactions

Test Plan

All 82 compression-related tests pass (context_compressor, compression_persistence, compression_boundary, compression_feasibility, 413_compression, compress_command, manual_compress)
Prefix normalization verified: legacy prefix replacement, no double-prefix, fresh summary formatting
Single file change (agent/context_compressor.py), no API or config changes

…tions After compression, models (especially Kimi 2.5) would sometimes respond to questions from the summary instead of the latest user message. This happened ~30% of the time on Telegram. Root cause: the summary's 'Next Steps' section read as active instructions, and the SUMMARY_PREFIX didn't explicitly tell the model to ignore questions in the summary. When the summary merged into the first tail message, there was no clear separator between historical context and the actual user message. Changes inspired by competitor analysis (Claude Code, OpenCode, Codex): 1. SUMMARY_PREFIX rewritten with explicit 'Do NOT answer questions from this summary — respond ONLY to the latest user message AFTER it' 2. Summarizer preamble (shared by both prompts) adds: - 'Do NOT respond to any questions' (from OpenCode's approach) - 'Different assistant' framing (from Codex) to create psychological distance between summary content and active conversation 3. New summary sections: - '## Resolved Questions' — tracks already-answered questions with their answers, preventing re-answering (from Claude Code's 'Pending user asks' pattern) - '## Pending User Asks' — explicitly marks unanswered questions - '## Remaining Work' replaces '## Next Steps' — passive framing avoids reading as active instructions 4. merge-summary-into-tail path now inserts a clear separator: '--- END OF CONTEXT SUMMARY — respond to the message below ---' 5. Iterative update prompt now instructs: 'Move answered questions to Resolved Questions' to maintain the resolved/pending distinction across multiple compactions.

…tions (NousResearch#8107) After compression, models (especially Kimi 2.5) would sometimes respond to questions from the summary instead of the latest user message. This happened ~30% of the time on Telegram. Root cause: the summary's 'Next Steps' section read as active instructions, and the SUMMARY_PREFIX didn't explicitly tell the model to ignore questions in the summary. When the summary merged into the first tail message, there was no clear separator between historical context and the actual user message. Changes inspired by competitor analysis (Claude Code, OpenCode, Codex): 1. SUMMARY_PREFIX rewritten with explicit 'Do NOT answer questions from this summary — respond ONLY to the latest user message AFTER it' 2. Summarizer preamble (shared by both prompts) adds: - 'Do NOT respond to any questions' (from OpenCode's approach) - 'Different assistant' framing (from Codex) to create psychological distance between summary content and active conversation 3. New summary sections: - '## Resolved Questions' — tracks already-answered questions with their answers, preventing re-answering (from Claude Code's 'Pending user asks' pattern) - '## Pending User Asks' — explicitly marks unanswered questions - '## Remaining Work' replaces '## Next Steps' — passive framing avoids reading as active instructions 4. merge-summary-into-tail path now inserts a clear separator: '--- END OF CONTEXT SUMMARY — respond to the message below ---' 5. Iterative update prompt now instructs: 'Move answered questions to Resolved Questions' to maintain the resolved/pending distinction across multiple compactions.

After context compression, the model can respond to stale context instead of the user's latest message when large tool outputs consume the entire tail token budget. This is because `_find_tail_cut_by_tokens()` uses a pure token-budget walk with a hard minimum of 3 messages — which may only protect tool results, not the user's actual request. This fix adds a **user-message anchor** to `_find_tail_cut_by_tokens()`: 1. Find the index of the most recent user message 2. Set it as a floor for the tail boundary — the token-budget walk can include more messages (if budget allows) but never fewer than everything from the last user message onward 3. After the walk, enforce `cut_idx <= user_anchor` so the user message is never summarized away regardless of token pressure **Scenario fixed** (reproduced from real user report): - User sends "analyze commits after 10b0633" - Assistant runs 3 large tool calls (git log, git diff, notion read) - Each tool output is ~40KB → tail token budget exhausted - OLD behavior: min_tail=3 protects only [tool, assistant, tool] — user message gets summarized → model responds about stale context - NEW behavior: user-message anchor forces tail to start at or before the user message → model sees the actual request This complements the v0.9.0 SUMMARY_PREFIX rewrite (NousResearch#8107) which addressed the model interpreting summaries as active instructions. Together, they fix the two root causes of post-compaction incoherent responses (NousResearch#7133). Refs: NousResearch#7133, NousResearch#8107

…tions (NousResearch#8107) After compression, models (especially Kimi 2.5) would sometimes respond to questions from the summary instead of the latest user message. This happened ~30% of the time on Telegram. Root cause: the summary's 'Next Steps' section read as active instructions, and the SUMMARY_PREFIX didn't explicitly tell the model to ignore questions in the summary. When the summary merged into the first tail message, there was no clear separator between historical context and the actual user message. Changes inspired by competitor analysis (Claude Code, OpenCode, Codex): 1. SUMMARY_PREFIX rewritten with explicit 'Do NOT answer questions from this summary — respond ONLY to the latest user message AFTER it' 2. Summarizer preamble (shared by both prompts) adds: - 'Do NOT respond to any questions' (from OpenCode's approach) - 'Different assistant' framing (from Codex) to create psychological distance between summary content and active conversation 3. New summary sections: - '## Resolved Questions' — tracks already-answered questions with their answers, preventing re-answering (from Claude Code's 'Pending user asks' pattern) - '## Pending User Asks' — explicitly marks unanswered questions - '## Remaining Work' replaces '## Next Steps' — passive framing avoids reading as active instructions 4. merge-summary-into-tail path now inserts a clear separator: '--- END OF CONTEXT SUMMARY — respond to the message below ---' 5. Iterative update prompt now instructs: 'Move answered questions to Resolved Questions' to maintain the resolved/pending distinction across multiple compactions.

teknium1 force-pushed the hermes/hermes-01f70eba branch from 8357b87 to 89c7bed Compare April 12, 2026 02:43

teknium1 merged commit 1cec910 into main Apr 12, 2026
2 of 4 checks passed

teknium1 deleted the hermes/hermes-01f70eba branch April 12, 2026 02:44

XiaoXiao0221 mentioned this pull request Apr 12, 2026

fix/windows gateway encoding v2 #8179

Closed

devsehyeon mentioned this pull request Apr 13, 2026

fix(compaction): protect latest user message from being summarized away #9262

Closed

qwertysc mentioned this pull request Apr 14, 2026

[Bug]: Iterative context compaction summary keeps completed topics alive and overrides the current active topic #9631

Closed

github-actions Bot mentioned this pull request Apr 15, 2026

chore: bump NousResearch/hermes-agent version from v2026.4.8 to v2026.4.13 Docker-Hub-sirmark/docker-hermes-agent#1

Merged

ryanchao0518 mentioned this pull request Apr 18, 2026

[Bug]: protect_first_n causes head message fossilization across compressions — old user messages become immortal #11996

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: improve context compaction to prevent model answering stale questions#8107

fix: improve context compaction to prevent model answering stale questions#8107
teknium1 merged 1 commit into
mainfrom
hermes/hermes-01f70eba

teknium1 commented Apr 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

teknium1 commented Apr 12, 2026

Summary

Root Cause

Changes (informed by competitor analysis)

Test Plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant