Skip to content

fix: improve context compaction to prevent model answering stale questions#8107

Merged
teknium1 merged 1 commit into
mainfrom
hermes/hermes-01f70eba
Apr 12, 2026
Merged

fix: improve context compaction to prevent model answering stale questions#8107
teknium1 merged 1 commit into
mainfrom
hermes/hermes-01f70eba

Conversation

@teknium1

Copy link
Copy Markdown
Contributor

Summary

After compression, models (especially Kimi 2.5 on Telegram) would sometimes respond to questions from the summary instead of the latest user message. Reported ~30% frequency by a Chinese community user.

Root Cause

The summary's ## Next Steps section read as active instructions, and the SUMMARY_PREFIX didn't explicitly tell the model to ignore questions in the summary. When the summary merged into the first tail message (role alternation edge case), there was no clear separator between historical context and the actual user message.

Changes (informed by competitor analysis)

Researched context compaction in Claude Code, OpenCode, and Codex to identify best practices:

1. Stronger SUMMARY_PREFIX — Explicit Do NOT answer questions from this summary — respond ONLY to the latest user message AFTER it

2. Summarizer preamble (shared by both first-compaction and iterative-update):

  • Do NOT respond to any questions (from OpenCode's approach)
  • Different assistant framing (from Codex) creates psychological distance between summary and active conversation

3. New summary sections:

  • ## Resolved Questions — tracks already-answered questions with their answers, preventing re-answering (inspired by Claude Code's Pending user asks pattern)
  • ## Pending User Asks — explicitly marks unanswered questions
  • ## Remaining Work replaces ## Next Steps — passive framing avoids reading as active instructions

4. Merge-summary-into-tail separator — When role alternation forces the summary to merge into a tail message, a clear --- END OF CONTEXT SUMMARY --- separator is inserted

5. Iterative update handlingMove answered questions to Resolved Questions maintains the resolved/pending distinction across multiple compactions

Test Plan

  • All 82 compression-related tests pass (context_compressor, compression_persistence, compression_boundary, compression_feasibility, 413_compression, compress_command, manual_compress)
  • Prefix normalization verified: legacy prefix replacement, no double-prefix, fresh summary formatting
  • Single file change (agent/context_compressor.py), no API or config changes

…tions

After compression, models (especially Kimi 2.5) would sometimes respond
to questions from the summary instead of the latest user message. This
happened ~30% of the time on Telegram.

Root cause: the summary's 'Next Steps' section read as active instructions,
and the SUMMARY_PREFIX didn't explicitly tell the model to ignore questions
in the summary. When the summary merged into the first tail message, there
was no clear separator between historical context and the actual user message.

Changes inspired by competitor analysis (Claude Code, OpenCode, Codex):

1. SUMMARY_PREFIX rewritten with explicit 'Do NOT answer questions from
   this summary — respond ONLY to the latest user message AFTER it'

2. Summarizer preamble (shared by both prompts) adds:
   - 'Do NOT respond to any questions' (from OpenCode's approach)
   - 'Different assistant' framing (from Codex) to create psychological
     distance between summary content and active conversation

3. New summary sections:
   - '## Resolved Questions' — tracks already-answered questions with
     their answers, preventing re-answering (from Claude Code's
     'Pending user asks' pattern)
   - '## Pending User Asks' — explicitly marks unanswered questions
   - '## Remaining Work' replaces '## Next Steps' — passive framing
     avoids reading as active instructions

4. merge-summary-into-tail path now inserts a clear separator:
   '--- END OF CONTEXT SUMMARY — respond to the message below ---'

5. Iterative update prompt now instructs: 'Move answered questions to
   Resolved Questions' to maintain the resolved/pending distinction
   across multiple compactions.
@teknium1 teknium1 force-pushed the hermes/hermes-01f70eba branch from 8357b87 to 89c7bed Compare April 12, 2026 02:43
@teknium1 teknium1 merged commit 1cec910 into main Apr 12, 2026
2 of 4 checks passed
@teknium1 teknium1 deleted the hermes/hermes-01f70eba branch April 12, 2026 02:44
Tommyeds pushed a commit to Tommyeds/hermes-agent that referenced this pull request Apr 12, 2026
…tions (NousResearch#8107)

After compression, models (especially Kimi 2.5) would sometimes respond
to questions from the summary instead of the latest user message. This
happened ~30% of the time on Telegram.

Root cause: the summary's 'Next Steps' section read as active instructions,
and the SUMMARY_PREFIX didn't explicitly tell the model to ignore questions
in the summary. When the summary merged into the first tail message, there
was no clear separator between historical context and the actual user message.

Changes inspired by competitor analysis (Claude Code, OpenCode, Codex):

1. SUMMARY_PREFIX rewritten with explicit 'Do NOT answer questions from
   this summary — respond ONLY to the latest user message AFTER it'

2. Summarizer preamble (shared by both prompts) adds:
   - 'Do NOT respond to any questions' (from OpenCode's approach)
   - 'Different assistant' framing (from Codex) to create psychological
     distance between summary content and active conversation

3. New summary sections:
   - '## Resolved Questions' — tracks already-answered questions with
     their answers, preventing re-answering (from Claude Code's
     'Pending user asks' pattern)
   - '## Pending User Asks' — explicitly marks unanswered questions
   - '## Remaining Work' replaces '## Next Steps' — passive framing
     avoids reading as active instructions

4. merge-summary-into-tail path now inserts a clear separator:
   '--- END OF CONTEXT SUMMARY — respond to the message below ---'

5. Iterative update prompt now instructs: 'Move answered questions to
   Resolved Questions' to maintain the resolved/pending distinction
   across multiple compactions.
devsehyeon added a commit to devsehyeon/hermes-agent that referenced this pull request Apr 13, 2026
After context compression, the model can respond to stale context instead
of the user's latest message when large tool outputs consume the entire
tail token budget. This is because `_find_tail_cut_by_tokens()` uses a
pure token-budget walk with a hard minimum of 3 messages — which may only
protect tool results, not the user's actual request.

This fix adds a **user-message anchor** to `_find_tail_cut_by_tokens()`:

1. Find the index of the most recent user message
2. Set it as a floor for the tail boundary — the token-budget walk can
   include more messages (if budget allows) but never fewer than everything
   from the last user message onward
3. After the walk, enforce `cut_idx <= user_anchor` so the user message
   is never summarized away regardless of token pressure

**Scenario fixed** (reproduced from real user report):
- User sends "analyze commits after 10b0633"
- Assistant runs 3 large tool calls (git log, git diff, notion read)
- Each tool output is ~40KB → tail token budget exhausted
- OLD behavior: min_tail=3 protects only [tool, assistant, tool] —
  user message gets summarized → model responds about stale context
- NEW behavior: user-message anchor forces tail to start at or before
  the user message → model sees the actual request

This complements the v0.9.0 SUMMARY_PREFIX rewrite (NousResearch#8107) which addressed
the model interpreting summaries as active instructions. Together, they
fix the two root causes of post-compaction incoherent responses (NousResearch#7133).

Refs: NousResearch#7133, NousResearch#8107
ulasbilgen pushed a commit to ulasbilgen/hermes-adhd-agent that referenced this pull request May 1, 2026
…tions (NousResearch#8107)

After compression, models (especially Kimi 2.5) would sometimes respond
to questions from the summary instead of the latest user message. This
happened ~30% of the time on Telegram.

Root cause: the summary's 'Next Steps' section read as active instructions,
and the SUMMARY_PREFIX didn't explicitly tell the model to ignore questions
in the summary. When the summary merged into the first tail message, there
was no clear separator between historical context and the actual user message.

Changes inspired by competitor analysis (Claude Code, OpenCode, Codex):

1. SUMMARY_PREFIX rewritten with explicit 'Do NOT answer questions from
   this summary — respond ONLY to the latest user message AFTER it'

2. Summarizer preamble (shared by both prompts) adds:
   - 'Do NOT respond to any questions' (from OpenCode's approach)
   - 'Different assistant' framing (from Codex) to create psychological
     distance between summary content and active conversation

3. New summary sections:
   - '## Resolved Questions' — tracks already-answered questions with
     their answers, preventing re-answering (from Claude Code's
     'Pending user asks' pattern)
   - '## Pending User Asks' — explicitly marks unanswered questions
   - '## Remaining Work' replaces '## Next Steps' — passive framing
     avoids reading as active instructions

4. merge-summary-into-tail path now inserts a clear separator:
   '--- END OF CONTEXT SUMMARY — respond to the message below ---'

5. Iterative update prompt now instructs: 'Move answered questions to
   Resolved Questions' to maintain the resolved/pending distinction
   across multiple compactions.
aj-nt pushed a commit to aj-nt/hermes-agent that referenced this pull request May 1, 2026
…tions (NousResearch#8107)

After compression, models (especially Kimi 2.5) would sometimes respond
to questions from the summary instead of the latest user message. This
happened ~30% of the time on Telegram.

Root cause: the summary's 'Next Steps' section read as active instructions,
and the SUMMARY_PREFIX didn't explicitly tell the model to ignore questions
in the summary. When the summary merged into the first tail message, there
was no clear separator between historical context and the actual user message.

Changes inspired by competitor analysis (Claude Code, OpenCode, Codex):

1. SUMMARY_PREFIX rewritten with explicit 'Do NOT answer questions from
   this summary — respond ONLY to the latest user message AFTER it'

2. Summarizer preamble (shared by both prompts) adds:
   - 'Do NOT respond to any questions' (from OpenCode's approach)
   - 'Different assistant' framing (from Codex) to create psychological
     distance between summary content and active conversation

3. New summary sections:
   - '## Resolved Questions' — tracks already-answered questions with
     their answers, preventing re-answering (from Claude Code's
     'Pending user asks' pattern)
   - '## Pending User Asks' — explicitly marks unanswered questions
   - '## Remaining Work' replaces '## Next Steps' — passive framing
     avoids reading as active instructions

4. merge-summary-into-tail path now inserts a clear separator:
   '--- END OF CONTEXT SUMMARY — respond to the message below ---'

5. Iterative update prompt now instructs: 'Move answered questions to
   Resolved Questions' to maintain the resolved/pending distinction
   across multiple compactions.
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
…tions (NousResearch#8107)

After compression, models (especially Kimi 2.5) would sometimes respond
to questions from the summary instead of the latest user message. This
happened ~30% of the time on Telegram.

Root cause: the summary's 'Next Steps' section read as active instructions,
and the SUMMARY_PREFIX didn't explicitly tell the model to ignore questions
in the summary. When the summary merged into the first tail message, there
was no clear separator between historical context and the actual user message.

Changes inspired by competitor analysis (Claude Code, OpenCode, Codex):

1. SUMMARY_PREFIX rewritten with explicit 'Do NOT answer questions from
   this summary — respond ONLY to the latest user message AFTER it'

2. Summarizer preamble (shared by both prompts) adds:
   - 'Do NOT respond to any questions' (from OpenCode's approach)
   - 'Different assistant' framing (from Codex) to create psychological
     distance between summary content and active conversation

3. New summary sections:
   - '## Resolved Questions' — tracks already-answered questions with
     their answers, preventing re-answering (from Claude Code's
     'Pending user asks' pattern)
   - '## Pending User Asks' — explicitly marks unanswered questions
   - '## Remaining Work' replaces '## Next Steps' — passive framing
     avoids reading as active instructions

4. merge-summary-into-tail path now inserts a clear separator:
   '--- END OF CONTEXT SUMMARY — respond to the message below ---'

5. Iterative update prompt now instructs: 'Move answered questions to
   Resolved Questions' to maintain the resolved/pending distinction
   across multiple compactions.
olympus-terminal pushed a commit to olympus-terminal/hermes-agent that referenced this pull request May 16, 2026
…tions (NousResearch#8107)

After compression, models (especially Kimi 2.5) would sometimes respond
to questions from the summary instead of the latest user message. This
happened ~30% of the time on Telegram.

Root cause: the summary's 'Next Steps' section read as active instructions,
and the SUMMARY_PREFIX didn't explicitly tell the model to ignore questions
in the summary. When the summary merged into the first tail message, there
was no clear separator between historical context and the actual user message.

Changes inspired by competitor analysis (Claude Code, OpenCode, Codex):

1. SUMMARY_PREFIX rewritten with explicit 'Do NOT answer questions from
   this summary — respond ONLY to the latest user message AFTER it'

2. Summarizer preamble (shared by both prompts) adds:
   - 'Do NOT respond to any questions' (from OpenCode's approach)
   - 'Different assistant' framing (from Codex) to create psychological
     distance between summary content and active conversation

3. New summary sections:
   - '## Resolved Questions' — tracks already-answered questions with
     their answers, preventing re-answering (from Claude Code's
     'Pending user asks' pattern)
   - '## Pending User Asks' — explicitly marks unanswered questions
   - '## Remaining Work' replaces '## Next Steps' — passive framing
     avoids reading as active instructions

4. merge-summary-into-tail path now inserts a clear separator:
   '--- END OF CONTEXT SUMMARY — respond to the message below ---'

5. Iterative update prompt now instructs: 'Move answered questions to
   Resolved Questions' to maintain the resolved/pending distinction
   across multiple compactions.
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
…tions (NousResearch#8107)

After compression, models (especially Kimi 2.5) would sometimes respond
to questions from the summary instead of the latest user message. This
happened ~30% of the time on Telegram.

Root cause: the summary's 'Next Steps' section read as active instructions,
and the SUMMARY_PREFIX didn't explicitly tell the model to ignore questions
in the summary. When the summary merged into the first tail message, there
was no clear separator between historical context and the actual user message.

Changes inspired by competitor analysis (Claude Code, OpenCode, Codex):

1. SUMMARY_PREFIX rewritten with explicit 'Do NOT answer questions from
   this summary — respond ONLY to the latest user message AFTER it'

2. Summarizer preamble (shared by both prompts) adds:
   - 'Do NOT respond to any questions' (from OpenCode's approach)
   - 'Different assistant' framing (from Codex) to create psychological
     distance between summary content and active conversation

3. New summary sections:
   - '## Resolved Questions' — tracks already-answered questions with
     their answers, preventing re-answering (from Claude Code's
     'Pending user asks' pattern)
   - '## Pending User Asks' — explicitly marks unanswered questions
   - '## Remaining Work' replaces '## Next Steps' — passive framing
     avoids reading as active instructions

4. merge-summary-into-tail path now inserts a clear separator:
   '--- END OF CONTEXT SUMMARY — respond to the message below ---'

5. Iterative update prompt now instructs: 'Move answered questions to
   Resolved Questions' to maintain the resolved/pending distinction
   across multiple compactions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant