fix: prevent model from re-addressing old messages after context compaction by AlsayedHoota · Pull Request #5249 · NousResearch/hermes-agent

AlsayedHoota · 2026-04-05T16:16:00Z

Problem

After context compaction, the model frequently deviates from current work and starts re-addressing the very first user request from the session. This is a session-breaking bug that disrupts workflow.

Root Cause

protect_first_n=3 preserves the system prompt + first user message + first assistant reply verbatim in the context. After compaction, the model sees the original user request (e.g., "Build me a website") as a pending instruction and acts on it, ignoring the compaction summary and current context.

Fix

1. Change default `protect_first_n` from 3 to 1

Only the system prompt is preserved in the head. The first user/assistant exchanges are already captured in the compaction summary, so no information is lost.

Users who need the old behavior can set protect_first_n: 3 in their config.yaml under compression:.

2. Add `[HISTORICAL]` prefix to preserved head user/assistant messages

Safety net for users who configure protect_first_n > 1. Marks old messages with:

[HISTORICAL — from the start of this session, already addressed]

So the model knows they are already addressed and should not be re-acted upon.

Skips system prompts (they get the existing [Note: ...] suffix)
Skips assistant messages with tool_calls (to avoid breaking tool call integrity)
Guards with isinstance(content, str) to handle non-string content (e.g., OpenAI vision list-of-dicts)
Idempotent: does not double-prefix on re-compaction

3. Expose `protect_first_n` in config

Previously hardcoded in run_agent.py while protect_last_n was already configurable. Now reads from config.yaml:

compression:
  protect_first_n: 1  # default, system prompt only

4. Two-tier compaction failure handling

When summary generation fails (provider down, timeout, rate limit), the system now has two tiers:

Tier 1 (normal compaction at ~50% threshold): Abort compaction entirely, return original messages unchanged, and retry on the next turn. The 50% threshold provides a large buffer before the API hard-rejects, so aborting once is safe.
Tier 2 (API already rejected for context overflow): Drop middle turns WITHOUT a summary as a last resort. Losing context is bad, but crashing the conversation entirely is worse. The model keeps system prompt + recent messages and can at least continue functioning.

Tier 2 is triggered by:

context_length_exceeded errors from the API
HTTP 413 "payload too large" errors

Both error handlers in run_agent.py now pass force_truncation=True to the compressor.

Tests

10 new/updated tests in tests/agent/test_context_compressor.py:

test_default_protect_first_n_is_1 — confirms new default
test_head_user_message_gets_historical_prefix — prefix applied to head user/assistant messages
test_historical_prefix_idempotent — no double-prefix on re-compaction
test_no_historical_prefix_with_protect_first_1 — no prefix when only system prompt in head
test_tool_call_messages_not_prefixed — tool_call messages excluded
test_non_string_content_not_prefixed — vision-style list content handled safely
test_abort_compaction_on_summary_failure — Tier 1: abort behavior verified
test_force_truncation_drops_middle_on_summary_failure — Tier 2: force truncation verified
test_abort_when_no_client — no-client scenario returns messages unchanged
test_compression_increments_count — updated for new abort semantics

All 42 compressor tests pass.

Files Changed

agent/context_compressor.py — default protect_first_n=1, HISTORICAL prefix logic, isinstance type guard, two-tier failure handling with force_truncation parameter
run_agent.py — read protect_first_n from config, pass force_truncation=True in context overflow/413 error handlers
tests/agent/test_context_compressor.py — 10 new/updated tests

Copilot

Pull request overview

Fixes a session-breaking context-compaction behavior where preserved early conversation turns (especially the first user request) get re-interpreted as still-actionable after compaction, causing the model to drift away from the current task.

Changes:

Change ContextCompressor default protect_first_n from 3 to 1 (system prompt only).
Prefix preserved head user/assistant messages with a [HISTORICAL — …] marker (skipping system + tool-call assistant messages).
Make protect_first_n configurable via config.yaml and add targeted tests for the new behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
`agent/context_compressor.py`	Adjusts head-protection default and adds HISTORICAL prefixing for preserved head turns to prevent re-acting on old instructions.
`run_agent.py`	Wires `compression.protect_first_n` from config into the `ContextCompressor` construction (previously hardcoded).
`tests/agent/test_context_compressor.py`	Adds tests asserting the new default and verifying HISTORICAL prefix behavior, idempotence, and tool-call exclusions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-05T16:19:59Z

+                content = msg.get("content") or ""
+                if content and not content.startswith("[HISTORICAL"):


Historical-prefix logic assumes msg["content"] is a string. If a preserved head message has non-string content (e.g., dict/list payloads from some providers), content.startswith(...) will raise, breaking compression. Consider normalizing content to a string (or guarding with isinstance(content, str)) before calling startswith and before concatenating the prefix.

Suggested change

content = msg.get("content") or ""

if content and not content.startswith("[HISTORICAL"):

content = msg.get("content")

if (

isinstance(content, str)

and content

and not content.startswith("[HISTORICAL")

):

Fixed in 8bb0619. Added isinstance(content, str) guard before the .startswith() check. Non-string content (e.g., OpenAI vision list-of-dicts) is now left unchanged. Added test_non_string_content_not_prefixed to cover this case.

Copilot · 2026-04-05T16:19:59Z

@@ -63,7 +63,7 @@ def __init__(
        self,
        model: str,
        threshold_percent: float = 0.50,
-        protect_first_n: int = 3,
+        protect_first_n: int = 1,


Changing the default protect_first_n to 1 means the initial user/assistant exchange moves into the summarized region. When summary generation fails (RuntimeError/Exception → summary is None), compress() drops all middle turns without a summary, so the initial exchange can be lost entirely. Consider a safer fallback on summary failure (e.g., return the original messages, or fall back to truncation while preserving at least the first user/assistant messages and marking them as historical).

Suggested change

protect_first_n: int = 1,

protect_first_n: int = 2,

Valid concern, but the suggested fix (reverting protect_first_n to 2+) would reintroduce the original bug — after compaction, the model sees the verbatim first user message and re-addresses it.

Instead, I've changed the fallback to abort compaction entirely when summary generation fails: return the original messages unchanged, log a warning, and retry on the next turn. The 85%→100% buffer gives room for at least one failed attempt. Added detailed comments in compress() explaining the trade-off.

Updated 3 existing tests that assumed the old truncation-on-failure behavior, and added test_abort_compaction_on_summary_failure to verify the new contract. All 41 tests pass.

Copilot · 2026-04-05T16:19:59Z

@@ -63,7 +63,7 @@ def __init__(
        self,
        model: str,
        threshold_percent: float = 0.50,
-        protect_first_n: int = 3,
+        protect_first_n: int = 1,
        protect_last_n: int = 20,


The class/module documentation still describes head protection as “system prompt + first exchange”, but the default protect_first_n is now 1 (system prompt only). Please update the surrounding docstring/comments to match the new default behavior (or make the wording config-driven).

Fixed in 8bb0619. Updated the _generate_summary docstring to reflect the new abort-on-failure contract: 'the caller should abort compaction and return the original messages unchanged to prevent data loss.'

…action After context compaction, preserved head messages (first user request + assistant reply) were kept verbatim in the context. Models would treat the original user request as a new/pending instruction and deviate from current work — a session-breaking bug. Two-part fix: 1. Change default protect_first_n from 3 to 1 Only the system prompt is preserved in the head. The first user/ assistant messages are captured in the compaction summary, so no information is lost. Users who need the old behavior can set protect_first_n=3 in config.yaml. 2. Add [HISTORICAL] prefix to any preserved head user/assistant messages Safety net for users who configure protect_first_n > 1. Marks old messages so the model knows they are already addressed. Also: expose protect_first_n in config.yaml (was previously hardcoded in run_agent.py while protect_last_n was already configurable). Includes 5 new tests covering the prefix logic, idempotency, default value, and tool_call exclusion.

Changes to context_compressor.py: - Add isinstance(content, str) guard before .startswith() in HISTORICAL prefix logic to prevent TypeError with non-string content (e.g. OpenAI vision messages that use list-of-dicts). - When summary generation fails, abort compaction entirely and return original messages unchanged (previously dropped middle turns silently). Detailed comments explain why this is safer than reverting protect_first_n. - Update _generate_summary docstring to reflect new abort-on-failure contract. Changes to test_context_compressor.py: - Update test_truncation_fallback_no_client -> test_abort_when_no_client to verify messages are returned unchanged when summary fails. - Update test_compression_increments_count to mock call_llm for a successful summary (abort-on-failure means count stays 0 without mock). - Update test_none_content_in_system_message_compress to expect unchanged message count. - Add test_non_string_content_not_prefixed: vision-style list content must not crash or get prefixed. - Add test_abort_compaction_on_summary_failure: verify abort behavior and that compression_count is not incremented.

Two-tier compaction failure handling: 1. Normal compaction (50% threshold): abort and retry next turn (safe, large buffer before API rejects). 2. API context overflow (context_length_exceeded / 413): pass force_truncation=True to drop middle turns without summary as a last resort — losing context is better than crashing the conversation. Changes: - compress() gains force_truncation parameter (default False) - _compress_context() passes it through to the compressor - Hard fallback paths (context_length_exceeded + 413 handlers) set force_truncation=True - Added test_force_truncation_drops_middle_on_summary_failure

alt-glitch · 2026-05-01T08:11:02Z

Related to #11475 (context compaction causes duplicate responses) and #14521 (old messages re-injected as new user input, 3rd occurrence). This PR addresses the root cause by changing protect_first_n default and adding [HISTORICAL] prefix.

alt-glitch · 2026-05-01T08:11:51Z

Related to #11475 and #14521.

jl1906 · 2026-05-02T06:23:48Z

Real-world reproduction (confirmed on v2026.4.23 + 880 commits)

Triggering conditions observed in a production Telegram→Hermes gateway setup:

Setup:

First user message of a long session (04:34) = "Ich habe ja noch mein MacBook Pro M3 MAX ... Welche Arbeiten können wir an ein lokales LLM auslagern?"
Session ran for ~3.5 hours before first successful compression
Auxiliary LLM was unavailable for the first compression attempt at 04:35 (no provider configured) → compression silently skipped
First successful compression triggered at 07:52 via kimi-k2.5:cloud

What happened:

Session split detected: 20260429_043443_56e049b8 → 20260429_075301_2c59f5 (compression)

New session JSONL starts:

[0] user:      "Ich habe ja noch mein MacBook Pro M3 MAX..."   ← protect_first_n HEAD
[1] user:      "Donna?"
[2] user:      "Donna?"
[3] assistant: "[CONTEXT COMPACTION — REFERENCE ONLY]..."      ← compaction marker

The three original user messages sit before the compaction marker. Agent treats them as unanswered active requests and answers the MacBook question a 3rd time, despite the compaction summary marking a completely different active task.

Propagation: Every subsequent compression re-ingested these messages into the HEAD, carrying them across 33 consecutive sessions over the course of the day. Required manual surgery on the JSONL + state.db each time the agent fired.

The aux-LLM failure angle: The initial skip due to no aux provider (the normal-mode abort path) means the conversation grew very long before first compression. By that point, the first user message was far back in the history — yet still in protect_first_n range. This scenario (delayed first compression due to aux failure) seems like a reliable trigger.

This aligns exactly with the fix in this PR. The [HISTORICAL] prefix approach would have been sufficient even before the default change, since the compaction instruction only covers content inside the summary block, not messages placed before it.

Copilot AI review requested due to automatic review settings April 5, 2026 16:16

Copilot started reviewing on behalf of AlsayedHoota April 5, 2026 16:16 View session

Copilot AI reviewed Apr 5, 2026

View reviewed changes

AlsayedHoota force-pushed the fix/compaction-head-message-replay branch from 78b22f7 to e766120 Compare April 7, 2026 08:14

AlsayedHoota added 3 commits April 8, 2026 20:01

AlsayedHoota force-pushed the fix/compaction-head-message-replay branch from e766120 to 9e3c019 Compare April 8, 2026 16:01

qwertysc mentioned this pull request Apr 14, 2026

[Bug]: Iterative context compaction summary keeps completed topics alive and overrides the current active topic #9631

Closed

alt-glitch added area/config Config system, migrations, profiles comp/agent Core agent loop, run_agent.py, prompt builder P1 High — major feature broken, no workaround type/bug Something isn't working labels May 1, 2026

AlsayedHoota closed this May 10, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: prevent model from re-addressing old messages after context compaction#5249

fix: prevent model from re-addressing old messages after context compaction#5249
AlsayedHoota wants to merge 3 commits into
NousResearch:mainfrom
AlsayedHoota:fix/compaction-head-message-replay

AlsayedHoota commented Apr 5, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 5, 2026

Uh oh!

AlsayedHoota Apr 5, 2026

Uh oh!

Copilot AI Apr 5, 2026

Uh oh!

AlsayedHoota Apr 5, 2026

Uh oh!

Copilot AI Apr 5, 2026

Uh oh!

AlsayedHoota Apr 5, 2026

Uh oh!

alt-glitch commented May 1, 2026

Uh oh!

alt-glitch commented May 1, 2026

Uh oh!

jl1906 commented May 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

		content = msg.get("content") or ""
		if content and not content.startswith("[HISTORICAL"):

Conversation

AlsayedHoota commented Apr 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Root Cause

Fix

1. Change default protect_first_n from 3 to 1

2. Add [HISTORICAL] prefix to preserved head user/assistant messages

3. Expose protect_first_n in config

4. Two-tier compaction failure handling

Tests

Files Changed

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

AlsayedHoota Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

AlsayedHoota Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

AlsayedHoota Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

alt-glitch commented May 1, 2026

Uh oh!

alt-glitch commented May 1, 2026

Uh oh!

jl1906 commented May 2, 2026

Real-world reproduction (confirmed on v2026.4.23 + 880 commits)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

AlsayedHoota commented Apr 5, 2026 •

edited

Loading

1. Change default `protect_first_n` from 3 to 1

2. Add `[HISTORICAL]` prefix to preserved head user/assistant messages

3. Expose `protect_first_n` in config