fix(context_compressor): prevent HEAD fossilization and ghost responses after compression by davidvv · Pull Request #9199 · NousResearch/hermes-agent

davidvv · 2026-04-13T19:26:37Z

Fixes two related bugs in the compression boundary algorithm, both confirmed against real production sessions. Closes #7133.

Bug 1: Last user message summarized into MIDDLE (ghost response)

_find_tail_cut_by_tokens() walks backward accumulating tokens until the budget is exhausted. When tool results are small (30–100 tokens each), many fit under the budget ceiling, pushing the cut far back and leaving the last user message in the MIDDLE region. On the next API call the model sees the summary and "continues" from it — producing a response about past history with no new user input.

Fix: after _align_boundary_backward, scan backward for the last user message and clamp cut_idx to include it unconditionally.

Bug 2: HEAD fossilization — random user message replayed after every compression

compress() always sets compress_start = protect_first_n (default 3), copying the first N messages verbatim as HEAD into every child session born from compression. This is correct when messages[0] is a system prompt. But when a session starts cold (no system message), a plain user message becomes the permanent HEAD and is re-injected into every subsequent compressed session. The model sees it as an open unanswered turn and acts on it every cycle.

Observed in production: a single user message from 07:14 AM was replayed as message[0] across 6 consecutive compression-spawned sessions throughout the day, causing the agent to repeatedly act on a stale request in the middle of unrelated work.

Fix: only apply protect_first_n when messages[0].role == "system". Otherwise compress_start = 0 — no HEAD fossilization, the tail budget handles recency.

Testing

Both fixes verified via dry-run on production sessions before deployment. The HEAD fossilization fix was confirmed by inspecting the session chain: all 6 child sessions created today started with the same user message at index 0.

…es after compression Two related bugs in the compression boundary algorithm, both confirmed against real production sessions (NousResearch#7133). ## Bug 1: Last user message summarized into MIDDLE (ghost response) _find_tail_cut_by_tokens() walks backward accumulating tokens until the budget is exhausted. When tool results are small (30-100 tokens each) many of them fit under the budget ceiling, pushing the cut far back and leaving the last user message in the MIDDLE region. On the next API call the model sees the summary and "continues" from it — producing a response about past history with no new user input (ghost response). Fix: after _align_boundary_backward, scan backward for the last user message and clamp cut_idx to include it unconditionally. ## Bug 2: HEAD fossilization — random user message replayed after every compression compress() always sets compress_start = protect_first_n (default 3), copying the first N messages verbatim as HEAD into every child session. This is correct when messages[0] is a system prompt. But when a session starts cold (no system message), a plain user message becomes the permanent HEAD and is re-injected into every session born from compression. The model sees it as an open unanswered turn and acts on it every cycle. Observed: a single user message from 07:14 AM was replayed as message[0] across 6 consecutive compression-spawned sessions throughout the day. Fix: only apply protect_first_n when messages[0].role == "system". Otherwise compress_start = 0 — no HEAD, the tail budget handles recency. Closes NousResearch#7133

Copilot

Pull request overview

Fixes two compression-boundary bugs that can cause incoherent “ghost” responses and repeated replay of stale user turns after context compression (closes #7133).

Changes:

Clamp the token-budget tail cut so the last user message is always preserved in the TAIL region.
Avoid “HEAD fossilization” by only applying protect_first_n when the transcript starts with a system message.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-13T19:31:11Z

+        if messages and messages[0].get("role") == "system":
+            compress_start = self.protect_first_n
+        else:
+            compress_start = 0
        compress_start = self._align_boundary_forward(messages, compress_start)


compress() now conditionally sets compress_start to 0 when the first message isn’t a system message, but the early-return guard above still uses self.protect_first_n to compute _min_for_compress. This can incorrectly skip compression for large/over-threshold transcripts that start with a user message (effective head size is 0, but the guard still requires protect_first_n + 4 messages). Consider basing the minimum-message check on the actual head size (system-present vs not), or move the guard to after compress_start is computed.

Copilot · 2026-04-13T19:31:11Z

+        # ------------------------------------------------------------------
+        if messages and messages[0].get("role") == "system":
+            compress_start = self.protect_first_n
+        else:
+            compress_start = 0
        compress_start = self._align_boundary_forward(messages, compress_start)



compress_start boundary selection is now more nuanced (system-only head protection), but other codepaths still duplicate the old logic. For example, gateway/run.py:5900-5903 always uses compressor.protect_first_n, which can make manual /compress previews disagree with the actual compression behavior. Consider centralizing this boundary computation in a small helper on ContextCompressor and reusing it everywhere to prevent drift.

Suggested change

# ------------------------------------------------------------------

if messages and messages[0].get("role") == "system":

compress_start = self.protect_first_n

else:

compress_start = 0

compress_start = self._align_boundary_forward(messages, compress_start)

# Keep this policy in one helper so preview and execution paths do not

# drift when the boundary logic changes.

# ------------------------------------------------------------------

def _compute_compress_start_boundary(messages_to_compress: List[Dict[str, Any]]) -> int:

if messages_to_compress and messages_to_compress[0].get("role") == "system":

boundary = self.protect_first_n

else:

boundary = 0

return self._align_boundary_forward(messages_to_compress, boundary)

compress_start = _compute_compress_start_boundary(messages)

Copilot · 2026-04-13T19:31:12Z

+        # ------------------------------------------------------------------
+        # HOTFIX: Ensure the last user message is always in the TAIL.
+        #
+        # The token-budget walk can land the cut AFTER the last user turn
+        # if tool results are small (few tokens, many fit under budget).
+        # That puts the user's question in MIDDLE → summarized → ghost.
+        # Search from the TRUE end of the message list to find the last
+        # user, then ensure the cut includes it regardless of alignment.
+        # ------------------------------------------------------------------
+        last_user_idx = None
+        for i in range(n - 1, -1, -1):
+            if messages[i].get("role") == "user":
+                last_user_idx = i
+                break
+        if last_user_idx is not None and cut_idx > last_user_idx:
+            cut_idx = last_user_idx


These fixes address two production regressions (tail cut after last user; head fossilization when no system prompt). There are existing compressor tests, but none that assert (a) last user remains in the TAIL when the transcript ends with many small tool results, or (b) compress_start becomes 0 when messages[0].role != "system" (and that the first user message doesn’t persist across successive compressions). Adding targeted regression tests for both scenarios would help prevent reintroducing these bugs.

Three issues raised in review: 1. _min_for_compress guard used self.protect_first_n unconditionally, which was too conservative for cold-start sessions (no system message) where the effective head size is 0. Fixed to use _effective_head. 2. HEAD boundary logic was inlined in compress() and duplicated in gateway/run.py preview path. Extracted into _compute_compress_start() so both paths share identical logic and cannot drift. 3. No regression tests existed for the two production bugs. Added TestHotfixRegressions with four targeted cases: - last user message lands in TAIL when transcript ends with many small tool results (Hotfix 1 / ghost response) - compress_start == 0 for cold-start sessions (Hotfix 2) - compress_start == protect_first_n when system message present - first user message absent from compressed output after cold-start Also updated two existing role-collision tests that assumed no-system-message HEAD behaviour. Both now include a system message so protect_first_n applies as originally intended, keeping their role-alternation assertions valid.

davidvv · 2026-04-14T18:53:39Z

Compression failed: 'ContextCompressor' object has no attribute '_compute_compress_start'

alt-glitch · 2026-04-27T18:07:55Z

Related to #11996 and #12092 (head message fossilization) — this PR addresses the same protect_first_n fossilization bug plus an additional ghost response bug from tail-cut boundary miscalculation.

Copilot AI review requested due to automatic review settings April 13, 2026 19:26

Copilot started reviewing on behalf of davidvv April 13, 2026 19:27 View session

Copilot AI reviewed Apr 13, 2026

View reviewed changes

qwertysc mentioned this pull request Apr 14, 2026

[Bug]: Iterative context compaction summary keeps completed topics alive and overrides the current active topic #9631

Closed

alt-glitch mentioned this pull request Apr 24, 2026

fix(compressor): decay head protection across compression cycles #12092

Open

2 tasks

alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/agent Core agent loop, run_agent.py, prompt builder labels Apr 27, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(context_compressor): prevent HEAD fossilization and ghost responses after compression#9199

fix(context_compressor): prevent HEAD fossilization and ghost responses after compression#9199
davidvv wants to merge 2 commits into
NousResearch:mainfrom
davidvv:fix/context-compression-head-fossilization

davidvv commented Apr 13, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 13, 2026

Uh oh!

Copilot AI Apr 13, 2026

Uh oh!

Copilot AI Apr 13, 2026

Uh oh!

davidvv commented Apr 14, 2026

Uh oh!

alt-glitch commented Apr 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

-        # ------------------------------------------------------------------
-        if messages and messages[0].get("role") == "system":
-            compress_start = self.protect_first_n
-        else:
-            compress_start = 0
-        compress_start = self._align_boundary_forward(messages, compress_start)
+        # Keep this policy in one helper so preview and execution paths do not
+        # drift when the boundary logic changes.
+        # ------------------------------------------------------------------
+        def _compute_compress_start_boundary(messages_to_compress: List[Dict[str, Any]]) -> int:
+            if messages_to_compress and messages_to_compress[0].get("role") == "system":
+                boundary = self.protect_first_n
+            else:
+                boundary = 0
+            return self._align_boundary_forward(messages_to_compress, boundary)
+        compress_start = _compute_compress_start_boundary(messages)

Conversation

davidvv commented Apr 13, 2026

Bug 1: Last user message summarized into MIDDLE (ghost response)

Bug 2: HEAD fossilization — random user message replayed after every compression

Testing

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

davidvv commented Apr 14, 2026

Uh oh!

alt-glitch commented Apr 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants