Skip to content

fix(context_compressor): prevent HEAD fossilization and ghost responses after compression#9199

Open
davidvv wants to merge 2 commits into
NousResearch:mainfrom
davidvv:fix/context-compression-head-fossilization
Open

fix(context_compressor): prevent HEAD fossilization and ghost responses after compression#9199
davidvv wants to merge 2 commits into
NousResearch:mainfrom
davidvv:fix/context-compression-head-fossilization

Conversation

@davidvv

@davidvv davidvv commented Apr 13, 2026

Copy link
Copy Markdown

Fixes two related bugs in the compression boundary algorithm, both confirmed against real production sessions. Closes #7133.

Bug 1: Last user message summarized into MIDDLE (ghost response)

_find_tail_cut_by_tokens() walks backward accumulating tokens until the budget is exhausted. When tool results are small (30–100 tokens each), many fit under the budget ceiling, pushing the cut far back and leaving the last user message in the MIDDLE region. On the next API call the model sees the summary and "continues" from it — producing a response about past history with no new user input.

Fix: after _align_boundary_backward, scan backward for the last user message and clamp cut_idx to include it unconditionally.

Bug 2: HEAD fossilization — random user message replayed after every compression

compress() always sets compress_start = protect_first_n (default 3), copying the first N messages verbatim as HEAD into every child session born from compression. This is correct when messages[0] is a system prompt. But when a session starts cold (no system message), a plain user message becomes the permanent HEAD and is re-injected into every subsequent compressed session. The model sees it as an open unanswered turn and acts on it every cycle.

Observed in production: a single user message from 07:14 AM was replayed as message[0] across 6 consecutive compression-spawned sessions throughout the day, causing the agent to repeatedly act on a stale request in the middle of unrelated work.

Fix: only apply protect_first_n when messages[0].role == "system". Otherwise compress_start = 0 — no HEAD fossilization, the tail budget handles recency.

Testing

Both fixes verified via dry-run on production sessions before deployment. The HEAD fossilization fix was confirmed by inspecting the session chain: all 6 child sessions created today started with the same user message at index 0.

…es after compression

Two related bugs in the compression boundary algorithm, both confirmed
against real production sessions (NousResearch#7133).

## Bug 1: Last user message summarized into MIDDLE (ghost response)

_find_tail_cut_by_tokens() walks backward accumulating tokens until the
budget is exhausted.  When tool results are small (30-100 tokens each)
many of them fit under the budget ceiling, pushing the cut far back and
leaving the last user message in the MIDDLE region.  On the next API
call the model sees the summary and "continues" from it — producing a
response about past history with no new user input (ghost response).

Fix: after _align_boundary_backward, scan backward for the last user
message and clamp cut_idx to include it unconditionally.

## Bug 2: HEAD fossilization — random user message replayed after every compression

compress() always sets compress_start = protect_first_n (default 3),
copying the first N messages verbatim as HEAD into every child session.
This is correct when messages[0] is a system prompt.  But when a session
starts cold (no system message), a plain user message becomes the
permanent HEAD and is re-injected into every session born from
compression.  The model sees it as an open unanswered turn and acts on
it every cycle.

Observed: a single user message from 07:14 AM was replayed as message[0]
across 6 consecutive compression-spawned sessions throughout the day.

Fix: only apply protect_first_n when messages[0].role == "system".
Otherwise compress_start = 0 — no HEAD, the tail budget handles recency.

Closes NousResearch#7133
Copilot AI review requested due to automatic review settings April 13, 2026 19:26

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes two compression-boundary bugs that can cause incoherent “ghost” responses and repeated replay of stale user turns after context compression (closes #7133).

Changes:

  • Clamp the token-budget tail cut so the last user message is always preserved in the TAIL region.
  • Avoid “HEAD fossilization” by only applying protect_first_n when the transcript starts with a system message.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread agent/context_compressor.py Outdated
Comment on lines 733 to 737
if messages and messages[0].get("role") == "system":
compress_start = self.protect_first_n
else:
compress_start = 0
compress_start = self._align_boundary_forward(messages, compress_start)

Copilot AI Apr 13, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compress() now conditionally sets compress_start to 0 when the first message isn’t a system message, but the early-return guard above still uses self.protect_first_n to compute _min_for_compress. This can incorrectly skip compression for large/over-threshold transcripts that start with a user message (effective head size is 0, but the guard still requires protect_first_n + 4 messages). Consider basing the minimum-message check on the actual head size (system-present vs not), or move the guard to after compress_start is computed.

Copilot uses AI. Check for mistakes.
Comment thread agent/context_compressor.py Outdated
Comment on lines 732 to 738
# ------------------------------------------------------------------
if messages and messages[0].get("role") == "system":
compress_start = self.protect_first_n
else:
compress_start = 0
compress_start = self._align_boundary_forward(messages, compress_start)

Copilot AI Apr 13, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compress_start boundary selection is now more nuanced (system-only head protection), but other codepaths still duplicate the old logic. For example, gateway/run.py:5900-5903 always uses compressor.protect_first_n, which can make manual /compress previews disagree with the actual compression behavior. Consider centralizing this boundary computation in a small helper on ContextCompressor and reusing it everywhere to prevent drift.

Suggested change
# ------------------------------------------------------------------
if messages and messages[0].get("role") == "system":
compress_start = self.protect_first_n
else:
compress_start = 0
compress_start = self._align_boundary_forward(messages, compress_start)
# Keep this policy in one helper so preview and execution paths do not
# drift when the boundary logic changes.
# ------------------------------------------------------------------
def _compute_compress_start_boundary(messages_to_compress: List[Dict[str, Any]]) -> int:
if messages_to_compress and messages_to_compress[0].get("role") == "system":
boundary = self.protect_first_n
else:
boundary = 0
return self._align_boundary_forward(messages_to_compress, boundary)
compress_start = _compute_compress_start_boundary(messages)

Copilot uses AI. Check for mistakes.
Comment on lines +660 to +675
# ------------------------------------------------------------------
# HOTFIX: Ensure the last user message is always in the TAIL.
#
# The token-budget walk can land the cut AFTER the last user turn
# if tool results are small (few tokens, many fit under budget).
# That puts the user's question in MIDDLE → summarized → ghost.
# Search from the TRUE end of the message list to find the last
# user, then ensure the cut includes it regardless of alignment.
# ------------------------------------------------------------------
last_user_idx = None
for i in range(n - 1, -1, -1):
if messages[i].get("role") == "user":
last_user_idx = i
break
if last_user_idx is not None and cut_idx > last_user_idx:
cut_idx = last_user_idx

Copilot AI Apr 13, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These fixes address two production regressions (tail cut after last user; head fossilization when no system prompt). There are existing compressor tests, but none that assert (a) last user remains in the TAIL when the transcript ends with many small tool results, or (b) compress_start becomes 0 when messages[0].role != "system" (and that the first user message doesn’t persist across successive compressions). Adding targeted regression tests for both scenarios would help prevent reintroducing these bugs.

Copilot uses AI. Check for mistakes.
Three issues raised in review:

1. _min_for_compress guard used self.protect_first_n unconditionally, which
   was too conservative for cold-start sessions (no system message) where
   the effective head size is 0. Fixed to use _effective_head.

2. HEAD boundary logic was inlined in compress() and duplicated in
   gateway/run.py preview path. Extracted into _compute_compress_start()
   so both paths share identical logic and cannot drift.

3. No regression tests existed for the two production bugs. Added
   TestHotfixRegressions with four targeted cases:
   - last user message lands in TAIL when transcript ends with many
     small tool results (Hotfix 1 / ghost response)
   - compress_start == 0 for cold-start sessions (Hotfix 2)
   - compress_start == protect_first_n when system message present
   - first user message absent from compressed output after cold-start

Also updated two existing role-collision tests that assumed no-system-message
HEAD behaviour. Both now include a system message so protect_first_n applies
as originally intended, keeping their role-alternation assertions valid.
@davidvv

davidvv commented Apr 14, 2026

Copy link
Copy Markdown
Author

Compression failed: 'ContextCompressor' object has no attribute '_compute_compress_start'

@alt-glitch alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/agent Core agent loop, run_agent.py, prompt builder labels Apr 27, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Related to #11996 and #12092 (head message fossilization) — this PR addresses the same protect_first_n fossilization bug plus an additional ghost response bug from tail-cut boundary miscalculation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P1 High — major feature broken, no workaround type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Context compression causes incoherent responses on small-context models

3 participants