Skip to content

fix(compressor): shrink protect_first_n on recompaction (#17344)#17349

Open
Sanjays2402 wants to merge 1 commit into
NousResearch:mainfrom
Sanjays2402:fix/17344-recompaction-head-shrink
Open

fix(compressor): shrink protect_first_n on recompaction (#17344)#17349
Sanjays2402 wants to merge 1 commit into
NousResearch:mainfrom
Sanjays2402:fix/17344-recompaction-head-shrink

Conversation

@Sanjays2402

Copy link
Copy Markdown
Contributor

Closes #17344.

Bug

Reporter traced a 6-session compression chain in which every child session carried the identical original first user request — as if no progress had been made. After resume (or on new sessions opened post-compression), the model re-executes the original first task instead of continuing from the handoff summary's ## Active Task.

Root cause

ContextCompressor protects protect_first_n=3 messages at the head — [system, user1, assistant1]. On every cycle that head is preserved verbatim:

[system + compaction note, user1 (ORIGINAL), assistant1, summary, …tail…, latest_user]

SUMMARY_PREFIX says "resume from ## Active Task," but user1 (ORIGINAL) is sitting right next to it as a still-prominent user-role message. The model latches onto the first plausible unanswered request and re-executes it — structured summary prose loses against direct attention on a user message. After 6 cycles that same user1 has been re-anchored 6 times.

Fix

On the second and subsequent compactions — detected by checking whether messages[0] (the system prompt) already carries the compaction note we appended last time — shrink protect_first_n to 1 for that call. The original [user1, assistant1] then flow into the summariser pool, and the structured ## Active Task section becomes the sole steering signal as designed.

The shrink is per-call; self.protect_first_n is left untouched so fresh sessions continue to use the configured default.

effective_protect_first_n = self.protect_first_n
if self._is_recompaction(messages) and self.protect_first_n > 1:
    effective_protect_first_n = 1

Detection signal

Reuses the existing compaction note already written to the system prompt on first compaction. A new _COMPRESSION_NOTE_SENTINEL constant captures a stable substring ("earlier conversation turns have been compacted into a handoff summary") so PR #17301 — which expands the note text — will not break detection. New helper ContextCompressor._is_recompaction(messages) does the lookup with no I/O, returns False on malformed input, and handles multimodal system content via the existing _content_text_for_contains() helper.

Why not just strengthen SUMMARY_PREFIX?

The prefix already says "Respond ONLY to the latest user message that appears AFTER this summary." Stronger prose helps marginally but cannot compete with structural attention on a head-preserved user message. The reporter explicitly noted: "the model responds as if the session had just started." That's an architectural problem, not a wording problem.

Coordination with PR #17301 / #17251

PR #17301 (open, by @HiddenPuppy) addresses a sibling problem: SUMMARY_PREFIX over-applies "background reference" framing to memory and skills. Both fixes stem from the same root concern (compaction handoff misinterpreted by the model) but are orthogonal — #17301 carves out exceptions inside SUMMARY_PREFIX text; this PR shrinks protect_first_n on recompaction. They compose cleanly; merge order doesn't matter.

Out of scope

The reporter also flagged parent_session_id = NULL observations on chained sessions. That's a separate DB-write concern — run_agent.py:8891 explicitly passes parent_session_id=old_session_id and resolve_resume_session_id (#15000) handles chain-walking. If NULL is observed it's likely a different write-path failure and deserves its own bug. This PR stays focused on the message-level fix that unbreaks the user-visible "restarts first task" behaviour.

Tests

TestIsRecompaction (6 cases) — sentinel detection edge cases:

  • test_fresh_system_prompt_is_not_recompaction
  • test_system_prompt_with_compaction_note_is_recompaction
  • test_empty_messages_safe
  • test_non_system_first_message_is_not_recompaction
  • test_multimodal_system_content_is_inspected
  • test_garbage_content_does_not_raise

TestRecompactionShrinksProtectFirstN (5 cases) — behavioural:

  • test_first_compaction_preserves_first_exchange_in_head (control)
  • test_recompaction_demotes_first_exchange_to_summary (the bug)
  • test_recompaction_preserves_latest_user_message_in_tail
  • test_recompaction_keeps_protect_first_n_attribute_unchanged
  • test_protect_first_n_one_no_op_for_recompaction

TestRecompactionMinForCompressGate (1 case)_min_for_compress early-return uses the effective (post-shrink) head count.

$ python -m pytest tests/agent/test_context_compressor.py \
                   tests/agent/test_context_compressor_recompaction.py \
                   tests/run_agent/test_compression_boundary_hook.py \
                   tests/run_agent/test_compression_persistence.py \
                   tests/run_agent/test_413_compression.py -q
99 passed in 8.26s

87 pre-existing + 12 new, zero regressions.

…#17344)

Bug
---
Reporter traced a 6-session compression chain in which every child
session carried the *identical* original first user request \u2014 as if
no progress had been made across the entire chain. After resume (and
on new sessions opened post-compression), the model would re-execute
the original first task instead of continuing from the handoff
summary's '## Active Task' section.

Root cause
----------
ContextCompressor protects 'protect_first_n=3' messages at the head
(default: [system, user1, assistant1]). On every compaction cycle
that head is preserved verbatim. So after compaction N the message
list looks like:

  [system + compaction note, user1 (ORIGINAL), assistant1, summary,
   ...tail..., latest_user]

The summary tells the model 'resume from ## Active Task', but
'user1 (ORIGINAL)' is sitting right next to it as a still-prominent
user-role message. Most models latch onto the first plausible
unanswered request and re-execute it \u2014 the structured summary
guidance loses against direct attention on a user message.

After 6 compactions, that same user1 has been re-anchored 6 times.
The chain never makes progress because every cycle's first task is
the original one.

Fix
---
On the *second and subsequent* compactions \u2014 detected by checking
whether messages[0] (the system prompt) already carries the compaction
note we appended last time \u2014 shrink protect_first_n to 1 for that
call. The original [user1, assistant1] then flow into the summariser
pool alongside everything else, and the structured ## Active Task
section becomes the sole steering signal as designed.

The shrink is per-call; self.protect_first_n is left untouched so
fresh sessions continue to use the configured default.

Detection signal
----------------
Reuses the existing compaction note that compress() already writes
to the system prompt on first compaction. A new
_COMPRESSION_NOTE_SENTINEL constant captures a stable substring of
that note so that PR NousResearch#17301 (which is also editing the note text)
will not break detection \u2014 the sentinel matches both the current
note and NousResearch#17301's expanded version. The new helper
ContextCompressor._is_recompaction(messages) does the lookup with
no I/O, returns False on malformed input, and handles multimodal
system content via the existing _content_text_for_contains() helper.

Why not just strengthen SUMMARY_PREFIX
--------------------------------------
SUMMARY_PREFIX already says 'Respond ONLY to the latest user message
that appears AFTER this summary.' Stronger prose helps marginally
but cannot compete with structural attention on a head-preserved
user message. The reporter explicitly noted: 'the model responds as
if the session had just started.' That's an architectural problem,
not a wording problem; this PR fixes the architecture.

Why not also fix the 'parent_session_id NULL' observation
---------------------------------------------------------
The reporter raised that as a related concern. It's covered by the
existing NousResearch#15000 / resolve_resume_session_id() chain-walk and the
end_session()/create_session() hand-off in run_agent.py:8888 which
explicitly passes parent_session_id=old_session_id. If the reporter
sees NULL it's likely a separate DB-write failure path \u2014 worth its
own bug. This PR stays focused on the message-level fix that
actually unbreaks the user-visible 'restarts first task' behaviour.

Coordination with PR NousResearch#17301 / issue NousResearch#17251
------------------------------------------
PR NousResearch#17301 (open, by HiddenPuppy) addresses a sibling problem:
SUMMARY_PREFIX over-applies 'background reference' framing to the
agent's persistent memory and skills. Both are real, both stem from
the compaction handoff being too easily misinterpreted by the
model, but they're orthogonal fixes \u2014 NousResearch#17301 carves out exceptions
inside SUMMARY_PREFIX text, this PR shrinks protect_first_n on
recompaction. They compose cleanly; merge order doesn't matter.

Tests
-----
- TestIsRecompaction (6 cases): sentinel detection \u2014 fresh prompt,
  prompt-with-note, multimodal content, empty messages, non-system
  first message, garbage content. Helper never raises.
- TestRecompactionShrinksProtectFirstN (5 cases): first compaction
  preserves first exchange in head; recompaction demotes it to the
  summariser pool; latest user message survives in tail; configured
  protect_first_n attribute is not mutated; protect_first_n=1
  configurations are no-ops on recompaction.
- TestRecompactionMinForCompressGate (1 case): _min_for_compress
  early-return uses the effective (post-shrink) head count, so
  small recompacted sessions can still compress.

99 passed across the four compression test files (87 pre-existing +
12 new) on Python 3.12, no regressions.
@alt-glitch alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/agent Core agent loop, run_agent.py, prompt builder labels Apr 29, 2026
@Sanjays2402

Copy link
Copy Markdown
Contributor Author

CI status note for maintainers — the failing test check on this PR is from a set of 15 pre-existing test failures on main, not regressions introduced here.

Verified by diffing the failing-test sets:

  • Latest main push run (25250051126): 16 failed
  • This PR's run: same 15 (subset)
  • Net new failures introduced by this PR: 0

The clusters on main:

Cluster Tests Likely cause
Systemd TimeoutStopSec test_*_unit_avoids_recursive_execstop_and_uses_extended_stop_timeout Code emits 210, test asserts 90 — drift
Gateway restart kill semantics test_update_* Recent change in expected .kill() call counts
update --yes flag test_update_yes_flag TTY prompt / stash restore behavior changed
Dockerfile pid1 test_dockerfile_* Dockerfile regenerated, dropped TUI ink references
Concurrent interrupt _Stub test fixture missing _tool_guardrails attr
dotenv vs os.environ test_os_environ_still_wins_over_dotenv Same class as #18757; happy to add to my fix PR if useful
ACP commands test_send_available_commands_update Command list ordering
Teams typing test_send_typing Mock not awaited
TUI pending_title test_session_create_drops_pending_title_on_valueerror ValueError no longer drops title

Happy to open targeted fix PRs for any of these clusters if it helps unblock the queue. Otherwise this PR is ready whenever main is green.

Cyrene963 pushed a commit to Cyrene963/hermes-agent that referenced this pull request May 3, 2026
Merged changes from upstream PRs NousResearch#17380 and NousResearch#17349:
- SUMMARY_PREFIX: memory is ALWAYS authoritative, never background reference
- memory_manager: 'informational background data' -> 'authoritative reference data'
- Recompaction detection: shrink protect_first_n to avoid stale first exchange
- Compression note: memory remains fully authoritative after compaction
- Backward-compatible regex for new/old memory labels
- Regression tests for recompaction behavior
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P1 High — major feature broken, no workaround type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Context compression + session resume causes model to re-execute the original first task instead of continuing from compressed state

2 participants