fix(compressor): shrink protect_first_n on recompaction (#17344) by Sanjays2402 · Pull Request #17349 · NousResearch/hermes-agent

Sanjays2402 · 2026-04-29T08:30:10Z

Bug

Reporter traced a 6-session compression chain in which every child session carried the identical original first user request — as if no progress had been made. After resume (or on new sessions opened post-compression), the model re-executes the original first task instead of continuing from the handoff summary's ## Active Task.

Root cause

ContextCompressor protects protect_first_n=3 messages at the head — [system, user1, assistant1]. On every cycle that head is preserved verbatim:

[system + compaction note, user1 (ORIGINAL), assistant1, summary, …tail…, latest_user]

SUMMARY_PREFIX says "resume from ## Active Task," but user1 (ORIGINAL) is sitting right next to it as a still-prominent user-role message. The model latches onto the first plausible unanswered request and re-executes it — structured summary prose loses against direct attention on a user message. After 6 cycles that same user1 has been re-anchored 6 times.

Fix

On the second and subsequent compactions — detected by checking whether messages[0] (the system prompt) already carries the compaction note we appended last time — shrink protect_first_n to 1 for that call. The original [user1, assistant1] then flow into the summariser pool, and the structured ## Active Task section becomes the sole steering signal as designed.

The shrink is per-call; self.protect_first_n is left untouched so fresh sessions continue to use the configured default.

effective_protect_first_n = self.protect_first_n
if self._is_recompaction(messages) and self.protect_first_n > 1:
    effective_protect_first_n = 1

Detection signal

Reuses the existing compaction note already written to the system prompt on first compaction. A new _COMPRESSION_NOTE_SENTINEL constant captures a stable substring ("earlier conversation turns have been compacted into a handoff summary") so PR #17301 — which expands the note text — will not break detection. New helper ContextCompressor._is_recompaction(messages) does the lookup with no I/O, returns False on malformed input, and handles multimodal system content via the existing _content_text_for_contains() helper.

Why not just strengthen `SUMMARY_PREFIX`?

The prefix already says "Respond ONLY to the latest user message that appears AFTER this summary." Stronger prose helps marginally but cannot compete with structural attention on a head-preserved user message. The reporter explicitly noted: "the model responds as if the session had just started." That's an architectural problem, not a wording problem.

Coordination with PR #17301 / #17251

PR #17301 (open, by @HiddenPuppy) addresses a sibling problem: SUMMARY_PREFIX over-applies "background reference" framing to memory and skills. Both fixes stem from the same root concern (compaction handoff misinterpreted by the model) but are orthogonal — #17301 carves out exceptions inside SUMMARY_PREFIX text; this PR shrinks protect_first_n on recompaction. They compose cleanly; merge order doesn't matter.

Out of scope

The reporter also flagged parent_session_id = NULL observations on chained sessions. That's a separate DB-write concern — run_agent.py:8891 explicitly passes parent_session_id=old_session_id and resolve_resume_session_id (#15000) handles chain-walking. If NULL is observed it's likely a different write-path failure and deserves its own bug. This PR stays focused on the message-level fix that unbreaks the user-visible "restarts first task" behaviour.

Tests

TestIsRecompaction (6 cases) — sentinel detection edge cases:

test_fresh_system_prompt_is_not_recompaction
test_system_prompt_with_compaction_note_is_recompaction
test_empty_messages_safe
test_non_system_first_message_is_not_recompaction
test_multimodal_system_content_is_inspected
test_garbage_content_does_not_raise

TestRecompactionShrinksProtectFirstN (5 cases) — behavioural:

test_first_compaction_preserves_first_exchange_in_head (control)
test_recompaction_demotes_first_exchange_to_summary (the bug)
test_recompaction_preserves_latest_user_message_in_tail
test_recompaction_keeps_protect_first_n_attribute_unchanged
test_protect_first_n_one_no_op_for_recompaction

TestRecompactionMinForCompressGate (1 case) — _min_for_compress early-return uses the effective (post-shrink) head count.

$ python -m pytest tests/agent/test_context_compressor.py \
                   tests/agent/test_context_compressor_recompaction.py \
                   tests/run_agent/test_compression_boundary_hook.py \
                   tests/run_agent/test_compression_persistence.py \
                   tests/run_agent/test_413_compression.py -q
99 passed in 8.26s

87 pre-existing + 12 new, zero regressions.

…#17344) Bug --- Reporter traced a 6-session compression chain in which every child session carried the *identical* original first user request \u2014 as if no progress had been made across the entire chain. After resume (and on new sessions opened post-compression), the model would re-execute the original first task instead of continuing from the handoff summary's '## Active Task' section. Root cause ---------- ContextCompressor protects 'protect_first_n=3' messages at the head (default: [system, user1, assistant1]). On every compaction cycle that head is preserved verbatim. So after compaction N the message list looks like: [system + compaction note, user1 (ORIGINAL), assistant1, summary, ...tail..., latest_user] The summary tells the model 'resume from ## Active Task', but 'user1 (ORIGINAL)' is sitting right next to it as a still-prominent user-role message. Most models latch onto the first plausible unanswered request and re-execute it \u2014 the structured summary guidance loses against direct attention on a user message. After 6 compactions, that same user1 has been re-anchored 6 times. The chain never makes progress because every cycle's first task is the original one. Fix --- On the *second and subsequent* compactions \u2014 detected by checking whether messages[0] (the system prompt) already carries the compaction note we appended last time \u2014 shrink protect_first_n to 1 for that call. The original [user1, assistant1] then flow into the summariser pool alongside everything else, and the structured ## Active Task section becomes the sole steering signal as designed. The shrink is per-call; self.protect_first_n is left untouched so fresh sessions continue to use the configured default. Detection signal ---------------- Reuses the existing compaction note that compress() already writes to the system prompt on first compaction. A new _COMPRESSION_NOTE_SENTINEL constant captures a stable substring of that note so that PR NousResearch#17301 (which is also editing the note text) will not break detection \u2014 the sentinel matches both the current note and NousResearch#17301's expanded version. The new helper ContextCompressor._is_recompaction(messages) does the lookup with no I/O, returns False on malformed input, and handles multimodal system content via the existing _content_text_for_contains() helper. Why not just strengthen SUMMARY_PREFIX -------------------------------------- SUMMARY_PREFIX already says 'Respond ONLY to the latest user message that appears AFTER this summary.' Stronger prose helps marginally but cannot compete with structural attention on a head-preserved user message. The reporter explicitly noted: 'the model responds as if the session had just started.' That's an architectural problem, not a wording problem; this PR fixes the architecture. Why not also fix the 'parent_session_id NULL' observation --------------------------------------------------------- The reporter raised that as a related concern. It's covered by the existing NousResearch#15000 / resolve_resume_session_id() chain-walk and the end_session()/create_session() hand-off in run_agent.py:8888 which explicitly passes parent_session_id=old_session_id. If the reporter sees NULL it's likely a separate DB-write failure path \u2014 worth its own bug. This PR stays focused on the message-level fix that actually unbreaks the user-visible 'restarts first task' behaviour. Coordination with PR NousResearch#17301 / issue NousResearch#17251 ------------------------------------------ PR NousResearch#17301 (open, by HiddenPuppy) addresses a sibling problem: SUMMARY_PREFIX over-applies 'background reference' framing to the agent's persistent memory and skills. Both are real, both stem from the compaction handoff being too easily misinterpreted by the model, but they're orthogonal fixes \u2014 NousResearch#17301 carves out exceptions inside SUMMARY_PREFIX text, this PR shrinks protect_first_n on recompaction. They compose cleanly; merge order doesn't matter. Tests ----- - TestIsRecompaction (6 cases): sentinel detection \u2014 fresh prompt, prompt-with-note, multimodal content, empty messages, non-system first message, garbage content. Helper never raises. - TestRecompactionShrinksProtectFirstN (5 cases): first compaction preserves first exchange in head; recompaction demotes it to the summariser pool; latest user message survives in tail; configured protect_first_n attribute is not mutated; protect_first_n=1 configurations are no-ops on recompaction. - TestRecompactionMinForCompressGate (1 case): _min_for_compress early-return uses the effective (post-shrink) head count, so small recompacted sessions can still compress. 99 passed across the four compression test files (87 pre-existing + 12 new) on Python 3.12, no regressions.

Sanjays2402 · 2026-05-02T20:40:49Z

CI status note for maintainers — the failing test check on this PR is from a set of 15 pre-existing test failures on main, not regressions introduced here.

Verified by diffing the failing-test sets:

Latest main push run (25250051126): 16 failed
This PR's run: same 15 (subset)
Net new failures introduced by this PR: 0

The clusters on main:

Cluster	Tests	Likely cause
Systemd `TimeoutStopSec`	2× `test_*_unit_avoids_recursive_execstop_and_uses_extended_stop_timeout`	Code emits `210`, test asserts `90` — drift
Gateway restart `kill` semantics	3× `test_update_*`	Recent change in expected `.kill()` call counts
`update --yes` flag	2× `test_update_yes_flag`	TTY prompt / stash restore behavior changed
Dockerfile pid1	2× `test_dockerfile_*`	Dockerfile regenerated, dropped TUI ink references
Concurrent interrupt	2×	`_Stub` test fixture missing `_tool_guardrails` attr
dotenv vs `os.environ`	`test_os_environ_still_wins_over_dotenv`	Same class as #18757; happy to add to my fix PR if useful
ACP commands	`test_send_available_commands_update`	Command list ordering
Teams typing	`test_send_typing`	Mock not awaited
TUI `pending_title`	`test_session_create_drops_pending_title_on_valueerror`	ValueError no longer drops title

Happy to open targeted fix PRs for any of these clusters if it helps unblock the queue. Otherwise this PR is ready whenever main is green.

Merged changes from upstream PRs NousResearch#17380 and NousResearch#17349: - SUMMARY_PREFIX: memory is ALWAYS authoritative, never background reference - memory_manager: 'informational background data' -> 'authoritative reference data' - Recompaction detection: shrink protect_first_n to avoid stale first exchange - Compression note: memory remains fully authoritative after compaction - Backward-compatible regex for new/old memory labels - Regression tests for recompaction behavior

alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/agent Core agent loop, run_agent.py, prompt builder labels Apr 29, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(compressor): shrink protect_first_n on recompaction (#17344)#17349

fix(compressor): shrink protect_first_n on recompaction (#17344)#17349
Sanjays2402 wants to merge 1 commit into
NousResearch:mainfrom
Sanjays2402:fix/17344-recompaction-head-shrink

Sanjays2402 commented Apr 29, 2026

Uh oh!

Sanjays2402 commented May 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Sanjays2402 commented Apr 29, 2026

Bug

Root cause

Fix

Detection signal

Why not just strengthen SUMMARY_PREFIX?

Coordination with PR #17301 / #17251

Out of scope

Tests

Uh oh!

Sanjays2402 commented May 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Why not just strengthen `SUMMARY_PREFIX`?