fix(compressor): shrink protect_first_n on recompaction (#17344)#17349
Open
Sanjays2402 wants to merge 1 commit into
Open
fix(compressor): shrink protect_first_n on recompaction (#17344)#17349Sanjays2402 wants to merge 1 commit into
Sanjays2402 wants to merge 1 commit into
Conversation
…#17344) Bug --- Reporter traced a 6-session compression chain in which every child session carried the *identical* original first user request \u2014 as if no progress had been made across the entire chain. After resume (and on new sessions opened post-compression), the model would re-execute the original first task instead of continuing from the handoff summary's '## Active Task' section. Root cause ---------- ContextCompressor protects 'protect_first_n=3' messages at the head (default: [system, user1, assistant1]). On every compaction cycle that head is preserved verbatim. So after compaction N the message list looks like: [system + compaction note, user1 (ORIGINAL), assistant1, summary, ...tail..., latest_user] The summary tells the model 'resume from ## Active Task', but 'user1 (ORIGINAL)' is sitting right next to it as a still-prominent user-role message. Most models latch onto the first plausible unanswered request and re-execute it \u2014 the structured summary guidance loses against direct attention on a user message. After 6 compactions, that same user1 has been re-anchored 6 times. The chain never makes progress because every cycle's first task is the original one. Fix --- On the *second and subsequent* compactions \u2014 detected by checking whether messages[0] (the system prompt) already carries the compaction note we appended last time \u2014 shrink protect_first_n to 1 for that call. The original [user1, assistant1] then flow into the summariser pool alongside everything else, and the structured ## Active Task section becomes the sole steering signal as designed. The shrink is per-call; self.protect_first_n is left untouched so fresh sessions continue to use the configured default. Detection signal ---------------- Reuses the existing compaction note that compress() already writes to the system prompt on first compaction. A new _COMPRESSION_NOTE_SENTINEL constant captures a stable substring of that note so that PR NousResearch#17301 (which is also editing the note text) will not break detection \u2014 the sentinel matches both the current note and NousResearch#17301's expanded version. The new helper ContextCompressor._is_recompaction(messages) does the lookup with no I/O, returns False on malformed input, and handles multimodal system content via the existing _content_text_for_contains() helper. Why not just strengthen SUMMARY_PREFIX -------------------------------------- SUMMARY_PREFIX already says 'Respond ONLY to the latest user message that appears AFTER this summary.' Stronger prose helps marginally but cannot compete with structural attention on a head-preserved user message. The reporter explicitly noted: 'the model responds as if the session had just started.' That's an architectural problem, not a wording problem; this PR fixes the architecture. Why not also fix the 'parent_session_id NULL' observation --------------------------------------------------------- The reporter raised that as a related concern. It's covered by the existing NousResearch#15000 / resolve_resume_session_id() chain-walk and the end_session()/create_session() hand-off in run_agent.py:8888 which explicitly passes parent_session_id=old_session_id. If the reporter sees NULL it's likely a separate DB-write failure path \u2014 worth its own bug. This PR stays focused on the message-level fix that actually unbreaks the user-visible 'restarts first task' behaviour. Coordination with PR NousResearch#17301 / issue NousResearch#17251 ------------------------------------------ PR NousResearch#17301 (open, by HiddenPuppy) addresses a sibling problem: SUMMARY_PREFIX over-applies 'background reference' framing to the agent's persistent memory and skills. Both are real, both stem from the compaction handoff being too easily misinterpreted by the model, but they're orthogonal fixes \u2014 NousResearch#17301 carves out exceptions inside SUMMARY_PREFIX text, this PR shrinks protect_first_n on recompaction. They compose cleanly; merge order doesn't matter. Tests ----- - TestIsRecompaction (6 cases): sentinel detection \u2014 fresh prompt, prompt-with-note, multimodal content, empty messages, non-system first message, garbage content. Helper never raises. - TestRecompactionShrinksProtectFirstN (5 cases): first compaction preserves first exchange in head; recompaction demotes it to the summariser pool; latest user message survives in tail; configured protect_first_n attribute is not mutated; protect_first_n=1 configurations are no-ops on recompaction. - TestRecompactionMinForCompressGate (1 case): _min_for_compress early-return uses the effective (post-shrink) head count, so small recompacted sessions can still compress. 99 passed across the four compression test files (87 pre-existing + 12 new) on Python 3.12, no regressions.
Contributor
Author
|
CI status note for maintainers — the failing Verified by diffing the failing-test sets:
The clusters on main:
Happy to open targeted fix PRs for any of these clusters if it helps unblock the queue. Otherwise this PR is ready whenever main is green. |
Cyrene963
pushed a commit
to Cyrene963/hermes-agent
that referenced
this pull request
May 3, 2026
Merged changes from upstream PRs NousResearch#17380 and NousResearch#17349: - SUMMARY_PREFIX: memory is ALWAYS authoritative, never background reference - memory_manager: 'informational background data' -> 'authoritative reference data' - Recompaction detection: shrink protect_first_n to avoid stale first exchange - Compression note: memory remains fully authoritative after compaction - Backward-compatible regex for new/old memory labels - Regression tests for recompaction behavior
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #17344.
Bug
Reporter traced a 6-session compression chain in which every child session carried the identical original first user request — as if no progress had been made. After resume (or on new sessions opened post-compression), the model re-executes the original first task instead of continuing from the handoff summary's
## Active Task.Root cause
ContextCompressorprotectsprotect_first_n=3messages at the head —[system, user1, assistant1]. On every cycle that head is preserved verbatim:SUMMARY_PREFIXsays "resume from## Active Task," butuser1 (ORIGINAL)is sitting right next to it as a still-prominent user-role message. The model latches onto the first plausible unanswered request and re-executes it — structured summary prose loses against direct attention on a user message. After 6 cycles that sameuser1has been re-anchored 6 times.Fix
On the second and subsequent compactions — detected by checking whether
messages[0](the system prompt) already carries the compaction note we appended last time — shrinkprotect_first_nto 1 for that call. The original[user1, assistant1]then flow into the summariser pool, and the structured## Active Tasksection becomes the sole steering signal as designed.The shrink is per-call;
self.protect_first_nis left untouched so fresh sessions continue to use the configured default.Detection signal
Reuses the existing compaction note already written to the system prompt on first compaction. A new
_COMPRESSION_NOTE_SENTINELconstant captures a stable substring ("earlier conversation turns have been compacted into a handoff summary") so PR #17301 — which expands the note text — will not break detection. New helperContextCompressor._is_recompaction(messages)does the lookup with no I/O, returnsFalseon malformed input, and handles multimodal system content via the existing_content_text_for_contains()helper.Why not just strengthen
SUMMARY_PREFIX?The prefix already says "Respond ONLY to the latest user message that appears AFTER this summary." Stronger prose helps marginally but cannot compete with structural attention on a head-preserved user message. The reporter explicitly noted: "the model responds as if the session had just started." That's an architectural problem, not a wording problem.
Coordination with PR #17301 / #17251
PR #17301 (open, by @HiddenPuppy) addresses a sibling problem:
SUMMARY_PREFIXover-applies "background reference" framing to memory and skills. Both fixes stem from the same root concern (compaction handoff misinterpreted by the model) but are orthogonal — #17301 carves out exceptions insideSUMMARY_PREFIXtext; this PR shrinksprotect_first_non recompaction. They compose cleanly; merge order doesn't matter.Out of scope
The reporter also flagged
parent_session_id = NULLobservations on chained sessions. That's a separate DB-write concern —run_agent.py:8891explicitly passesparent_session_id=old_session_idandresolve_resume_session_id(#15000) handles chain-walking. If NULL is observed it's likely a different write-path failure and deserves its own bug. This PR stays focused on the message-level fix that unbreaks the user-visible "restarts first task" behaviour.Tests
TestIsRecompaction(6 cases) — sentinel detection edge cases:test_fresh_system_prompt_is_not_recompactiontest_system_prompt_with_compaction_note_is_recompactiontest_empty_messages_safetest_non_system_first_message_is_not_recompactiontest_multimodal_system_content_is_inspectedtest_garbage_content_does_not_raiseTestRecompactionShrinksProtectFirstN(5 cases) — behavioural:test_first_compaction_preserves_first_exchange_in_head(control)test_recompaction_demotes_first_exchange_to_summary(the bug)test_recompaction_preserves_latest_user_message_in_tailtest_recompaction_keeps_protect_first_n_attribute_unchangedtest_protect_first_n_one_no_op_for_recompactionTestRecompactionMinForCompressGate(1 case) —_min_for_compressearly-return uses the effective (post-shrink) head count.87 pre-existing + 12 new, zero regressions.