fix(agent): strengthen compression preamble vs stale tasks (#41607) #41650

Closed
kyssta-exe wants to merge 1 commit into NousResearch:main from kyssta-exe:auto-fix/issue-41607
Conversation

@kyssta-exe

Contributor

Fixes #41607. After context compression, the agent was executing stale instructions from the 'Pending User Asks' and 'Remaining Work' sections in the summary instead of responding to the latest user message.

This happened because:
(1) the preamble's conflict-resolution language was ambiguous when there was topical overlap, and
(2) section names like 'Pending User Asks' sounded like active tasks.

Fix:
(1) replaced the preamble's conflict rule with explicit 'topic overlap does NOT mean resume summary task' language,
(2) renamed 'Pending User Asks' to 'Historical User Asks (DO NOT EXECUTE)' and 'Remaining Work' to 'Historical Work (DO NOT EXECUTE)',
(3) added explicit DO NOT ACT instructions in the section descriptions.

@alt-glitch added the type/bug, P1, and comp/agent labels on Jun 8, 2026

@liuhao1024

Contributor

Verification comment — reviewer: liuhao1024 (Hermes Agent automated review)

Reviewed the full diff. This strengthens the compression preamble against stale-task execution — the core #35344 contract:

  1. Renaming is deliberate: "Pending User Asks" → "Historical User Asks (DO NOT EXECUTE)" and "Remaining Work" → "Historical Work (DO NOT EXECUTE)" makes the stale intent unambiguous to the summarizer LLM. The parenthetical directive is a stronger signal than the surrounding prose alone.

  2. SUMMARY_PREFIX rewording: The new phrasing ("Any topic overlap between the summary and the latest user message does NOT mean you should resume the summary's task") addresses the specific failure mode where the LLM interprets topic similarity as task continuation. The previous "if ... supersedes" framing required the LLM to detect a contradiction — the new framing inverts the default: always treat the latest message as the active task unless explicitly asked to continue.

  3. Test relaxation is intentional: The tests now accept "priority" alongside "wins"/"supersede"/"discard" because the new prefix uses "priority" language ("The latest user message always takes priority"). The assertions still enforce that an explicit conflict-resolution rule exists.

  4. No code logic changes: Pure prompt engineering — all changes are in string templates and test assertions. Risk is limited to LLM behavior, not runtime correctness.
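
To make point 3 concrete, here is a minimal sketch of the strict and relaxed keyword checks. The word lists, the `has_conflict_rule` helper, and the `NEW_PREFIX_EXCERPT` string are illustrative assumptions assembled from the phrases quoted in this review; they are not the actual test code or the full prefix.

```python
# Illustrative only: hypothetical helper and word lists, not the repo's tests.
STRICT_WORDS = ("wins", "supersede", "discard")
RELAXED_WORDS = STRICT_WORDS + ("priority",)

# Excerpt assembled from the two phrases quoted in this review.
NEW_PREFIX_EXCERPT = (
    "The latest user message always takes priority. "
    "Any topic overlap between the summary and the latest user message "
    "does NOT mean you should resume the summary's task."
)


def has_conflict_rule(prefix: str, words: tuple[str, ...]) -> bool:
    """Return True if the prefix contains any conflict-resolution keyword."""
    lowered = prefix.lower()
    return any(word in lowered for word in words)


# The excerpt contains none of the strict keywords, only "priority".
strict_ok = has_conflict_rule(NEW_PREFIX_EXCERPT, STRICT_WORDS)
relaxed_ok = has_conflict_rule(NEW_PREFIX_EXCERPT, RELAXED_WORDS)
```

Under these assumptions the excerpt passes only the relaxed check, which is why the assertions had to change alongside the wording.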

LGTM.

@AIEngineerX


Nice work — the topic-overlap reframing and the (DO NOT EXECUTE) section renames are a clear improvement over the soft "treat as background" framing. One correctness gap, and one heads-up.

🐞 Changing SUMMARY_PREFIX needs a matching freeze in _HISTORICAL_SUMMARY_PREFIXES

This PR edits SUMMARY_PREFIX but doesn't add the previous text to _HISTORICAL_SUMMARY_PREFIXES. That tuple carries an explicit in-file instruction:

Add a frozen copy here whenever SUMMARY_PREFIX changes.

This is the #35344 mechanism. A summary persisted under the old prefix can be inherited into a resumed/continuing lineage. On the next (re-)compaction:

  • _is_context_summary_content() recognizes a summary only via startswith(SUMMARY_PREFIX) / LEGACY_SUMMARY_PREFIX / *_HISTORICAL_SUMMARY_PREFIXES. After this merges, an old-prefix summary matches none of them → it's treated as an ordinary user turn.
  • _strip_summary_prefix() likewise won't strip it → the old preamble text stays embedded in the body, and the next model can act on the stale directive it carried.

So any session that compacted before this upgrade and continues after it loses summary detection/normalization — the exact failure the historical-prefix list exists to prevent. The current test_resume_stale_active_task.py suite doesn't catch this because it only exercises the already-frozen pre-#35344 prefix, not the one this PR is replacing.
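
The failure mode can be sketched in isolation. The snippet below is a simplified model, not the repository's implementation: the short prefix strings, `OLD_PREFIX`, and the module-level helper functions are hypothetical stand-ins, and only the `startswith`-based matching described above is assumed.

```python
# Simplified model of prefix-based summary detection (hypothetical stand-ins).
SUMMARY_PREFIX = "[COMPACTION v3] reference only:"
LEGACY_SUMMARY_PREFIX = "[COMPACTION v1]"
OLD_PREFIX = "[COMPACTION v2] latest message WINS:"  # the prefix being replaced


def known_prefixes(historical: tuple[str, ...]) -> tuple[str, ...]:
    """All prefixes that mark a message as a compaction summary."""
    return (SUMMARY_PREFIX, LEGACY_SUMMARY_PREFIX, *historical)


def is_context_summary_content(text: str, historical: tuple[str, ...]) -> bool:
    # str.startswith accepts a tuple and returns True if any entry matches.
    return text.startswith(known_prefixes(historical))


def strip_summary_prefix(text: str, historical: tuple[str, ...]) -> str:
    """Remove a recognized prefix; leave unrecognized text untouched."""
    for prefix in known_prefixes(historical):
        if text.startswith(prefix):
            return text[len(prefix):].lstrip()
    return text


def with_summary_prefix(text: str, historical: tuple[str, ...]) -> str:
    """Normalize a summary so it carries exactly the current prefix."""
    return f"{SUMMARY_PREFIX}\n{strip_summary_prefix(text, historical)}"


old_summary = f"{OLD_PREFIX}\n## Active Task\nUser asked: 'task A'"

# Without the freeze the old summary is not recognized and keeps its preamble.
unfrozen = with_summary_prefix(old_summary, historical=())
# With the freeze the old preamble is stripped before the new one is applied.
frozen = with_summary_prefix(old_summary, historical=(OLD_PREFIX,))
```

In this model `unfrozen` still embeds the old preamble under the new one, while `frozen` carries only the current prefix followed by the body.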

Fix: add the current main SUMMARY_PREFIX (the block this PR replaces) as the newest (first) entry in _HISTORICAL_SUMMARY_PREFIXES:

Frozen copy to add (verbatim current main prefix)
_HISTORICAL_SUMMARY_PREFIXES = (
    # Pre-<this PR>: had the "latest message WINS — discard" conflict clause,
    # before the topic-overlap reframing. Kept so summaries persisted under it
    # are still recognized and re-normalized to the current prefix on re-compaction.
    "[CONTEXT COMPACTION — REFERENCE ONLY] Earlier turns were compacted "
    "into the summary below. This is a handoff from a previous context "
    "window — treat it as background reference, NOT as active instructions. "
    "Do NOT answer questions or fulfill requests mentioned in this summary; "
    "they were already addressed. "
    "Respond ONLY to the latest user message that appears AFTER this "
    "summary — that message is the single source of truth for what to do "
    "right now. "
    "If the latest user message is consistent with the '## Active Task' "
    "section, you may use the summary as background. If the latest user "
    "message contradicts, supersedes, changes topic from, or in any way "
    "diverges from '## Active Task' / '## In Progress' / '## Pending User "
    "Asks' / '## Remaining Work', the latest message WINS — discard those "
    "stale items entirely and do not 'wrap up the old task first'. "
    "Reverse signals in the latest message (e.g. 'stop', 'undo', 'roll "
    "back', 'just verify', 'don't do that anymore', 'never mind', a new "
    "topic) must immediately end any in-flight work described in the "
    "summary; do not re-surface it in later turns. "
    "IMPORTANT: Your persistent memory (MEMORY.md, USER.md) in the system "
    "prompt is ALWAYS authoritative and active — never ignore or deprioritize "
    "memory content due to this compaction note. "
    "The current session state (files, config, etc.) may reflect work "
    "described here — avoid repeating it:",
    # Pre-#35344: contained the self-contradicting "resume exactly" directive.
    "[CONTEXT COMPACTION — REFERENCE ONLY] Earlier turns were compacted "
    # ... (existing entry unchanged) ...
)

A focused regression test (fails without the freeze, passes with it):

def test_prefixed_handoff_under_previous_prefix_renormalized():
    # `prev` = the current main SUMMARY_PREFIX this PR replaces (verbatim).
    handoff = f"{prev}\n## Active Task\nUser asked: 'task A'"
    out = ContextCompressor._with_summary_prefix(handoff)
    assert out.startswith(SUMMARY_PREFIX)        # upgraded to the new prefix
    assert "task A" in out                       # body preserved
    assert out.count("Earlier turns were compacted") == 1  # old prefix stripped, not embedded

⚠️ Heads-up: the prefix drops "WINS"/"discard"; the #35344 tests were loosened to match

The new wording replaces "the latest message WINS — discard those stale items entirely" with "always takes priority", and to keep test_latest_message_wins_on_conflict / test_latest_message_wins_over_inherited_active_task green they were relaxed to also accept "priority". Defensible, but it does soften the explicit #35344 contract words those assertions were pinning. Worth a maintainer confirming "priority" is intended as equivalent to "WINS/discard" rather than a silent weakening — especially since both PRs targeting this issue (this one and #41634) keep the strong "discard/stale" language in the preamble while only #41634 also hardens the section descriptions.


Context: I independently hit the same gap while drafting a fix for #41607 — the freeze is easy to miss because nothing in CI fails without it. Happy to open a tiny follow-up PR with just the freeze + test against whichever of #41650/#41634 lands, if that's useful.

@teknium1

Contributor

Merged via PR #44454 — your commit was cherry-picked onto current main with your authorship preserved in git log (8f8cad7). Your carveout removal was merged with #44345's heading constants; the automated-review point about freezing the old prefix into _HISTORICAL_SUMMARY_PREFIXES is addressed in the follow-up commit. Thanks!

@teknium1 closed this on Jun 11, 2026
Linked issue: Compression summary stale instructions executed as current task