fix(#41607): Strengthen compression summary to prevent stale instructions execution #41634

Closed
iamlukethedev wants to merge 1 commit into NousResearch:main from iamlukethedev:fix/41607-compression-stale-instructions

@iamlukethedev

Fixes #41607

Problem: After context compression, stale instructions from the summary (especially '## Pending User Asks' and '## Remaining Work' sections) were being executed as current tasks when they had topical overlap with the latest user message, even though the preamble said to discard them.

Root Cause: The previous preamble's soft language ('treat as background reference') was insufficient. When summary items had topical overlap with the latest message, models would conflate stale summary items with current context.

Solution: Implemented all 4 suggestions from the issue:

1. Strengthened the SUMMARY_PREFIX preamble with explicit 'STALE — DISCARD' language
   - ⚠️ CRITICAL RULE section clearly stating that all summary items are stale
   - Bullet points listing the dangerous section names
   - Explicit: 'regardless of topical overlap with the latest user message'
2. Added negative examples showing before/after for three scenarios
   - Auth fix vs auth refactor question (same topic, different task)
   - API endpoint implementation vs spec reasoning correction
   - In-progress work vs topic change
3. Clarified the memory vs summary distinction
   - Memory (MEMORY.md, USER.md) = eternal, authoritative
   - Summary sections = transient, must be discarded
4. Renamed dangerous sections to prevent execution-as-task misreading
   - Fallback template: 'Pending User Asks' → 'Protected Context (Background Reference Only)'
   - LLM summarizer template: 'Pending User Asks' → 'Protected Context'
   - Removed 'Remaining Work' (folded into Critical Context)
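To make the shape of the strengthened preamble concrete, here is a minimal sketch of how such a prefix could be assembled from a list of stale section names. The helper `build_summary_prefix`, the `STALE_SECTIONS` tuple, and the exact wording are assumptions for illustration only; the PR's actual SUMMARY_PREFIX text is not reproduced here.

```python
# Hypothetical sketch: names and wording are illustrative, not the PR's text.
STALE_SECTIONS = ("Pending User Asks", "Remaining Work")


def build_summary_prefix(stale_sections=STALE_SECTIONS) -> str:
    """Assemble a preamble that marks every summary section as stale."""
    # One bullet per section heading the model must not treat as a task.
    bullets = "\n".join(f"- ## {name}" for name in stale_sections)
    return (
        "CRITICAL RULE: all items in the summary below are STALE. "
        "DISCARD them as tasks.\n"
        "This holds regardless of topical overlap with the latest user message.\n"
        "Never execute anything listed under these sections:\n"
        f"{bullets}\n"
        "Memory files (MEMORY.md, USER.md) are authoritative; "
        "summary sections are transient.\n"
    )


SUMMARY_PREFIX = build_summary_prefix()
```

Building the bullet list from data rather than hand-writing it keeps the preamble and any section-name checks in sync, which is one possible reason to prefer a constant-driven layout.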

Testing: Verified SUMMARY_PREFIX compiles and renders correctly. All changes are safe text updates to guidance and templates — no behavioral changes to compression logic itself.
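A render check could be complemented by a small automated guard. The sketch below is an assumption about what such a guard might look like, not part of this PR: `find_task_like_headings` is a hypothetical helper that flags the two section headings named in the problem statement if they reappear in a rendered summary.

```python
import re

# Hypothetical guard: matches the two headings this PR calls dangerous.
TASK_LIKE = re.compile(
    r"^#{1,6}\s+(Pending User Asks|Remaining Work)\b", re.MULTILINE
)


def find_task_like_headings(text: str) -> list[str]:
    """Return the dangerous heading titles found in a rendered summary."""
    return [match.group(1) for match in TASK_LIKE.finditer(text)]


# A summary using the renamed section passes; the old heading is flagged.
clean = "## Protected Context\n- user prefers tabs"
stale = "## Remaining Work\n- finish auth refactor"
assert find_task_like_headings(clean) == []
assert find_task_like_headings(stale) == ["Remaining Work"]
```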

Impact: Prevents the model from acting on stale summary items even when topical overlap makes them look relevant. Explicit negative examples make the intent unambiguous.

…revent stale instructions execution

Addresses issue where stale '## Pending User Asks' and '## Remaining Work'
sections in compression summaries were incorrectly executed as current tasks
when they had topical overlap with the latest user message.

Changes:
1. STRENGTHENED PREAMBLE: Rewrote SUMMARY_PREFIX with explicit 'STALE — DISCARD'
   language, replacing weak 'treat as background reference' with:
   - ⚠️ CRITICAL RULE section explicitly listing dangerous section names
   - Bullet points identifying stale items (Pending Asks, Remaining Work,
     In Progress, Blocked, Completed Actions)
   - Any phrasing that sounds like 'next steps' or 'to-do'

2. NEGATIVE EXAMPLES: Added three concrete before/after examples showing:
   - Wrong approach (acting on stale summary items)
   - Correct approach (discarding stale items, responding to latest message)
   - Covers auth, API endpoint, and multi-priority scenarios

3. CLARITY ON MEMORY vs SUMMARY: Explicitly contrasted:
   - Memory (MEMORY.md, USER.md) = eternal, authoritative
   - Summary sections = transient, must be discarded

4. RENAMED DANGEROUS SECTIONS in fallback template:
   - '## Pending User Asks' → '## Protected Context (Background Reference Only)'
   - '## Remaining Work' → '## Session Context (fallback only)'

5. UPDATED LLM SUMMARIZER TEMPLATE to prevent creation of dangerous sections:
   - Replaced '## Pending User Asks' section with '## Protected Context'
   - Removed '## Remaining Work' (fold into Critical Context)
   - Added explicit instruction: 'Do NOT phrase as pending or remaining —
     phrase as completed background facts'
   - Added critical warning: 'Do not create a ## Pending User Asks section
     or anything that looks like unfinished work — the model will execute it'
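The fallback-template renames in item 4 can be expressed as a simple heading mapping. The following is a sketch under stated assumptions: the PR edits the template text directly, whereas this example applies the same renames to a made-up template string, and `FALLBACK_RENAMES`, `rename_sections`, and the placeholder fields are hypothetical names.

```python
# Hypothetical mapping built from the renames listed in item 4 above.
FALLBACK_RENAMES = {
    "## Pending User Asks": "## Protected Context (Background Reference Only)",
    "## Remaining Work": "## Session Context (fallback only)",
}


def rename_sections(template: str, renames: dict[str, str] = FALLBACK_RENAMES) -> str:
    """Replace whole heading lines that match a key in `renames`."""
    lines = []
    for line in template.splitlines():
        # Only exact heading lines are swapped; body text is left untouched.
        lines.append(renames.get(line.strip(), line))
    return "\n".join(lines)


# Made-up template used only to exercise the mapping.
old_template = "## Pending User Asks\n{asks}\n\n## Remaining Work\n{work}"
new_template = rename_sections(old_template)
```

Matching whole lines rather than substrings avoids rewriting body text that merely mentions one of the old section names.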

Root cause: The previous preamble's soft language ('treat as background reference')
was insufficient when summary items had topical overlap with the latest message.
Models would conflate stale summary items with current context.

This fix makes the discard rule explicit, unavoidable, and surrounded by
concrete examples so the intent is unambiguous.

Fixes NousResearch#41607
@alt-glitch added the type/bug (Something isn't working), P1 (High: major feature broken, no workaround), comp/agent (Core agent loop, run_agent.py, prompt builder), and duplicate (This issue or pull request already exists) labels on Jun 8, 2026
@alt-glitch

Collaborator

Likely duplicate of #41650 — both fix #41607 by strengthening the compression summary preamble in agent/context_compressor.py against stale-instruction execution. #41650 additionally includes regression tests (test_resume_stale_active_task.py, test_summary_prefix_semantics.py).

@iamlukethedev

Author

@alt-glitch if this is a duplicate, should I close my PR?

@teknium1

Contributor

Closing in favor of PR #44454 (merged), which combines #44345's heading constants with #41650's carveout removal plus a frozen-prefix fixup for backward compatibility. Your PR targeted the same stale-instruction class — thanks for the work and the candor about overlap; the negative-example ideas in your rewrite informed the review even though we went with the constant-based approach.

@teknium1 closed this Jun 11, 2026

Development

Successfully merging this pull request may close these issues.

Compression summary stale instructions executed as current task

3 participants