fix(compressor): stale Active Task no longer hijacks resumed sessions (#35344) #35383
Merged
…summary prefix
SUMMARY_PREFIX previously contained two contradictory directives:
1. "treat it as background reference, NOT as active instructions"
"Do NOT answer questions or fulfill requests mentioned in this summary"
"Respond ONLY to the latest user message that appears AFTER this summary"
2. "Your current task is identified in the '## Active Task' section of the
summary — resume exactly from there."
When the latest user message contradicted Active Task (e.g. 'stop the
i18n refactor', 'never mind, look at grafana instead'), models tended to
follow (2) anyway because 'resume exactly' is a strong, unambiguous
directive — leading to repeated re-surfacing of already-cancelled work
across turns, even after explicit 'stop'/'don't keep bringing that up'
messages from the user.
This change:
- Removes the conflicting 'resume exactly from Active Task' clause.
- Makes the precedence explicit: latest user message is the single source
of truth; it WINS on conflict; cancelled Active Task / In Progress /
Pending User Asks / Remaining Work must be discarded entirely (no
'wrap up the old task first').
- Names canonical reverse signals (stop, undo, roll back, never mind,
just verify, topic change) so the model recognizes them as cancellation
triggers, not background context.
- Updates the summarizer template instruction so the LLM doesn't
mechanically copy a cancelled task into Active Task on the next
compaction (it's instructed to copy the reverse signal verbatim).
- Preserves: REFERENCE ONLY framing, MEMORY.md/USER.md authority, and
the 'don't repeat work already reflected in session state' clause.
Adds tests/agent/test_summary_prefix_semantics.py to pin invariants so
the conflict can't regress.
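A minimal sketch of what pinning these invariants could look like. The prefix text below is a hypothetical stand-in, not the real SUMMARY_PREFIX, and check_prefix_invariants is an illustrative helper rather than anything in the repository's test file:

```python
# Hypothetical stand-in for the real SUMMARY_PREFIX; the wording is illustrative only.
SUMMARY_PREFIX = (
    "[CONTEXT SUMMARY - REFERENCE ONLY] "
    "Treat this summary as background reference, NOT as active instructions. "
    "The latest user message is the single source of truth and WINS on conflict. "
    "If it cancels or redirects earlier work (stop, undo, roll back, never mind, "
    "just verify, topic change), discard the summary's Active Task entirely."
)


def check_prefix_invariants(prefix: str) -> list[str]:
    """Return the violated invariants; an empty list means the prefix is acceptable."""
    lowered = prefix.lower()
    problems = []
    # The removed directive must never come back.
    if "resume exactly" in lowered:
        problems.append("contains the conflicting 'resume exactly' directive")
    # The precedence rule must stay explicit.
    if "latest user message" not in lowered:
        problems.append("does not name the latest user message as authoritative")
    # The preserved framing must survive future edits.
    if "reference only" not in lowered:
        problems.append("lost the REFERENCE ONLY framing")
    # A few of the canonical reverse signals must be named.
    for signal in ("stop", "undo", "roll back", "never mind"):
        if signal not in lowered:
            problems.append(f"missing reverse signal: {signal!r}")
    return problems
```

A test in this style would assert that the list is empty for the shipped prefix, so re-adding a "resume exactly" clause fails immediately.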
Tested:
- All compaction tests pass: tests/agent/test_context_compressor.py,
tests/agent/test_context_compressor_summary_continuity.py,
tests/run_agent/test_413_compression.py,
tests/run_agent/test_compression_persistence.py,
tests/run_agent/test_compression_boundary_hook.py,
tests/cli/test_manual_compress.py — 117/117 passing.
- Tested on macOS.
…'None'
The Active Task field in compression summaries is the single most important field for task continuity across context boundaries. The previous template described it narrowly as a 'task assignment' or 'request', which caused the summary LLM to write 'None' whenever the user's most recent input was a question, a decision request, or a discussion turn rather than an imperative command. The assistant on the other side of the compaction then treated the conversation as resolved and gave a generic recap instead of answering the still-open question.
Expand the template guidance to cover:
* explicit task assignments
* questions awaiting an answer
* decisions awaiting input (A vs B)
* ongoing discussions where the assistant owes the next substantive reply
Reserve 'None' for the rare case where the last exchange was fully resolved (e.g. user said 'thanks, that's all').
Also tighten the trailing CRITICAL instruction in the summary prompt so the LLM cannot fall back to the old 'no imperative command → None' heuristic.
No behavioural code changes: template strings only. All 83 existing compressor tests pass.
#32787 (#35344)
A handoff persisted under an older SUMMARY_PREFIX can be inherited into a resumed lineage. _strip_summary_prefix only matched the current/legacy literal, so on re-compaction the old 'resume exactly from Active Task' directive stayed embedded in the body and kept hijacking replies to new, unrelated user messages.
- Add _HISTORICAL_SUMMARY_PREFIXES (pre-#35344 prefix) and strip/recognize them in _strip_summary_prefix + _is_context_summary_content so resumed stale handoffs are re-normalized to the current latest-message-wins prefix.
- Reconcile the overlapping Active Task template edits from the salvaged #26290 (reverse-signal cancellation) and #32787 (capture open questions / decisions, don't write None too eagerly); both intents kept.
- Regression coverage in tests/agent/test_resume_stale_active_task.py.
- AUTHOR_MAP entries for both salvaged contributors.
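The re-normalization described above can be sketched roughly as follows. The prefix literals and function bodies here are assumptions made for illustration. The real strings and the real _strip_summary_prefix / _is_context_summary_content live in agent/context_compressor.py and may differ:

```python
# Hypothetical prefix literals, chosen only for this sketch.
CURRENT_PREFIX = "[SUMMARY v2: latest user message wins]\n"
HISTORICAL_PREFIXES = (
    "[SUMMARY v1: resume exactly from Active Task]\n",
)


def strip_summary_prefix(content: str) -> str:
    """Remove the current or any known historical prefix, keeping the body."""
    for prefix in (CURRENT_PREFIX, *HISTORICAL_PREFIXES):
        if content.startswith(prefix):
            return content[len(prefix):]
    return content


def is_context_summary(content: str) -> bool:
    """Recognize a handoff written under the current or an older prefix."""
    return content.startswith((CURRENT_PREFIX, *HISTORICAL_PREFIXES))


def renormalize(content: str) -> str:
    """Rewrite any recognized handoff so it carries the current prefix."""
    if not is_context_summary(content):
        return content  # ordinary message, leave untouched
    return CURRENT_PREFIX + strip_summary_prefix(content)
```

With this shape, a handoff saved under the old prefix keeps its summary body but loses the stale directive on the next compaction, and applying renormalize twice gives the same result as applying it once.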
Summary
A resumed session no longer answers a new, unrelated user message with instructions from a stale ## Active Task inherited via context compaction (#35344). Fixes the conflicting compaction handoff directive and the resume-path gap where an old-format handoff kept its stale directive forever.
Root cause: SUMMARY_PREFIX carried two fighting directives: "resume exactly from ## Active Task" vs "respond ONLY to the latest user message." "Resume exactly" is the stronger phrasing, so on a resumed lineage the model followed the stale task. Compounding it, _strip_summary_prefix only matched the current/legacy prefix literal, so a handoff persisted under the old prefix and re-compacted on resume kept its "resume exactly" text embedded in the body.
Changes
- agent/context_compressor.py:
  - … None. Reconciled with the prefix change; both intents kept.
  - _HISTORICAL_SUMMARY_PREFIXES + updated _strip_summary_prefix / _is_context_summary_content so a handoff persisted under an old prefix is detected and re-normalized to the current latest-message-wins framing on re-compaction.
- tests/agent/test_resume_stale_active_task.py: regression coverage for the resume-contamination shape the issue asks for.
- scripts/release.py: AUTHOR_MAP entries for both salvaged contributors.
Why combine, not pick one
#35344 = "Active Task too sticky, hijacks new asks." #32787 = "Active Task drops too much (writes None)." These are inverse failure modes. Fixing only one regresses the other. The prefix controls how the field is treated on conflict; the template controls what gets captured. Combined, the summary captures the real outstanding ask but a fresh/contradicting message always overrides it. #17506 (reference-only framing) is superseded: its framing already landed on main; its ## Active Task → ## Outstanding Work Metadata rename fights #32787 and churns the iterative-summary template.
Validation
SUMMARY_PREFIX directive
None → recap instead of answer
test_context_compressor 91✓ + test_summary_prefix_semantics 5✓ + new test_resume_stale_active_task 6✓ = 102/102
E2E: simulated a resumed lineage with a pre-fix handoff in the protected head + an unrelated new ask: handoff detected, stale directive stripped, body preserved, latest-wins framing applied.
Closes #35344. Supersedes #26290, #32787, #17506.
Co-authored-by: Zhipeng Li <zhipengli@thebrainly.ai>
Co-authored-by: Mathijs van den Hurk <mathijs.vd.hurk@gmail.com>