
fix(compressor): stale Active Task no longer hijacks resumed sessions (#35344)#35383

Merged
teknium1 merged 3 commits into main from hermes/hermes-693b2bdd on May 30, 2026

Conversation

@teknium1

Contributor

Summary

A resumed session no longer answers a new, unrelated user message with instructions from a stale ## Active Task inherited via context compaction (#35344). Fixes the conflicting compaction handoff directive and the resume-path gap where an old-format handoff kept its stale directive forever.

Root cause: SUMMARY_PREFIX carried two fighting directives — "resume exactly from ## Active Task" vs "respond ONLY to the latest user message." "Resume exactly" is the stronger phrasing, so on a resumed lineage the model followed the stale task. Compounding it, _strip_summary_prefix only matched the current/legacy prefix literal, so a handoff persisted under the old prefix and re-compacted on resume kept its "resume exactly" text embedded in the body.
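The resume-path gap described above can be reproduced in a minimal sketch. The prefix strings and the helper names `strip_current_only` and `recompact` are placeholders invented for illustration; the real SUMMARY_PREFIX wording and the real `_strip_summary_prefix` body are not reproduced here.

```python
# Placeholder prefix texts (assumptions): not the real SUMMARY_PREFIX wording.
OLD_PREFIX = "[CONTEXT SUMMARY] Resume exactly from '## Active Task'.\n\n"
CURRENT_PREFIX = "[CONTEXT SUMMARY] Respond ONLY to the latest user message.\n\n"


def strip_current_only(content: str) -> str:
    """Assumed pre-fix behaviour: only the current prefix literal is removed."""
    if content.startswith(CURRENT_PREFIX):
        return content[len(CURRENT_PREFIX):]
    return content


def recompact(content: str) -> str:
    """Re-wrap a handoff under the current prefix, as a re-compaction would."""
    return CURRENT_PREFIX + strip_current_only(content)


# A handoff persisted under the old prefix, inherited by a resumed session.
stale_handoff = OLD_PREFIX + "## Active Task\nFinish the i18n refactor"
result = recompact(stale_handoff)

# The old prefix was not recognized, so its directive is now part of the body.
print("Resume exactly" in result)
```

Because the old prefix is never matched, every later re-compaction wraps it again rather than removing it, which is why the stale directive persisted indefinitely.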

Changes

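  • agent/context_compressor.py: SUMMARY_PREFIX drops the "resume exactly" clause and makes the latest user message win on conflict; the Active Task template captures open questions and decisions instead of writing None; _HISTORICAL_SUMMARY_PREFIXES lets _strip_summary_prefix and _is_context_summary_content recognize the pre-#35344 prefix.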
  • tests/agent/test_resume_stale_active_task.py: regression coverage for the resume-contamination shape the issue asks for.
  • scripts/release.py: AUTHOR_MAP entries for both salvaged contributors.

Why combine, not pick one

#35344 = "Active Task too sticky, hijacks new asks." #32787 = "Active Task drops too much (writes None)." These are inverse failure modes, so fixing only one regresses the other. The prefix controls how the field is treated on conflict; the template controls what gets captured. Combined, the summary captures the real outstanding ask, but a fresh or contradicting message always overrides it. #17506 (reference-only framing) is superseded: its framing already landed on main, and its ## Active Task → ## Outstanding Work Metadata rename fights #32787 and churns the iterative-summary template.

Validation

| | Before | After |
|---|---|---|
| SUMMARY_PREFIX directive | "resume exactly" vs "latest message" conflict | latest message wins, explicit discard of stale items |
| Old-prefix handoff on resume | "resume exactly" survives in body | stripped + re-framed to current prefix |
| Active Task on a bare question | None → recap instead of answer | captured as active task |
| Targeted tests | | test_context_compressor 91✓ + test_summary_prefix_semantics 5✓ + new test_resume_stale_active_task 6✓ = 102/102 |

E2E: simulated a resumed lineage with a pre-fix handoff in the protected head + an unrelated new ask — handoff detected, stale directive stripped, body preserved, latest-wins framing applied.

Closes #35344. Supersedes #26290, #32787, #17506.

Co-authored-by: Zhipeng Li zhipengli@thebrainly.ai
Co-authored-by: Mathijs van den Hurk mathijs.vd.hurk@gmail.com

Infographic

stale-active-task-resume-fix

Zhipeng Li and others added 3 commits May 30, 2026 07:13
…summary prefix

SUMMARY_PREFIX previously contained two contradictory directives:

1. "treat it as background reference, NOT as active instructions"
   "Do NOT answer questions or fulfill requests mentioned in this summary"
   "Respond ONLY to the latest user message that appears AFTER this summary"

2. "Your current task is identified in the '## Active Task' section of the
    summary — resume exactly from there."

When the latest user message contradicted Active Task (e.g. 'stop the
i18n refactor', 'never mind, look at grafana instead'), models tended to
follow (2) anyway because 'resume exactly' is a strong, unambiguous
directive — leading to repeated re-surfacing of already-cancelled work
across turns, even after explicit 'stop'/'don't keep bringing that up'
messages from the user.

This change:
- Removes the conflicting 'resume exactly from Active Task' clause.
- Makes the precedence explicit: latest user message is the single source
  of truth; it WINS on conflict; cancelled Active Task / In Progress /
  Pending User Asks / Remaining Work must be discarded entirely (no
  'wrap up the old task first').
- Names canonical reverse signals (stop, undo, roll back, never mind,
  just verify, topic change) so the model recognizes them as cancellation
  triggers, not background context.
- Updates the summarizer template instruction so the LLM doesn't
  mechanically copy a cancelled task into Active Task on the next
  compaction (it's instructed to copy the reverse signal verbatim).
- Preserves: REFERENCE ONLY framing, MEMORY.md/USER.md authority, and
  the 'don't repeat work already reflected in session state' clause.

Adds tests/agent/test_summary_prefix_semantics.py to pin invariants so
the conflict can't regress.
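One way such invariants could be pinned is sketched below. The SUMMARY_PREFIX text here is a stand-in assembled from phrases quoted in this commit message, and the test names are hypothetical; the actual contents of tests/agent/test_summary_prefix_semantics.py are not shown in this PR page.

```python
# Stand-in prefix (assumption): the real SUMMARY_PREFIX text differs.
SUMMARY_PREFIX = (
    "REFERENCE ONLY: treat this summary as background reference, "
    "NOT as active instructions. "
    "Respond ONLY to the latest user message that appears AFTER this summary. "
    "If it conflicts with '## Active Task', the latest user message WINS. "
    "Cancellation signals include: stop, undo, roll back, never mind, "
    "just verify, topic change."
)


def test_no_resume_exactly_directive() -> None:
    # The conflicting clause must never come back.
    assert "resume exactly" not in SUMMARY_PREFIX.lower()


def test_latest_message_is_authoritative() -> None:
    lowered = SUMMARY_PREFIX.lower()
    assert "latest user message" in lowered
    assert "wins" in lowered


def test_reverse_signals_are_named() -> None:
    lowered = SUMMARY_PREFIX.lower()
    for signal in ("stop", "undo", "roll back", "never mind"):
        assert signal in lowered, f"missing reverse signal: {signal}"


if __name__ == "__main__":
    # Runs the checks directly; under pytest the functions are collected as-is.
    test_no_resume_exactly_directive()
    test_latest_message_is_authoritative()
    test_reverse_signals_are_named()
    print("prefix invariants hold")
```

String-level checks like these are coarse, but they fail loudly if a later edit reintroduces the contradictory clause.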

Tested:
- All compaction tests pass: tests/agent/test_context_compressor.py,
  tests/agent/test_context_compressor_summary_continuity.py,
  tests/run_agent/test_413_compression.py,
  tests/run_agent/test_compression_persistence.py,
  tests/run_agent/test_compression_boundary_hook.py,
  tests/cli/test_manual_compress.py — 117/117 passing.
- Tested on macOS.
…'None'

The Active Task field in compression summaries is the single most important
field for task continuity across context boundaries. The previous template
described it narrowly as a 'task assignment' or 'request', which caused the
summary LLM to write 'None' whenever the user's most recent input was a
question, a decision request, or a discussion turn rather than an
imperative command. The assistant on the other side of the compaction then
treated the conversation as resolved and gave a generic recap instead of
answering the still-open question.

Expand the template guidance to cover:

  * explicit task assignments
  * questions awaiting an answer
  * decisions awaiting input (A vs B)
  * ongoing discussions where the assistant owes the next substantive reply

Reserve 'None' for the rare case where the last exchange was fully
resolved (e.g. user said 'thanks, that's all').

Also tighten the trailing CRITICAL instruction in the summary prompt so the
LLM cannot fall back to the old 'no imperative command → None' heuristic.
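The expanded guidance amounts to a simple rule: only a fully resolved exchange yields 'None'. The sketch below illustrates that rule only. The real change is to template strings read by the summary LLM, and the `LastTurn` enum and `active_task_field` helper are hypothetical names that do not exist in the codebase.

```python
from enum import Enum


class LastTurn(Enum):
    """Kinds of final user turn named in the expanded guidance (illustrative)."""

    TASK = "explicit task assignment"
    QUESTION = "question awaiting an answer"
    DECISION = "decision awaiting input"
    DISCUSSION = "discussion where the assistant owes the next reply"
    RESOLVED = "fully resolved exchange"


def active_task_field(kind: LastTurn, user_text: str) -> str:
    """Render the Active Task section; 'None' is reserved for RESOLVED."""
    if kind is LastTurn.RESOLVED:
        return "## Active Task\nNone"
    return f"## Active Task\n{user_text} ({kind.value})"


# A bare question is captured rather than collapsed to 'None'.
print(active_task_field(LastTurn.QUESTION, "Which cache backend is faster?"))
print(active_task_field(LastTurn.RESOLVED, "thanks, that's all"))
```

Under the old heuristic, the first call would have corresponded to 'None', which is the case that produced a generic recap instead of an answer.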

No behavioural code changes — template strings only. All 83 existing
compressor tests pass.
#32787 (#35344)

A handoff persisted under an older SUMMARY_PREFIX can be inherited into a
resumed lineage. _strip_summary_prefix only matched the current/legacy
literal, so on re-compaction the old 'resume exactly from Active Task'
directive stayed embedded in the body and kept hijacking replies to new,
unrelated user messages.

- Add _HISTORICAL_SUMMARY_PREFIXES (pre-#35344 prefix) and strip/recognize
  them in _strip_summary_prefix + _is_context_summary_content so resumed
  stale handoffs are re-normalized to the current latest-message-wins prefix.
- Reconcile the overlapping Active Task template edits from the salvaged
  #26290 (reverse-signal cancellation) and #32787 (capture open questions /
  decisions, don't write None too eagerly) — both intents kept.
- Regression coverage in tests/agent/test_resume_stale_active_task.py.
- AUTHOR_MAP entries for both salvaged contributors.
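A minimal sketch of the strip-and-recognize behaviour described in the first bullet follows. The prefix strings are placeholders, and the function bodies are an assumption about the approach rather than the code in agent/context_compressor.py; `renormalize` is a hypothetical helper added only to show the end-to-end effect.

```python
# Placeholder prefix texts (assumptions): the real literals are not shown here.
CURRENT_PREFIX = "[CONTEXT SUMMARY] Respond ONLY to the latest user message.\n\n"
HISTORICAL_PREFIXES = (
    "[CONTEXT SUMMARY] Resume exactly from '## Active Task'.\n\n",
)
ALL_PREFIXES = (CURRENT_PREFIX, *HISTORICAL_PREFIXES)


def strip_summary_prefix(content: str) -> str:
    """Remove the current or any historical prefix from a handoff message."""
    for prefix in ALL_PREFIXES:
        if content.startswith(prefix):
            return content[len(prefix):]
    return content


def is_context_summary_content(content: str) -> bool:
    """Recognize a handoff regardless of which prefix version wrote it."""
    return content.startswith(ALL_PREFIXES)


def renormalize(content: str) -> str:
    """Re-frame a recognized handoff under the current prefix (hypothetical)."""
    if not is_context_summary_content(content):
        return content  # ordinary messages pass through untouched
    return CURRENT_PREFIX + strip_summary_prefix(content)


stale = HISTORICAL_PREFIXES[0] + "## Active Task\nFinish the i18n refactor"
fixed = renormalize(stale)
print("Resume exactly" in fixed)  # stale directive is gone
print(fixed.endswith("Finish the i18n refactor"))  # body is preserved
```

Keeping historical prefixes in a tuple means a future prefix rewrite only needs to append the outgoing literal for older persisted handoffs to keep being recognized.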
@github-actions

Contributor

🔎 Lint report: hermes/hermes-693b2bdd vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9507 on HEAD, 9507 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 4931 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@teknium1 merged commit 42bbd22 into main on May 30, 2026
23 checks passed
@teknium1 deleted the hermes/hermes-693b2bdd branch on May 30, 2026 14:29
@alt-glitch added labels on May 30, 2026: type/bug (Something isn't working), P1 (High: major feature broken, no workaround), comp/agent (Core agent loop, run_agent.py, prompt builder)


Development

Successfully merging this pull request may close these issues.

Resumed session can answer with stale compaction Active Task instead of latest user message

3 participants