Skip to content

fix(memory): skip external-provider sync on interrupted turns (#15218)#15395

Merged
teknium1 merged 1 commit into
mainfrom
hermes/hermes-df9a563a
Apr 24, 2026
Merged

fix(memory): skip external-provider sync on interrupted turns (#15218)#15395
teknium1 merged 1 commit into
mainfrom
hermes/hermes-df9a563a

Conversation

@teknium1

Copy link
Copy Markdown
Contributor

Salvage of #15249 by @briandevans (authorship preserved via cherry-pick).

Summary

External memory providers no longer persist interrupted turns as if they were completed exchanges. Previously, the sync block at the end of run_conversation was gated on memory_manager and final_response and original_user_message but not on interrupted — inconsistent with the background-review block four lines below which already checks not interrupted.

Root cause

When a user interrupts mid-turn, final_response is either partial assistant output or the literal string "Operation interrupted: waiting for model response (Ns elapsed).". Either way, mirroring that into an external memory backend poisons downstream recall.

Changes

  • run_agent.py: extract inline sync block into AIAgent._sync_external_memory_for_turn(...) that returns early when interrupted=True
  • tests/run_agent/test_memory_sync_interrupted.py: 16 tests covering interrupt guard, positive path, edge cases, and best-effort exception swallowing

Validation

Before After
Interrupted turn with partial response synced to external memory skipped
Interrupted turn with "Operation interrupted…" string synced as assistant answer skipped
Completed turn synced + prefetched unchanged
No memory manager / no response / no user msg skipped skipped
Backend raises swallowed swallowed
Test suite N/A 16/16 pass

Closes #15218. Closes #15249 (original PR by @briandevans — commit cherry-picked with authorship preserved).

``run_conversation`` was calling ``memory_manager.sync_all(
original_user_message, final_response)`` at the end of every turn
where both args were present.  That gate didn't consider the
``interrupted`` local flag, so an external memory backend received
partial assistant output, aborted tool chains, or mid-stream resets as
durable conversational truth.  Downstream recall then treated the
not-yet-real state as if the user had seen it complete, poisoning the
trust boundary between "what the user took away from the turn" and
"what Hermes was in the middle of producing when the interrupt hit".

Extracted the inline sync block into a new private method
``AIAgent._sync_external_memory_for_turn(original_user_message,
final_response, interrupted)`` so the interrupt guard is a single
visible check at the top of the method instead of hidden in a
boolean-and at the call site.  That also gives tests a clean seam to
assert on — the pre-fix layout buried the logic inside the 3,000-line
``run_conversation`` function where no focused test could reach it.

The new method encodes three independent skip conditions:

  1. ``interrupted`` → skip entirely (the #15218 fix).  Applies even
     when ``final_response`` and ``original_user_message`` happen to
     be populated — an interrupt may have landed between a streamed
     reply and the next tool call, so the strings on disk are not
     actually the turn the user took away.
  2. No memory manager / no final_response / no user message →
     preserve existing skip behaviour (nothing new for providerless
     sessions, system-initiated refreshes, tool-only turns that never
     resolved, etc.).
  3. Sync_all / queue_prefetch_all exceptions → swallow.  External
     memory providers are strictly best-effort; a misconfigured or
     offline backend must never block the user from seeing their
     response.

The prefetch side-effect is gated on the same interrupt flag: the
user's next message is almost certainly a retry of the same intent,
and a prefetch keyed on the interrupted turn would fire against stale
context.

### Tests (16 new, all passing on py3.11 venv)

``tests/run_agent/test_memory_sync_interrupted.py`` exercises the
helper directly on a bare ``AIAgent`` (``__new__`` pattern that the
interrupt-propagation tests already use).  Coverage:

- Interrupted turn with full-looking response → no sync (the fix)
- Interrupted turn with long assistant output → no sync (the interrupt
  could have landed mid-stream; strings-on-disk lie)
- Normal completed turn → sync_all + queue_prefetch_all both called
  with the right args (regression guard for the positive path)
- No final_response / no user_message / no memory manager → existing
  pre-fix skip paths still apply
- sync_all raises → exception swallowed, prefetch still attempted
- queue_prefetch_all raises → exception swallowed after sync succeeded
- 8-case parametrised matrix across (interrupted × final_response ×
  original_user_message) asserts sync fires iff interrupted=False AND
  both strings are non-empty

Closes #15218

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@teknium1 teknium1 merged commit 00c3d84 into main Apr 24, 2026
10 of 12 checks passed
@teknium1 teknium1 deleted the hermes/hermes-df9a563a branch April 24, 2026 22:30
@alt-glitch alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/agent Core agent loop, run_agent.py, prompt builder tool/memory Memory tool and memory providers labels Apr 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P1 High — major feature broken, no workaround tool/memory Memory tool and memory providers type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

External memory sync should skip interrupted turns

3 participants