Skip to content

fix(agent): persist delivered assistant text on interrupted turns#43962

Open
konsisumer wants to merge 1 commit into
NousResearch:mainfrom
konsisumer:fix/persist-interrupted-assistant-transcript
Open

fix(agent): persist delivered assistant text on interrupted turns#43962
konsisumer wants to merge 1 commit into
NousResearch:mainfrom
konsisumer:fix/persist-interrupted-assistant-transcript

Conversation

@konsisumer

Copy link
Copy Markdown
Contributor

What does this PR do?

Fixes a transcript-persistence gap where Hermes can deliver assistant text to the user before the canonical assistant message is appended to the in-memory transcript. If the turn is then interrupted, or a partial stream is recovered without a final assistant append, state.db could miss text that was already shown to the user. This change makes turn finalization backfill that visible assistant text into the transcript before persistence when it is missing from the current turn.

Related Issue

Fixes #43936

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • agent/turn_finalizer.py: before persisting the turn, detect visible assistant text that was already delivered for partial_stream_recovery or an interrupted streamed turn, and append a synthetic assistant transcript row only when the current turn does not already contain that text.
  • tests/run_agent/test_run_agent.py: added regression coverage for recovered partial-stream persistence and interrupt-after-delivery persistence, alongside the existing partial-stream recovery tests.

How to Test

  1. Run TMP_HERMES_HOME=$(mktemp -d /private/tmp/hermes-test-XXXXXX) && HERMES_HOME="$TMP_HERMES_HOME" /opt/homebrew/bin/timeout -k 30 480 pytest tests/run_agent/test_run_agent.py -q -k 'partial_stream_recovery_persists_recovered_assistant_message or interrupted_stream_persists_visible_assistant_text or partial_stream_recovery_uses_streamed_content or partial_stream_recovery_on_empty_stub or partial_stream_recovery_preempts_prior_turn_fallback'.
  2. Confirm the targeted run passes and that the new persistence assertions show the last persisted message is the delivered assistant text.
  3. Attempt the broad suite with TMP_HERMES_HOME=$(mktemp -d /private/tmp/hermes-test-XXXXXX) && HERMES_HOME="$TMP_HERMES_HOME" /opt/homebrew/bin/timeout -k 30 480 sh -c 'pytest tests/ -q -x --timeout=60 "$@"' sh.
  4. In this sandbox, note that the broad run stops during collection on an unrelated missing dependency: ModuleNotFoundError: No module named 'fastapi' from tests/hermes_cli/test_dashboard_auth_401_reauth.py.
  5. Run python scripts/check-windows-footguns.py agent/turn_finalizer.py tests/run_agent/test_run_agent.py, ruff check agent/turn_finalizer.py tests/run_agent/test_run_agent.py, and git diff --check.

What platforms tested on

  • macOS on darwin-arm64 (local sandbox)

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: macOS / darwin-arm64

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Screenshots / Logs

  • Broad suite attempt stopped early in this environment on ModuleNotFoundError: No module named 'fastapi' while collecting tests/hermes_cli/test_dashboard_auth_401_reauth.py.
  • Scoped regression tests, ruff check, git diff --check, and the Windows footguns scan passed.

@liuhao1024

Copy link
Copy Markdown
Contributor

Verification review — reviewed the diff for correctness.

What I checked:

  1. Dedup logic — the backward scan through messages correctly stops at the first user role message, preventing false matches from earlier turns. The normalization via _normalize_interim_visible_text ensures whitespace differences don't cause duplicate appends.
  2. Two recovery pathspartial_stream_recovery uses final_response while interrupted uses _current_streamed_assistant_text (stripped of think blocks). Both paths are correctly gated.
  3. Append-before-persist ordering — the new assistant message is appended to messages before _persist_session is called, so the persisted session includes the recovered text.
  4. Test coverage — both partial_stream_recovery and interrupted paths have dedicated tests that verify the persisted messages contain the expected assistant text. copy.deepcopy in the capture mock is correct for snapshot isolation.

Verdict: Clean. The fix correctly addresses the data loss scenario where interrupted or partially-recovered streams leave no assistant message in the persisted session.

@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder labels Jun 11, 2026
@liuhao1024

Copy link
Copy Markdown
Contributor

Verified clean — LGTM

Reviewed the diff: turn_finalizer.py persist-on-interrupt fix + 2 tests.

What I checked:

  • Dedup loop correctness: Iterates backward through messages, checks each assistant message against normalized visible text. Stops at first user message boundary (prevents matching a stale assistant message from a previous turn). Uses _normalize_interim_visible_text + _strip_think_blocks for robust comparison. ✓
  • Insertion timing: _visible_assistant_text is appended to messages before _persist_session(messages) is called. This ensures the persisted conversation includes the interrupted/visible text. ✓
  • Two paths covered:
    • partial_stream_recovery → uses final_response directly
    • interrupted → uses _current_streamed_assistant_text (stripped of think blocks)
    • Neither path → _visible_assistant_text is None → no append. ✓
  • No double-persist: If the assistant message is already in messages (e.g., from a previous recovery), the dedup loop sets _already_persisted = True and skips the append. ✓
  • Test coverage: test_partial_stream_recovery_persists_recovered_assistant_message and test_interrupted_stream_persists_visible_assistant_text both verify the persisted message is the last entry. ✓

Edge case note: The _drop_trailing_empty_response_scaffolding call happens before the dedup check. If scaffolding removes an empty assistant message that happens to match, the dedup correctly re-appends the real content. This is the right order.

@Willhong

Copy link
Copy Markdown

Heads up that this overlaps in intent with #45277 (which I just opened for #43936) but they fix different layers and are complementary — flagging so a reviewer can consider them together.

  • This PR (fix(agent): persist delivered assistant text on interrupted turns #43962) backfills already-delivered visible assistant text in turn_finalizer.py when messages is missing the assistant row (interrupt / partial_stream_recovery). It fixes the case where the row never made it into the turn's message list.
  • fix(agent): identity-based session-DB flush — stop positional assistant drops (#43936) #45277 fixes the case where the assistant row is present in messages but _flush_messages_to_session_db skips it. The flush was position-based (messages[max(start_idx, _last_flushed_db_idx):]); two divergence paths (a cursor corrupted by overlapping turns, and conversation_history growing longer than messages after repair_message_sequence shrinks a drop-poisoned history) make the slice land past the end, so a completed, delivered assistant response is silently dropped. It replaces the slice with identity-based dedup.

They don't conflict: a row this PR appends carries no _db_persisted stamp and isn't in _hist_ids, so #45277's flush writes it exactly once. And the divergence path #45277 covers (a non-interrupt, repair-shrink loop on a turn that did append its assistant row) is outside this PR's interrupted / partial_stream_recovery conditions, so the two cover the prevention and the recovery sides of #43936 respectively. Either can land independently.

Willhong pushed a commit to Willhong/hermes-agent that referenced this pull request Jun 13, 2026
…nt drops (NousResearch#43936)

The session-DB flush in `_flush_messages_to_session_db` was position-based
(`messages[max(start_idx, _last_flushed_db_idx):]`). It assumes
`messages == conversation_history + this turn's new messages`, which breaks two
ways, both reproduced live:

1. Overlapping turns on the cached agent corrupt the shared
   `_last_flushed_db_idx` (it indexes the turn-local `messages` array) → the
   earlier completed turn flushes an empty slice and its delivered assistant
   row is never written.
2. `repair_message_sequence` compacts `messages` in place below
   `len(conversation_history)`, so `start_idx > len(messages)` permanently →
   self-reinforcing drop loop.

The merged NousResearch#44837 fix (NousResearch#45260) clamps `_last_flushed_db_idx` to `len(messages)`
but leaves `start_idx = len(conversation_history)` unbounded, so shape 2 is
still an empty slice. No index arithmetic survives the two arrays diverging.

Replace the positional slice with identity-based dedup: stamp each persisted
dict with `_db_persisted`, recognize this turn's `conversation_history` entries
by object identity (`id()`) and stamp instead of re-write. A message is written
exactly once regardless of how the arrays shift; the NousResearch#860 and NousResearch#31507 guards
hold by construction. `_db_persisted` is underscore-prefixed and already
stripped from API payloads. The compression split clears the stamps so the
surviving messages re-write into the new session row. `_last_flushed_db_idx` is
kept updated for legacy readers but no longer decides what is written.

Complementary to NousResearch#43962 (backfills delivered text after an interrupt in
turn_finalizer.py); this prevents the drop at the flush layer and also covers
the non-interrupt repair-shrink path.

Regression: tests/run_agent/test_identity_flush.py (5, incl. an explicit
test that the NousResearch#44837 clamp still drops where identity flush persists) +
test_message_sequence_repair, test_860_dedup, test_compression_persistence all
GREEN.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: State.db drops assistant messages on interrupt; .jsonl session log no longer available as fallback

4 participants