fix(gateway,agent): persist delivered responses that recovery paths drop from the transcript by AIalliAI · Pull Request #44120 · NousResearch/hermes-agent

AIalliAI · 2026-06-11T08:13:03Z

Problem

Gateway delivers assistant responses to the platform (confirmed in gateway.log), but the session DB ends up with no assistant rows between the user messages. When the next message arrives, the model loads a transcript full of "unanswered" user messages and re-answers all of them in one turn.

Root cause

Two pieces interact:

Agent — partial-stream recovery drops the assistant turn. In agent/conversation_loop.py, when the final assembled assistant message has no visible content but text was already streamed to the user, the recovery path sets final_response from the streamed text and breaks without appending an assistant message to messages. The turn-end _persist_session() then flushes a transcript whose tail is the user message — the user row survives (written by the turn-start crash-resilience flush), the assistant row never exists. This matches the issue's evidence exactly: response ready (…, 48 chars) logs a non-empty final_response while the DB has zero assistant rows.
Gateway — every fallback write is a silent no-op. Since state.db became the canonical transcript store (spec 002), append_to_transcript(..., skip_db=True) does nothing at all. The gateway skips all post-turn DB writes via skip_db=agent_persisted to avoid the bug: SQLite session transcript accumulates duplicate messages (3-4x token inflation) #860/Bug: User messages stored twice in state.db when agent and gateway both write to SQLite #42039 duplicate-write bug — correct for messages the agent flushed, but it means the gateway cannot backfill anything the agent's flush missed. The delivered response is silently dropped with no error anywhere.

Fix

One invariant, enforced at both layers: a delivered final_response must end up in the session transcript.

agent/conversation_loop.py: the partial-stream recovery path appends the recovered text as a real assistant turn before breaking, so _persist_session() writes it and role alternation is preserved.
gateway/run.py: when the turn's new messages contain no assistant text but a response was delivered, the gateway backfills the assistant row with skip_db=False. A response generated this turn cannot already be in the loaded history, so this cannot double-write; the bug: SQLite session transcript accumulates duplicate messages (3-4x token inflation) #860/Bug: User messages stored twice in state.db when agent and gateway both write to SQLite #42039 protections for user entries and agent-flushed messages are untouched (pinned by regression tests). Same reasoning for the existing not new_messages fallback branch, whose assistant write was also a no-op.

The gateway backfill also covers the fallback_prior_turn_content recovery (response sourced from an earlier tool-call turn's content, transcript tail ends at a tool message) and any future agent path that returns a response without representing it in messages — with an INFO log so occurrences are visible instead of silent.

Tests

tests/run_agent/test_44100_partial_recovery_persistence.py — partial-stream recovery appends the recovered assistant turn; normal turns unchanged (exactly one assistant message).
tests/gateway/test_44100_assistant_backfill.py — backfill fires when the turn has no assistant text (plain and tool-call turns), does NOT fire when the agent persisted the message itself (bug: SQLite session transcript accumulates duplicate messages (3-4x token inflation) #860/Bug: User messages stored twice in state.db when agent and gateway both write to SQLite #42039 protection intact), and the not new_messages fallback writes with skip_db=False.

tests/gateway/ + the neighboring run_agent persistence/streaming suites pass; the handful of failures present are identical with and without this diff (pre-existing, environment-dependent: shutdown forensics/systemd, Telegram MarkdownV2 escaping).

Fixes #44100

…rop from the transcript Gateway delivered assistant responses to the platform but never persisted them to the session DB, so the model saw consecutive "unanswered" user messages and re-answered all of them on the next turn (NousResearch#44100). Two layers, one invariant — a delivered final_response must end up in the session transcript: 1. agent: the partial-stream recovery path (final message empty/thinking- only but content already streamed to the user) set final_response and broke out of the loop WITHOUT appending an assistant message. The turn-end _persist_session then wrote no assistant row — only the user message (persisted by the turn-start crash-resilience flush) survived. Append the recovered text as a real assistant turn before breaking. 2. gateway: state.db is the canonical transcript store (spec 002), so append_to_transcript(..., skip_db=True) is a complete no-op — the gateway's "fallback" writes could never backfill anything. When a turn's new messages contain no assistant text but a response was delivered, write the assistant row with skip_db=False. A response generated this turn cannot already be in the loaded history, so the NousResearch#860/NousResearch#42039 duplicate-write protection (which concerns the user entry and agent-flushed messages) is preserved — covered by regression tests. Fixes NousResearch#44100 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

liuhao1024 · 2026-06-11T09:01:18Z

Verification: thorough fix for transcript persistence gap with comprehensive regression tests

Reviewed both the conversation_loop.py and gateway/run.py changes — the fix correctly addresses #44100 from two complementary angles:

Agent-side fix (conversation_loop.py): When partial-stream recovery fires (empty final message but content already streamed), the recovered text is now appended as a real assistant message before break. Without this, _persist_session writes no assistant row and the transcript loses role alternation.

Gateway-side fix (gateway/run.py): Three changes:

skip_db=agent_persisted → skip_db=False for the assistant response write in the new_messages path — a response generated this turn cannot be in loaded history, so there's no duplicate risk
A safety-net backfill when the turn's new_messages contain no assistant text — catches any recovery path the agent-side fix might miss
The not new_messages fallback also uses skip_db=False

The safety-net check _turn_has_assistant_text correctly filters for role == "assistant" with non-empty content and no tool_calls, so it won't false-positive on tool-call assistant messages.

Test coverage is strong: 4 gateway tests (backfill/no-backfill/tool-turn/fallback) + 2 agent tests (recovery persistence/normal control). The _bootstrap helper correctly mocks the session store to verify skip_db values.

AIalliAI · 2026-06-12T10:00:21Z

Requesting maintainer review — this is ready to land from my side. Standalone fork CI is pending first-run approval here; the rollup branch in #44061 carrying this session's batch is fully green on upstream CI (all test shards, typecheck, e2e).

AIalliAI mentioned this pull request Jun 11, 2026

Telegram: assistant responses not persisted to session DB (model re-answers old messages) #44100

Open

alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/agent Core agent loop, run_agent.py, prompt builder comp/gateway Gateway runner, session dispatch, delivery labels Jun 11, 2026

AIalliAI mentioned this pull request Jun 11, 2026

Bugfix rollup (2026-06-10 session): cron, state DB, agent loop, MCP bridge, gateway, desktop, Windows #44061

Open

12 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(gateway,agent): persist delivered responses that recovery paths drop from the transcript#44120

fix(gateway,agent): persist delivered responses that recovery paths drop from the transcript#44120
AIalliAI wants to merge 1 commit into
NousResearch:mainfrom
AIalliAI:fix/44100-persist-delivered-response

AIalliAI commented Jun 11, 2026 •

edited

Loading

Uh oh!

liuhao1024 commented Jun 11, 2026

Uh oh!

AIalliAI commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

AIalliAI commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Root cause

Fix

Tests

Uh oh!

liuhao1024 commented Jun 11, 2026

Uh oh!

AIalliAI commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

AIalliAI commented Jun 11, 2026 •

edited

Loading