Skip to content

State.db loses assistant responses when _drop_trailing_empty_response_scaffolding causes _last_flushed_db_idx to overshoot len(messages) #31507

@529349029

Description

@529349029

Bug Description

When using Hermes CLI agent across multiple conversation turns (same AIAgent instance reused), state.db intermittently loses assistant response messages. User and tool messages are persisted correctly, but some assistant responses are silently dropped.

Root Cause

File: run_agent.py, method _flush_messages_to_session_db (line ~1257)

The incremental flush logic computes:

start_idx = len(conversation_history) if conversation_history else 0
flush_from = max(start_idx, self._last_flushed_db_idx)
for msg in messages[flush_from:]:
    # ... write to state.db ...
self._last_flushed_db_idx = len(messages)

And _persist_session calls _drop_trailing_empty_response_scaffolding(messages) BEFORE _flush_messages_to_session_db. This can pop trailing scaffolding/empty messages from the messages list, reducing len(messages).

But _last_flushed_db_idx was already set to the OLD (higher) length by a previous intermediate _persist_session call during tool processing. After the pop:

  • flush_from = max(start_idx, old_last_flushed_db_idx) → still the old high value
  • flush_from >= len(messages)messages[flush_from:] is empty
  • No messages are written at all — including the final assistant response

Trigger Condition

  1. Agent processes user message with multiple tool calls
  2. Intermediate _persist_session calls update _last_flushed_db_idx to len(messages_at_that_time) (e.g., 98)
  3. Agent generates final assistant response
  4. _drop_trailing_empty_response_scaffolding pops 1-2 trailing messages from messages (len drops to e.g. 97)
  5. _flush_messages_to_session_db computes flush_from = max(start_idx, 98) >= 97 → skips everything

Evidence

Session mpjo2msf4kzhz9 (source=cli) in state.db:

  • user=26, assistant=123, tool=114 (missing ~43 assistant responses)
  • First 17 turns: all messages complete
  • Last 3 turns: user+tool written, but assistant responses missing
  • The same session in the WebUI local DB (~/.hermes/webui/hermes-web-ui.db): user=26, assistant=166, tool=122 — complete

Fix

Add a guard in _flush_messages_to_session_db to reset flush_from when it overshoots:

start_idx = len(conversation_history) if conversation_history else 0
flush_from = max(start_idx, self._last_flushed_db_idx)
# Fix: if _last_flushed_db_idx overshoots len(messages) after
# _drop_trailing_empty_response_scaffolding popped trailing messages,
# reset to start_idx so new messages are not skipped.
if flush_from >= len(messages):
    flush_from = start_idx
for msg in messages[flush_from:]:
    ...

Environment

  • Hermes Agent: local git checkout
  • Session source: cli (gateway/CLI path with reused agent)
  • State DB: SQLite (~/.hermes/state.db)
  • This only affects sessions where the same AIAgent instance is reused across turns (CLI sessions). Gateway sessions that create a new AIAgent per turn are unaffected because _last_flushed_db_idx starts at 0 each time.

Metadata

Metadata

Assignees

No one assigned

    Labels

    P1High — major feature broken, no workaroundcomp/agentCore agent loop, run_agent.py, prompt buildertype/bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions