fix(agent): identity-based session-DB flush — stop positional assistant drops (#43936) by Willhong · Pull Request #45277 · NousResearch/hermes-agent

Willhong · 2026-06-12T23:27:57Z

What does this PR do?

Fixes assistant messages that are delivered to the chat but silently dropped
from state.db, so the next turn replays a user-only backlog and loses context
(#43936).

The root cause is that _flush_messages_to_session_db (in run_agent.py) was
position-based:

flush_from = max(start_idx, self._last_flushed_db_idx)   # start_idx = len(conversation_history)
for msg in messages[flush_from:]:
    self._session_db.append_message(...)

This assumes messages == conversation_history + this turn's new messages, so
slicing at len(conversation_history) isolates the new tail. That assumption
breaks in two real, independently-reproduced ways:

Overlapping turns corrupt the shared cursor. _last_flushed_db_idx lives
on the cached agent instance, but it indexes the turn-local messages
array that every turn rebuilds. When a second inbound message interrupts an
in-flight turn on the same cached agent (the exact repro in [Bug]: State.db drops assistant messages on interrupt; .jsonl session log no longer available as fallback #43936 — "send
another message before the agent finishes"), the later turn advances
_last_flushed_db_idx past the earlier turn's len(messages). The earlier,
now-completed turn's final flush then computes flush_from > len(messages),
the slice is empty, and a fully-generated, already-delivered assistant
response is never written.
conversation_history ends up longer than messages. Once a drop has
poisoned the transcript (consecutive-user / orphan-tool rows), the next
turn's repair_message_sequence merges/drops those rows and shrinks
messages below len(conversation_history). Now start_idx > len(messages)
permanently, every later flush slices past the end, and the session
enters a self-reinforcing drop loop. No interrupt is needed to keep it going.

No index arithmetic survives the two arrays diverging, so this replaces the
positional slice with identity-based dedup.

Re: the already-merged #44837 fix (#45260). #45260 changed the cursor to
flush_from = max(start_idx, min(self._last_flushed_db_idx, len(messages))).
That clamps _last_flushed_db_idx to len(messages) but leaves
start_idx = len(conversation_history) unbounded. For the live shape in
trigger #2 (start_idx=153, len(messages)=145), flush_from is still
max(153, ≤145) = 153 ≥ len(messages) → the slice is still empty and the
assistant is still dropped. So #45260 reduces the symptom but does not close
trigger #2 (nor trigger #1). This PR rebases on top of #45260 and replaces the
positional clamp entirely; tests/run_agent/test_identity_flush.py:: test_supersedes_44837_clamp_residual_drop pins the residual drop and proves
the identity flush persists where the clamp doesn't.

The fix is the minimal flush-layer change, not the issue's suggested ".jsonl
fallback restore" — restoring a second log would paper over the drop instead of
preventing it, and the same repair-shrink path (trigger #2) would still desync
both stores.

Related Issue

Fixes #43936

Related to #44837 (the repair-compaction drop), whose merged fix #45260 reduced
but did not eliminate the drop — see the note above; this PR rebases on #45260
and supersedes its positional clamp.

Complementary to (not overlapping with) #43962, which backfills delivered text
after an interrupt in turn_finalizer.py. This PR prevents the drop at the
flush layer, which also covers the non-interrupt trigger #2 that a finalizer
backfill cannot reach. The two can land independently.

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

run_agent.py — _flush_messages_to_session_db now stamps each persisted
dict with _db_persisted after its DB write, and recognizes this turn's
conversation_history entries by object identity (id()), stamping them
instead of re-writing. A message is written exactly once regardless of how the
arrays shift. _db_persisted is underscore-prefixed and already stripped from
API payloads (chat_completion_helpers.py, anthropic_adapter.py; the Codex
Responses adapter builds fresh items), so it never reaches a provider.
_last_flushed_db_idx is kept updated for legacy readers (cli.py
resume/branch) but no longer decides what gets written.
agent/conversation_compression.py — the compression session split clears the
_db_persisted stamps from the surviving messages so they re-write into the
new session row.
tests/run_agent/test_identity_flush.py — new regression suite (4 cases).
tests/run_agent/test_compression_persistence.py —
test_flush_with_stale_history_loses_messages previously asserted the bug
condition (0 rows persisted); it now asserts the fix (both messages persist
despite a stale, longer conversation_history).

How to Test

Reproduction (matches #43936):

Run the gateway in any chat platform with a single agent.
Send a message; let the agent begin responding.
Before it finishes, send a second message (overlapping turn).
Before: the first response appears in chat but has no assistant row in
state.db (gap; next turn replays a user-only backlog).
After: the assistant row is persisted; no gap.

Automated:

scripts/run_tests.sh tests/run_agent/

tests/run_agent/test_identity_flush.py encodes both triggers:

test_overlap_corrupted_cursor_does_not_drop_assistant (trigger 1)
test_repair_shrunk_messages_still_persist_new_turn (trigger 2)
test_repeated_flush_same_turn_writes_once (bug: SQLite session transcript accumulates duplicate messages (3-4x token inflation) #860 dedup still holds)
test_history_dicts_not_rewritten (no duplication of history rows)

Verified live on the gateway: a 99-second, 6-API-call multi-tool turn with two
overlapping inbound messages (the exact shape that dropped a 1096-char response
before this change) persisted every assistant/tool row, and the
FLUSH-IDENTITY log line fired on the len_conv > len_messages divergence with
assistant_written=True.

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(agent): …)
I searched for existing PRs (found fix(agent): persist delivered assistant text on interrupted turns #43962; described the relationship above)
My PR contains only changes related to this fix
I've run scripts/run_tests.sh tests/run_agent/ and all tests pass
I've added tests for my changes
I've tested on my platform: macOS 26.5 (Apple Silicon), Python 3.11

Documentation & Housekeeping

Documentation update — N/A (internal persistence path, no user-facing config)
cli-config.yaml.example — N/A (no config keys added)
CONTRIBUTING.md / AGENTS.md — N/A (no architecture/workflow change)
Cross-platform impact considered — the change is pure in-memory dict
bookkeeping + the existing SQLite path; no OS-specific behavior
Tool descriptions/schemas — N/A

Risk / rollout

Behavior-neutral on normal sequential turns: a fresh message dict is never in
_hist_ids and carries no stamp, so it is written exactly as before. The only
behavior change is on the divergence paths, which previously dropped data.

…nt drops (NousResearch#43936) The session-DB flush in `_flush_messages_to_session_db` was position-based (`messages[max(start_idx, _last_flushed_db_idx):]`). It assumes `messages == conversation_history + this turn's new messages`, which breaks two ways, both reproduced live: 1. Overlapping turns on the cached agent corrupt the shared `_last_flushed_db_idx` (it indexes the turn-local `messages` array) → the earlier completed turn flushes an empty slice and its delivered assistant row is never written. 2. `repair_message_sequence` compacts `messages` in place below `len(conversation_history)`, so `start_idx > len(messages)` permanently → self-reinforcing drop loop. The merged NousResearch#44837 fix (NousResearch#45260) clamps `_last_flushed_db_idx` to `len(messages)` but leaves `start_idx = len(conversation_history)` unbounded, so shape 2 is still an empty slice. No index arithmetic survives the two arrays diverging. Replace the positional slice with identity-based dedup: stamp each persisted dict with `_db_persisted`, recognize this turn's `conversation_history` entries by object identity (`id()`) and stamp instead of re-write. A message is written exactly once regardless of how the arrays shift; the NousResearch#860 and NousResearch#31507 guards hold by construction. `_db_persisted` is underscore-prefixed and already stripped from API payloads. The compression split clears the stamps so the surviving messages re-write into the new session row. `_last_flushed_db_idx` is kept updated for legacy readers but no longer decides what is written. Complementary to NousResearch#43962 (backfills delivered text after an interrupt in turn_finalizer.py); this prevents the drop at the flush layer and also covers the non-interrupt repair-shrink path. Regression: tests/run_agent/test_identity_flush.py (5, incl. an explicit test that the NousResearch#44837 clamp still drops where identity flush persists) + test_message_sequence_repair, test_860_dedup, test_compression_persistence all GREEN.

liuhao1024 · 2026-06-13T00:05:35Z

Verification: Code Review — Clean ✅

Reviewed the identity-based session-DB flush implementation. Key checks:

Identity vs positional flush: The switch from messages[max(start_idx, _last_flushed_db_idx):] to _db_persisted stamping eliminates the self-reinforcing drop loop where _last_flushed_db_idx exceeds len(messages) after compression. Correct fix for the root cause described in [Bug]: State.db drops assistant messages on interrupt; .jsonl session log no longer available as fallback #43936.
_db_persisted underscore prefix: The key starts with _, so the chat-completions wire sanitizer (agent/transports/chat_completions.py::convert_messages) strips it before API calls — strict gateways won't see it. Harmless in JSON session logs.
Compression path cleanup (conversation_compression.py): _cm.pop("_db_persisted", None) correctly clears stamps on compressed messages so they get re-persisted to the new session. Without this, the identity flush would skip them.
id() for object identity: _hist_ids = {id(_m) for _m in conversation_history} correctly identifies messages already in the DB transcript by Python object identity. Safe within the function's lifetime — both lists are live during the flush call.
Test update: test_flush_with_stale_history_loses_messages correctly flips from assert len(rows) == 0 (documenting the bug) to assert len(rows) == 2 (verifying the fix).

No dead variables, no guard-block leaks, no exception-ordering issues. Well-documented with inline comments explaining the design decisions.

This was referenced Jun 12, 2026

[Bug]: State.db drops assistant messages on interrupt; .jsonl session log no longer available as fallback #43936

Open

fix(agent): persist delivered assistant text on interrupted turns #43962

Open

Willhong force-pushed the fix/identity-flush-43936 branch from 7a1b036 to fc868f4 Compare June 13, 2026 00:01

alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/agent Core agent loop, run_agent.py, prompt builder labels Jun 13, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(agent): identity-based session-DB flush — stop positional assistant drops (#43936)#45277

fix(agent): identity-based session-DB flush — stop positional assistant drops (#43936)#45277
Willhong wants to merge 1 commit into
NousResearch:mainfrom
Willhong:fix/identity-flush-43936

Willhong commented Jun 12, 2026 •

edited

Loading

Uh oh!

liuhao1024 commented Jun 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Willhong commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Related Issue

Type of Change

Changes Made

How to Test

Checklist

Code

Documentation & Housekeeping

Risk / rollout

Uh oh!

liuhao1024 commented Jun 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Willhong commented Jun 12, 2026 •

edited

Loading