Skip to content

fix(agent): identity-based session-DB flush — stop positional assistant drops (#43936)#45277

Open
Willhong wants to merge 1 commit into
NousResearch:mainfrom
Willhong:fix/identity-flush-43936
Open

fix(agent): identity-based session-DB flush — stop positional assistant drops (#43936)#45277
Willhong wants to merge 1 commit into
NousResearch:mainfrom
Willhong:fix/identity-flush-43936

Conversation

@Willhong

@Willhong Willhong commented Jun 12, 2026

Copy link
Copy Markdown

What does this PR do?

Fixes assistant messages that are delivered to the chat but silently dropped
from state.db, so the next turn replays a user-only backlog and loses context
(#43936).

The root cause is that _flush_messages_to_session_db (in run_agent.py) was
position-based:

flush_from = max(start_idx, self._last_flushed_db_idx)   # start_idx = len(conversation_history)
for msg in messages[flush_from:]:
    self._session_db.append_message(...)

This assumes messages == conversation_history + this turn's new messages, so
slicing at len(conversation_history) isolates the new tail. That assumption
breaks in two real, independently-reproduced ways:

  1. Overlapping turns corrupt the shared cursor. _last_flushed_db_idx lives
    on the cached agent instance, but it indexes the turn-local messages
    array that every turn rebuilds. When a second inbound message interrupts an
    in-flight turn on the same cached agent (the exact repro in [Bug]: State.db drops assistant messages on interrupt; .jsonl session log no longer available as fallback #43936 — "send
    another message before the agent finishes"), the later turn advances
    _last_flushed_db_idx past the earlier turn's len(messages). The earlier,
    now-completed turn's final flush then computes flush_from > len(messages),
    the slice is empty, and a fully-generated, already-delivered assistant
    response is never written.

  2. conversation_history ends up longer than messages. Once a drop has
    poisoned the transcript (consecutive-user / orphan-tool rows), the next
    turn's repair_message_sequence merges/drops those rows and shrinks
    messages below len(conversation_history). Now start_idx > len(messages)
    permanently, every later flush slices past the end, and the session
    enters a self-reinforcing drop loop. No interrupt is needed to keep it going.

No index arithmetic survives the two arrays diverging, so this replaces the
positional slice with identity-based dedup.

Re: the already-merged #44837 fix (#45260). #45260 changed the cursor to
flush_from = max(start_idx, min(self._last_flushed_db_idx, len(messages))).
That clamps _last_flushed_db_idx to len(messages) but leaves
start_idx = len(conversation_history) unbounded. For the live shape in
trigger #2 (start_idx=153, len(messages)=145), flush_from is still
max(153, ≤145) = 153 ≥ len(messages) → the slice is still empty and the
assistant is still dropped. So #45260 reduces the symptom but does not close
trigger #2 (nor trigger #1). This PR rebases on top of #45260 and replaces the
positional clamp entirely; tests/run_agent/test_identity_flush.py:: test_supersedes_44837_clamp_residual_drop pins the residual drop and proves
the identity flush persists where the clamp doesn't.

The fix is the minimal flush-layer change, not the issue's suggested ".jsonl
fallback restore" — restoring a second log would paper over the drop instead of
preventing it, and the same repair-shrink path (trigger #2) would still desync
both stores.

Related Issue

Fixes #43936

Related to #44837 (the repair-compaction drop), whose merged fix #45260 reduced
but did not eliminate the drop — see the note above; this PR rebases on #45260
and supersedes its positional clamp.

Complementary to (not overlapping with) #43962, which backfills delivered text
after an interrupt in turn_finalizer.py. This PR prevents the drop at the
flush layer
, which also covers the non-interrupt trigger #2 that a finalizer
backfill cannot reach. The two can land independently.

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

  • run_agent.py_flush_messages_to_session_db now stamps each persisted
    dict with _db_persisted after its DB write, and recognizes this turn's
    conversation_history entries by object identity (id()), stamping them
    instead of re-writing. A message is written exactly once regardless of how the
    arrays shift. _db_persisted is underscore-prefixed and already stripped from
    API payloads (chat_completion_helpers.py, anthropic_adapter.py; the Codex
    Responses adapter builds fresh items), so it never reaches a provider.
    _last_flushed_db_idx is kept updated for legacy readers (cli.py
    resume/branch) but no longer decides what gets written.
  • agent/conversation_compression.py — the compression session split clears the
    _db_persisted stamps from the surviving messages so they re-write into the
    new session row.
  • tests/run_agent/test_identity_flush.py — new regression suite (4 cases).
  • tests/run_agent/test_compression_persistence.py
    test_flush_with_stale_history_loses_messages previously asserted the bug
    condition (0 rows persisted); it now asserts the fix (both messages persist
    despite a stale, longer conversation_history).

How to Test

Reproduction (matches #43936):

  1. Run the gateway in any chat platform with a single agent.
  2. Send a message; let the agent begin responding.
  3. Before it finishes, send a second message (overlapping turn).
  4. Before: the first response appears in chat but has no assistant row in
    state.db (gap; next turn replays a user-only backlog).
    After: the assistant row is persisted; no gap.

Automated:

scripts/run_tests.sh tests/run_agent/

tests/run_agent/test_identity_flush.py encodes both triggers:

Verified live on the gateway: a 99-second, 6-API-call multi-tool turn with two
overlapping inbound messages (the exact shape that dropped a 1096-char response
before this change) persisted every assistant/tool row, and the
FLUSH-IDENTITY log line fired on the len_conv > len_messages divergence with
assistant_written=True.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(agent): …)
  • I searched for existing PRs (found fix(agent): persist delivered assistant text on interrupted turns #43962; described the relationship above)
  • My PR contains only changes related to this fix
  • I've run scripts/run_tests.sh tests/run_agent/ and all tests pass
  • I've added tests for my changes
  • I've tested on my platform: macOS 26.5 (Apple Silicon), Python 3.11

Documentation & Housekeeping

  • Documentation update — N/A (internal persistence path, no user-facing config)
  • cli-config.yaml.example — N/A (no config keys added)
  • CONTRIBUTING.md / AGENTS.md — N/A (no architecture/workflow change)
  • Cross-platform impact considered — the change is pure in-memory dict
    bookkeeping + the existing SQLite path; no OS-specific behavior
  • Tool descriptions/schemas — N/A

Risk / rollout

Behavior-neutral on normal sequential turns: a fresh message dict is never in
_hist_ids and carries no stamp, so it is written exactly as before. The only
behavior change is on the divergence paths, which previously dropped data.

…nt drops (NousResearch#43936)

The session-DB flush in `_flush_messages_to_session_db` was position-based
(`messages[max(start_idx, _last_flushed_db_idx):]`). It assumes
`messages == conversation_history + this turn's new messages`, which breaks two
ways, both reproduced live:

1. Overlapping turns on the cached agent corrupt the shared
   `_last_flushed_db_idx` (it indexes the turn-local `messages` array) → the
   earlier completed turn flushes an empty slice and its delivered assistant
   row is never written.
2. `repair_message_sequence` compacts `messages` in place below
   `len(conversation_history)`, so `start_idx > len(messages)` permanently →
   self-reinforcing drop loop.

The merged NousResearch#44837 fix (NousResearch#45260) clamps `_last_flushed_db_idx` to `len(messages)`
but leaves `start_idx = len(conversation_history)` unbounded, so shape 2 is
still an empty slice. No index arithmetic survives the two arrays diverging.

Replace the positional slice with identity-based dedup: stamp each persisted
dict with `_db_persisted`, recognize this turn's `conversation_history` entries
by object identity (`id()`) and stamp instead of re-write. A message is written
exactly once regardless of how the arrays shift; the NousResearch#860 and NousResearch#31507 guards
hold by construction. `_db_persisted` is underscore-prefixed and already
stripped from API payloads. The compression split clears the stamps so the
surviving messages re-write into the new session row. `_last_flushed_db_idx` is
kept updated for legacy readers but no longer decides what is written.

Complementary to NousResearch#43962 (backfills delivered text after an interrupt in
turn_finalizer.py); this prevents the drop at the flush layer and also covers
the non-interrupt repair-shrink path.

Regression: tests/run_agent/test_identity_flush.py (5, incl. an explicit
test that the NousResearch#44837 clamp still drops where identity flush persists) +
test_message_sequence_repair, test_860_dedup, test_compression_persistence all
GREEN.
@Willhong Willhong force-pushed the fix/identity-flush-43936 branch from 7a1b036 to fc868f4 Compare June 13, 2026 00:01
@liuhao1024

Copy link
Copy Markdown
Contributor

Verification: Code Review — Clean ✅

Reviewed the identity-based session-DB flush implementation. Key checks:

  1. Identity vs positional flush: The switch from messages[max(start_idx, _last_flushed_db_idx):] to _db_persisted stamping eliminates the self-reinforcing drop loop where _last_flushed_db_idx exceeds len(messages) after compression. Correct fix for the root cause described in [Bug]: State.db drops assistant messages on interrupt; .jsonl session log no longer available as fallback #43936.

  2. _db_persisted underscore prefix: The key starts with _, so the chat-completions wire sanitizer (agent/transports/chat_completions.py::convert_messages) strips it before API calls — strict gateways won't see it. Harmless in JSON session logs.

  3. Compression path cleanup (conversation_compression.py): _cm.pop("_db_persisted", None) correctly clears stamps on compressed messages so they get re-persisted to the new session. Without this, the identity flush would skip them.

  4. id() for object identity: _hist_ids = {id(_m) for _m in conversation_history} correctly identifies messages already in the DB transcript by Python object identity. Safe within the function's lifetime — both lists are live during the flush call.

  5. Test update: test_flush_with_stale_history_loses_messages correctly flips from assert len(rows) == 0 (documenting the bug) to assert len(rows) == 2 (verifying the fix).

No dead variables, no guard-block leaks, no exception-ordering issues. Well-documented with inline comments explaining the design decisions.

@alt-glitch alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/agent Core agent loop, run_agent.py, prompt builder labels Jun 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P1 High — major feature broken, no workaround type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: State.db drops assistant messages on interrupt; .jsonl session log no longer available as fallback

3 participants