Skip to content

fix(agent): clamp and rewind session flush cursor after repair_message_sequence compaction (#44837)#45260

Merged
teknium1 merged 2 commits into
mainfrom
fix/44837-flush-cursor-clamp
Jun 12, 2026
Merged

fix(agent): clamp and rewind session flush cursor after repair_message_sequence compaction (#44837)#45260
teknium1 merged 2 commits into
mainfrom
fix/44837-flush-cursor-clamp

Conversation

@teknium1

Copy link
Copy Markdown
Contributor

Summary

Salvages PR #44870 by @kyssta-exe (fixes #44837) onto current main and widens the fix: the flush cursor is now recomputed exactly after repair_message_sequence() compacts the message list, not just clamped to the new length.

A min() clamp only catches cursor overshoot past the new end. When repair drops/merges messages at indexes below the cursor, the clamp leaves it pointing past unflushed rows and the turn-end flush silently skips the assistant/tool chain — the malformed-transcript loop reported in #44837 ("agent dementia" looping).

Changes

Validation

Before After
user→user merge w/ flushed cursor=2, len→1 flush skips assistant chain cursor→1, chain persisted
merge below mid-turn cursor clamp leaves cursor past unflushed assistant cursor rewound by survivor count
targeted tests (repair + dedup + compression persistence) 30/30 pass
E2E (real imports, fake DB, full repair→flush path) assistant/tool/assistant rows persisted

Contributor commit cherry-picked with authorship preserved. Merge with rebase.

Infographic

session-flush-cursor-fix

kyssta-exe and others added 2 commits June 12, 2026 15:43
…he cursor

Follow-up to the #44837 clamp: a min() clamp only fixes cursor overshoot
past the new end of the list. When repair_message_sequence drops/merges
messages at indexes below the cursor, the clamp leaves the cursor pointing
past unflushed rows and the turn-end flush silently skips them.

Extract repair_message_sequence_with_cursor(): snapshot the flushed prefix
by object identity before repair, then recompute the cursor as the count
of surviving flushed messages. Falls back to the clamp when no snapshot is
available. Keeps the safety guard in _flush_messages_to_session_db.

Adds targeted tests for overshoot, before-cursor compaction, no-repair,
bare-agent, and the flush guard.
@github-actions

Copy link
Copy Markdown
Contributor

🔎 Lint report: fix/44837-flush-cursor-clamp vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 10881 on HEAD, 10879 on base (🆕 +2)

🆕 New issues (2):

Rule Count
unresolved-attribute 2
First entries
run_agent.py:2891: [unresolved-attribute] unresolved-attribute: Object of type `Self@get_credits_spent_micros` has no attribute `_credits_session_start_micros`
tests/run_agent/test_credits_notices_toggle.py:76: [unresolved-attribute] unresolved-attribute: Unresolved attribute `_credits_session_start_micros` on type `AIAgent`

✅ Fixed issues (1):

Rule Count
invalid-assignment 1
First entries
tests/run_agent/test_credits_notices_toggle.py:76: [invalid-assignment] invalid-assignment: Object of type `None` is not assignable to attribute `_credits_session_start_micros` of type `int`

Unchanged: 5703 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@liuhao1024

Copy link
Copy Markdown
Contributor

Verified — the cursor repair logic is correct and well-tested.

Checked:

  1. repair_message_sequence_with_cursor — snapshots id() of flushed messages before repair, then counts survivors after. This correctly handles the case where messages below the cursor are merged/dropped (not just a simple min() clamp). The id()-based tracking works because repair_message_sequence mutates in place without creating new dicts.
  2. Flush guard in _flush_messages_to_session_db — the min(self._last_flushed_db_idx, len(messages)) safety net catches any cursor overshoot that the helper might miss. Two layers of defense.
  3. Test coverage — 5 scenarios cover: compaction below cursor (the non-trivial min() can't catch case), compaction shrinking past cursor, no repairs, missing cursor attribute, and the flush guard safety net. All pass.
  4. No dead variable / missing importrepair_message_sequence_with_cursor is imported in conversation_loop.py and used correctly.

CI: all checks passing. LGTM.

@teknium1 teknium1 merged commit 8905ee6 into main Jun 12, 2026
28 checks passed
@teknium1 teknium1 deleted the fix/44837-flush-cursor-clamp branch June 12, 2026 23:29
Willhong pushed a commit to Willhong/hermes-agent that referenced this pull request Jun 13, 2026
…nt drops (NousResearch#43936)

The session-DB flush in `_flush_messages_to_session_db` was position-based
(`messages[max(start_idx, _last_flushed_db_idx):]`). It assumes
`messages == conversation_history + this turn's new messages`, which breaks two
ways, both reproduced live:

1. Overlapping turns on the cached agent corrupt the shared
   `_last_flushed_db_idx` (it indexes the turn-local `messages` array) → the
   earlier completed turn flushes an empty slice and its delivered assistant
   row is never written.
2. `repair_message_sequence` compacts `messages` in place below
   `len(conversation_history)`, so `start_idx > len(messages)` permanently →
   self-reinforcing drop loop.

The merged NousResearch#44837 fix (NousResearch#45260) clamps `_last_flushed_db_idx` to `len(messages)`
but leaves `start_idx = len(conversation_history)` unbounded, so shape 2 is
still an empty slice. No index arithmetic survives the two arrays diverging.

Replace the positional slice with identity-based dedup: stamp each persisted
dict with `_db_persisted`, recognize this turn's `conversation_history` entries
by object identity (`id()`) and stamp instead of re-write. A message is written
exactly once regardless of how the arrays shift; the NousResearch#860 and NousResearch#31507 guards
hold by construction. `_db_persisted` is underscore-prefixed and already
stripped from API payloads. The compression split clears the stamps so the
surviving messages re-write into the new session row. `_last_flushed_db_idx` is
kept updated for legacy readers but no longer decides what is written.

Complementary to NousResearch#43962 (backfills delivered text after an interrupt in
turn_finalizer.py); this prevents the drop at the flush layer and also covers
the non-interrupt repair-shrink path.

Regression: tests/run_agent/test_identity_flush.py (5, incl. an explicit
test that the NousResearch#44837 clamp still drops where identity flush persists) +
test_message_sequence_repair, test_860_dedup, test_compression_persistence all
GREEN.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Session DB turn-end flush drops assistant after repair_message_sequence compacts list (orphan user → \n\n merge)

3 participants