fix(session): persist assistant after repair compacts message list by JerryLiu369 · Pull Request #44838 · NousResearch/hermes-agent

JerryLiu369 · 2026-06-12T10:32:55Z

Summary

Fixes #44837 — gateway sessions where the model sees multiple user messages merged with \n\n while platform inbound is single-message.

Rebased onto latest main. Only the delta beyond merged #44327 / #44518:

Clamp _last_flushed_db_idx after repair_message_sequence compacts the in-memory list.
Guard _flush_messages_to_session_db when the cursor overshoots len(messages).

#44518 fixes stale cursor across cached-agent turns; this PR addresses the same-turn early-persist → repair → turn-end flush gap.

Root cause

turn_context early-persists inbound user → flush cursor advances.
First API call merges consecutive users via repair_message_sequence → list shrinks.
Turn-end flush can skip the first assistant row → orphan user in state.db.
Next turn repair merges again → \n\n blob grows.

Related open PRs (same area, not duplicates)

fix(agent): persist repaired current turn history #24211 — replace_messages() when repair merges into an already-persisted user row
fix(agent): prevent silent data loss when message repair shortens history #24196 — flush overshoot guard when repair shortens history

This PR is intentionally minimal; happy to adjust if maintainers prefer a different merge order or layering.

Test plan

tests/run_agent/test_early_persist_repair_flush.py
tests/run_agent/test_860_dedup.py — no dedup regression

pytest tests/run_agent/test_early_persist_repair_flush.py \
       tests/run_agent/test_860_dedup.py -q -o addopts=

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds regression coverage and cursor-handling fixes to ensure SessionDB flushing remains correct when message lists are compacted (repair/compression) after an early persist, especially for cached-agent turns.

Changes:

Clamp/reset _last_flushed_db_idx to prevent missing assistant/tool flushes after message-sequence compaction.
Reset stale flush cursor at the start of a fresh cached-agent turn (interrupt depth 0).
Add focused regression tests for both the repair-compaction scenario and cached-agent turn initialization.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
tests/run_agent/test_early_persist_repair_flush.py	New regression tests covering early persist + repair compaction and stale flush cursor clamping.
tests/gateway/test_agent_cache.py	New tests validating flush cursor reset/preservation based on `interrupt_depth`.
run_agent.py	Adjusts SessionDB flush logic when `_last_flushed_db_idx` is beyond current `messages` length.
gateway/run.py	Resets `_last_flushed_db_idx` on fresh cached-agent turns to avoid skipping persistence.
agent/agent_runtime_helpers.py	Clamps `_last_flushed_db_idx` when `repair_message_sequence` compacts `messages`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

JerryLiu369 · 2026-06-12T11:06:04Z

            flush_from = max(start_idx, self._last_flushed_db_idx)
+            # repair_message_sequence / compression may shrink `messages`
+            # after early persist advanced the cursor past the new length.
+            if self._last_flushed_db_idx > len(messages):
+                flush_from = start_idx


Good catch — agreed. Replaced the `flush_from = start_idx` fallback with clamping `_last_flushed_db_idx` to `len(messages)` before computing `flush_from`, so we avoid re-appending when `start_idx == 0`. Compression tests still pass.

JerryLiu369 · 2026-06-12T11:06:06Z

+        if hasattr(agent, "_last_flushed_db_idx"):
+            agent._last_flushed_db_idx = min(
+                agent._last_flushed_db_idx, len(messages)
+            )


Makes sense — clamp now runs only when the cursor is an `int` (74457d1 + follow-up commit).

Early turn-start user persist advances _last_flushed_db_idx before repair_message_sequence merges consecutive users and compacts the list. Turn-end flush then skipped the first assistant row, leaving orphan users that grew via \n\n merge on the next gateway turn. Clamp the flush cursor after repair and when idx overshoots list length; also reset _last_flushed_db_idx on cached-agent fresh turns (NousResearch#44327). Co-authored-by: Cursor <cursoragent@cursor.com>

JerryLiu369 · 2026-06-12T11:00:56Z

Rebased onto latest main — diff is now 3 files (~14 LOC production + regression test). Removed the duplicate #44327 hunk since that already landed in #44518.

Scope: same-turn gap after early user persist → repair_message_sequence shrinks the list → turn-end flush can skip the first assistant row. The test reproduces that shape directly.

Not claiming to supersede #24211 or #24196 — those go further on transcript rewrite / flush overshoot in related cases. This is meant as a minimal, test-backed patch for the assistant-drop path seen in production logs. Totally fine to combine, defer, or redirect if a broader approach is preferred.

Address Copilot review: resetting flush_from to start_idx when conversation_history is None could duplicate DB rows. Clamp _last_flushed_db_idx to len(messages) instead; guard isinstance in repair_message_sequence clamp. Co-authored-by: Cursor <cursoragent@cursor.com>

liuhao1024 · 2026-06-12T12:56:28Z

Code Review: Clean ✅

Reviewed the full diff across agent/agent_runtime_helpers.py, run_agent.py, and the new test file.

What was checked:

Dual clamp location: The clamp in repair_message_sequence (early, during in-place mutation) and the defensive clamp in _flush_messages_to_session_db (at flush time) are complementary, not redundant. The first handles the case where repair happens before a flush; the second handles any other code path that advances _last_flushed_db_idx past the new list length.
Guard semantics: isinstance(_last, int) guards against None (attribute unset), and min(_last, len(messages)) prevents index overshoot without resetting the cursor to 0 (which would re-append already-persisted rows).
Test coverage: Both the regression scenario (early persist → repair compaction → tool turn) and the direct cursor-clamp case are covered. The test creates a real SessionDB and verifies the actual persisted rows, not just the cursor value.

Architectural note: The _last_flushed_db_idx cursor pattern (advance-on-persist, read-at-flush) is a classic "stale pointer after compaction" bug. The fix is minimal and correctly scoped — no behavioral change to the happy path, only bounds correction when the list shrinks.

JerryLiu369 · 2026-06-13T03:52:49Z

Closing — #45260 (salvage of #44870) has already been merged with a broader fix for the same root cause. Thanks for the review @liuhao1024.

Copilot AI review requested due to automatic review settings June 12, 2026 10:32

Copilot AI reviewed Jun 12, 2026

View reviewed changes

JerryLiu369 mentioned this pull request Jun 12, 2026

Session DB turn-end flush drops assistant after repair_message_sequence compacts list (orphan user → \n\n merge) #44837

Closed

JerryLiu369 force-pushed the fix/session-db-flush-after-repair branch from 164104e to 74457d1 Compare June 12, 2026 10:38

alt-glitch added type/bug Something isn't working comp/agent Core agent loop, run_agent.py, prompt builder comp/gateway Gateway runner, session dispatch, delivery P1 High — major feature broken, no workaround labels Jun 12, 2026

alt-glitch mentioned this pull request Jun 12, 2026

fix(agent): clamp flush cursor after repair_message_sequence compacts messages (#44837) #44870

Closed

alt-glitch mentioned this pull request Jun 12, 2026

fix: Session DB turn-end flush drops assistant after repair_message_sequence compacts list (orphan user → \n\n merge) (#44837) #45076

Closed

1 task

JerryLiu369 closed this Jun 13, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(session): persist assistant after repair compacts message list#44838

fix(session): persist assistant after repair compacts message list#44838
JerryLiu369 wants to merge 2 commits into
NousResearch:mainfrom
JerryLiu369:fix/session-db-flush-after-repair

JerryLiu369 commented Jun 12, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

JerryLiu369 Jun 12, 2026

Uh oh!

JerryLiu369 Jun 12, 2026

Uh oh!

JerryLiu369 commented Jun 12, 2026

Uh oh!

liuhao1024 commented Jun 12, 2026

Uh oh!

JerryLiu369 commented Jun 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

JerryLiu369 commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root cause

Related open PRs (same area, not duplicates)

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

JerryLiu369 Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

JerryLiu369 Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

JerryLiu369 commented Jun 12, 2026

Uh oh!

liuhao1024 commented Jun 12, 2026

Uh oh!

JerryLiu369 commented Jun 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

JerryLiu369 commented Jun 12, 2026 •

edited

Loading