Skip to content

fix(gateway): reset _last_flushed_db_idx on cached-agent reuse#44354

Closed
liuhao1024 wants to merge 1 commit into
NousResearch:mainfrom
liuhao1024:fix/cached-agent-flush-cursor
Closed

fix(gateway): reset _last_flushed_db_idx on cached-agent reuse#44354
liuhao1024 wants to merge 1 commit into
NousResearch:mainfrom
liuhao1024:fix/cached-agent-flush-cursor

Conversation

@liuhao1024

Copy link
Copy Markdown
Contributor

What does this PR do?

Resets _last_flushed_db_idx on cached AIAgent reuse so _flush_messages_to_session_db recomputes its flush offset from the new turn's conversation_history length instead of carrying a stale cursor from the previous turn. Without this, the assistant reply is silently skipped and the transcript accumulates consecutive user rows, corrupting context replay on subsequent turns.

Related Issue

Fixes #44327

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

  • gateway/run.py: Added agent._last_flushed_db_idx = 0 to _init_cached_agent_for_turn() so the SessionDB flush cursor is reset on every cached-agent turn, regardless of interrupt depth.
  • tests/gateway/test_agent_cache.py: Added 3 tests verifying the flush cursor reset (fresh turn, interrupt-recursive turn, idempotent zero case).

How to Test

  1. Start a gateway session and send multiple messages to build conversation history.
  2. Verify that each assistant reply appears in the SQLite messages table (sqlite3 ~/.hermes/state.db "SELECT role, content FROM messages WHERE session_id='<id>'").
  3. Before this fix, a cached agent could skip persisting the assistant row, producing consecutive user rows that corrupt context replay.
  4. Run pytest tests/gateway/test_agent_cache.py -v -k flush_cursor — all 3 new tests should pass.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: macOS

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Code Intelligence

When the gateway reuses a cached AIAgent across turns,
_init_cached_agent_for_turn() reset _api_call_count but left
_last_flushed_db_idx stale from the previous turn.  This caused
_flush_messages_to_session_db() to compute flush_from from the stale
cursor rather than the new turn's conversation_history length, silently
skipping the assistant reply and producing consecutive-user transcript
rows that corrupt context replay on subsequent turns.

Reset _last_flushed_db_idx to 0 at the start of every cached-agent
turn (regardless of interrupt depth) so the flush cursor is always
aligned with the current turn's history boundary.

Fixes NousResearch#44327
@alt-glitch alt-glitch added type/bug Something isn't working comp/gateway Gateway runner, session dispatch, delivery P2 Medium — degraded but workaround exists duplicate This issue or pull request already exists labels Jun 11, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Duplicate of #32760 — identical fix resetting _last_flushed_db_idx to 0 on cached-agent reuse in _init_cached_agent_for_turn. #32760 (open) is the earlier PR for this exact one-line fix.

@liuhao1024

Copy link
Copy Markdown
Contributor Author

Thanks for the flag @alt-glitch. Confirmed — #32760 has the same one-line fix (agent._last_flushed_db_idx = 0), opened earlier on May 26. Closing in favor of #32760.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/gateway Gateway runner, session dispatch, delivery duplicate This issue or pull request already exists P2 Medium — degraded but workaround exists type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Gateway cached-agent reuse can leak _last_flushed_db_idx across turns and skip assistant transcript rows

2 participants