fix(gateway): realign _last_flushed_db_idx on cached-agent reuse to prevent skipped transcript rows#44425
Conversation
…revent skipped transcript rows When the gateway reuses a cached AIAgent, _last_flushed_db_idx from the previous turn can be larger than the current turn's agent_history length. This causes _flush_messages_to_session_db() to compute a flush_from offset that skips the current turn's assistant reply, leaving a transcript with consecutive user rows and no assistant response. Fix: realign agent._last_flushed_db_idx = len(agent_history) immediately after _build_gateway_agent_history() returns, before the agent processes the new turn. Fixes NousResearch#44327
|
Thanks for the flag @alt-glitch. Comparing the two PRs:
This PR includes regression tests that verify the fix prevents the stale index from causing message skips on agent reuse. Keeping open. |
|
Thanks for flagging @alt-glitch. Comparing the two PRs:
This PR includes regression tests verifying the behavior. Keeping open for the more complete fix. |
|
Thanks for this @liuhao1024 — your diagnosis of the bug was spot on: a cached agent carrying a stale We landed #44518 (@kyssta-exe) for this, in a942bfd. It fixes the same bug but resets the cursor in Your fix targets the right root cause; the difference is placement + interrupt-depth awareness. Closing as resolved by #44518 — thanks for the careful write-up, the scenario in your test description matched the real failure exactly. |
What does this PR do?
Realigns
_last_flushed_db_idxto the currentagent_historylength when the gateway reuses a cachedAIAgent, preventing the DB-flush cursor from a previous turn from skipping the current turn's assistant reply in the session transcript.Related Issue
Fixes #44327
Type of Change
Changes Made
gateway/run.py: After_build_gateway_agent_history()returns, setagent._last_flushed_db_idx = len(agent_history)so a stale cursor from the previous turn does not cause_flush_messages_to_session_db()to skip the new turn's assistant row.tests/gateway/test_agent_cache.py: AddedTestCachedAgentFlushCursorRealignwith 3 tests:test_stale_flush_cursor_realigns_to_agent_history: verifies the cursor is set tolen(agent_history)test_flush_after_realign_persists_new_turn_messages: end-to-end test showing the fix prevents message skippingtest_stale_cursor_without_realign_skips_messages: demonstrates the bug (without the fix, messages are silently dropped)How to Test
pytest tests/gateway/test_agent_cache.py::TestCachedAgentFlushCursorRealign -xvs— all 3 tests should passpytest tests/gateway/test_agent_cache.py -x— all 66 tests should pass (no regressions)pytest tests/run_agent/test_compression_persistence.py tests/run_agent/test_860_dedup.py -x— existing persistence tests still passChecklist
Code
fix(scope):,feat(scope):, etc.)pytest tests/ -qand all tests passDocumentation & Housekeeping
docs/, docstrings) — or N/Acli-config.yaml.exampleif I added/changed config keys — or N/ACONTRIBUTING.mdorAGENTS.mdif I changed architecture or workflows — or N/ACode Intelligence
gateway/run.py::_process_message_background(caller of_build_gateway_agent_historyand_flush_messages_to_session_db)run_agent.py::_flush_messages_to_session_db(uses_last_flushed_db_idxto compute flush offset)cli_commands_mixin.py:712-713andcli.py:5921-5922also realign_last_flushed_db_idxon session reset; this PR applies the same pattern to the gateway cached-agent path