Gateway cached-agent reuse can leak _last_flushed_db_idx across turns and skip assistant transcript rows

## Summary

When the gateway reuses a cached `AIAgent`, the per-agent SessionDB flush cursor can leak across turns.

`GatewayRunner._init_cached_agent_for_turn()` resets some per-turn state, but it does **not** realign `agent._last_flushed_db_idx` to the history actually passed into the new turn.

As a result, a reused agent can compute a `flush_from` offset that is too large for the current turn and silently skip persisting the assistant reply into `state.db`.

This leaves a transcript with multiple consecutive `user` rows and missing `assistant` rows, which then causes later turns to replay stale questions and produce "repeated" or blended answers.

This looks related to, but distinct from:

- #24187, where message repair shortens `messages` vs `conversation_history`
- #43849, which describes the user-visible symptom that a reply was delivered but no assistant row was persisted

## Observed Behavior

In a live gateway session, the user reported that each new reply kept dragging prior questions into the current answer.

Inspecting the session transcript showed a pattern like:

- assistant
- user
- user
- user
- user
- user
- user
- assistant

So the platform visibly delivered multiple assistant replies over time, but the durable SQLite transcript only retained the final one. On subsequent turns, Hermes loaded this broken history, triggered consecutive-user repair, and effectively merged several old user turns into the next prompt.

## Root Cause Hypothesis

The critical pieces are:

1. `gateway/run.py` reuses cached agents:

   ```python
   if cached and cached[1] == _sig:
       agent = cached[0]
       self._init_cached_agent_for_turn(agent, _interrupt_depth)
   ```

2. `_init_cached_agent_for_turn()` currently resets only:

   - `_last_activity_ts`
   - `_last_activity_desc`
   - `_api_call_count`

   It does **not** reset or realign `_last_flushed_db_idx`.

3. `run_agent.py::_flush_messages_to_session_db()` later computes:

   ```python
   start_idx = len(conversation_history) if conversation_history else 0
   flush_from = max(start_idx, self._last_flushed_db_idx)
   for msg in messages[flush_from:]:
       ... append_message(...)
   ```

If the cached agent still carries `_last_flushed_db_idx` from the previous turn, the new turn can start flushing from a later index than the current `conversation_history` boundary. Then the assistant message for this turn is silently skipped.

## Why This Causes Repeated Answers

On the gateway success path, transcript writes assume the agent already persisted the DB rows:

```python
agent_persisted = self._session_db is not None
append_to_transcript(..., skip_db=agent_persisted)
```

So if the agent-side flush skips the assistant row, the gateway does not backfill it. The next inbound message then reloads a transcript containing several consecutive `user` rows with the assistant rows missing.

That broken replay state matches the repeated-answer symptom exactly:

- Hermes repairs/merges consecutive `user` messages
- old unanswered-looking questions get folded into the next prompt
- the new reply appears to repeat or drag in previous topics

## Minimal Regression Shape

A focused test should simulate:

1. Create a cached `AIAgent` for a gateway session
2. Run one turn so `_last_flushed_db_idx` becomes non-zero
3. Reuse the same cached agent for a second turn with freshly loaded `history`
4. Do **not** reset `_last_flushed_db_idx`
5. Persist the second turn
6. Assert that the second turn's assistant row is missing from SessionDB

Then apply the fix and assert the assistant row is present.

## Suggested Fix

When reusing a cached agent, realign the flush cursor to the history actually being replayed for this turn.

Two plausible fixes:

1. In the gateway path, after `agent_history` is built for the current turn, set:

   ```python
   agent._last_flushed_db_idx = len(agent_history)
   ```

2. Or more defensively, inside persistence, clamp / recompute `flush_from` so stale cached-agent state cannot skip the current turn.

The first option seems the most direct because `_last_flushed_db_idx` is turn-local persistence state, and cached-agent reuse is precisely where the stale value crosses turn boundaries.

## Expected Invariant

For every successful gateway turn:

> If a visible assistant response is produced, the session transcript for that turn must contain the corresponding assistant row.

## Environment

- Hermes gateway on macOS
- Profile-scoped gateway session
- Cached-agent reuse enabled in gateway
- SessionDB (`state.db`) is the canonical transcript store


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Gateway cached-agent reuse can leak _last_flushed_db_idx across turns and skip assistant transcript rows #44327

Summary

Observed Behavior

Root Cause Hypothesis

Why This Causes Repeated Answers

Minimal Regression Shape

Suggested Fix

Expected Invariant

Environment

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Gateway cached-agent reuse can leak _last_flushed_db_idx across turns and skip assistant transcript rows #44327

Description

Summary

Observed Behavior

Root Cause Hypothesis

Why This Causes Repeated Answers

Minimal Regression Shape

Suggested Fix

Expected Invariant

Environment

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions