Bug Description
A gateway turn can visibly send an assistant response to a messaging platform while failing to persist the assistant row into state.db. On the next turn, session replay contains a user-only backlog and/or stale unfinished tool context, so the agent treats already-resolved messages as still pending and responds to old turns instead of the latest user message.
This is especially damaging on Telegram/Discord because the user sees a reply, but the model's next replay does not. From the agent's perspective, the turn looks interrupted or unfinished even though the assistant response was delivered.
Expected Behavior
If the gateway sends a visible assistant response, the durable session transcript must contain an assistant message row for that response before the next turn replays the session.
Suggested invariant:
Delivered assistant response ⇒ assistant row exists in the session transcript for that gateway turn.
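The invariant can be stated as a predicate over the rows one gateway turn appends. This is a minimal sketch, assuming those rows are available as simple (role, content) records; `Row` and `turn_invariant_holds` are hypothetical names, not part of the gateway.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Row:
    role: str      # e.g. "user", "assistant", "tool"
    content: str


def turn_invariant_holds(turn_rows: list[Row], delivered: Optional[str]) -> bool:
    """Delivered assistant response => an assistant row exists among this turn's rows.

    turn_rows are the rows appended to the session transcript during one gateway turn.
    """
    if not delivered:
        return True  # nothing visible was sent, so the invariant requires nothing
    return any(row.role == "assistant" for row in turn_rows)
```

This checks for any assistant row rather than an exact content match, since the agent's persisted text may legitimately differ from the platform-formatted message.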
Actual Behavior
After a recent update, I observed a live Telegram gateway session where replies had been delivered, but the active SQLite transcript contained a long run of user messages with missing assistant rows between them. The next turn replayed stale/resolved user messages as if they had never been answered, causing the agent to loop on prior requests and ignore the user's latest correction.
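One rough way to spot this symptom in an existing transcript is to look for runs of consecutive user rows. The sketch below assumes the session's message roles have already been read from state.db in order; `user_only_runs` is a hypothetical helper, and consecutive user rows may sometimes be legitimate, so any hits still need manual review.

```python
def user_only_runs(roles: list[str], min_len: int = 2) -> list[tuple[int, int]]:
    """Return (start_index, length) for each run of >= min_len consecutive user rows."""
    runs: list[tuple[int, int]] = []
    start = None
    # A sentinel role closes a run that reaches the end of the transcript.
    for i, role in enumerate(roles + ["<end>"]):
        if role == "user":
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i - start))
            start = None
    return runs
```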
Likely Regression Shape
This appears related to the interaction between:
- Early user-message persistence at gateway turn start, which is desirable because it prevents inbound messages from being lost.
- Duplicate-user prevention that treats self._session_db is not None or agent_persisted=True as equivalent to "the full turn, including the assistant response, has been persisted."
That assumption is too broad. A user row can be present while the assistant row is missing.
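The difference between the too-broad condition and a narrower one can be sketched as two predicates. This is illustrative only, assuming a hypothetical per-turn `TurnPersistence` record of what actually reached the transcript; the real gateway may track this differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TurnPersistence:
    """Hypothetical per-turn record of what actually reached the transcript."""
    user_row_written: bool = False
    assistant_row_written: bool = False


def fallback_needed_too_broad(session_db: Optional[object], agent_persisted: bool) -> bool:
    # The assumption described above: a SessionDB handle or agent_persisted=True
    # is read as "the full turn is persisted", so the fallback is skipped.
    return not (session_db is not None or agent_persisted)


def fallback_needed(state: TurnPersistence, delivered: str) -> bool:
    # Narrower check: only an assistant row written this turn satisfies the invariant.
    return bool(delivered) and not state.assistant_row_written
```

With early user persistence, a turn can end with user_row_written=True and assistant_row_written=False; the first predicate still skips the fallback, while the second does not.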
Local Hotfix That Resolved It
A local patch added a post-send/backfill check in the gateway turn path:
- Record the session DB tail before the turn.
- After the assistant response is produced/sent and normal persistence has had a chance to run, check whether an assistant row exists after that pre-turn tail.
- If no assistant row exists, append the delivered response as an assistant message.
After restart, logs showed the backfill path firing, and subsequent turns stopped replaying the stale user-only backlog.
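The backfill steps above can be sketched with the standard sqlite3 module. This is a minimal sketch under an assumed schema: the `messages` table, its columns, and the helper names `session_tail` and `backfill_assistant_if_missing` are hypothetical and not the actual state.db layout or gateway code.

```python
import sqlite3

# Assumed schema for illustration only; the real state.db layout may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    role TEXT NOT NULL,
    content TEXT NOT NULL
)
"""


def session_tail(conn: sqlite3.Connection, session_id: str) -> int:
    """Return the highest message id for the session, or 0 if it has none."""
    row = conn.execute(
        "SELECT COALESCE(MAX(id), 0) FROM messages WHERE session_id = ?",
        (session_id,),
    ).fetchone()
    return row[0]


def backfill_assistant_if_missing(
    conn: sqlite3.Connection, session_id: str, pre_turn_tail: int, delivered: str
) -> bool:
    """Append the delivered response if no assistant row was written this turn.

    Returns True when a backfill row was inserted.
    """
    if not delivered:
        return False  # nothing was sent, so there is nothing to backfill
    exists = conn.execute(
        "SELECT 1 FROM messages"
        " WHERE session_id = ? AND id > ? AND role = 'assistant' LIMIT 1",
        (session_id, pre_turn_tail),
    ).fetchone()
    if exists:
        return False  # normal persistence already wrote the assistant row
    with conn:  # commits the insert on success, rolls back on error
        conn.execute(
            "INSERT INTO messages (session_id, role, content) VALUES (?, 'assistant', ?)",
            (session_id, delivered),
        )
    return True
```

Because ids only increase, "id greater than the pre-turn tail" selects exactly the rows written during this turn, and calling the backfill a second time is a no-op.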
Steps to Reproduce / Regression Test Shape
A focused regression test should simulate:
- Gateway receives a Telegram/Discord message.
- Early user persistence writes the user row.
- Agent/gateway returns a non-empty assistant response that would be delivered to the platform.
- Agent-side transcript flush fails/skips/does not append an assistant row.
- Gateway fallback currently believes persistence is handled because SessionDB exists or agent_persisted=True.
- Assert that the final DB transcript still contains an assistant row for the delivered response.
Without the fix, the transcript has the user row but no assistant row. With the fix, the delivered response is backfilled.
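A pytest-style sketch of that regression test is below. It does not use the real gateway: `FakeSessionDB`, `run_turn`, and the flush helpers are hypothetical stand-ins backed by in-memory SQLite, and `fixed` toggles between the too-broad check and the post-send assistant-row check.

```python
import sqlite3


class FakeSessionDB:
    """Hypothetical in-memory stand-in for SessionDB; the real schema may differ."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE messages (id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " session_id TEXT, role TEXT, content TEXT)"
        )

    def append(self, session_id: str, role: str, content: str) -> None:
        with self.conn:
            self.conn.execute(
                "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
                (session_id, role, content),
            )

    def tail(self, session_id: str) -> int:
        row = self.conn.execute(
            "SELECT COALESCE(MAX(id), 0) FROM messages WHERE session_id = ?", (session_id,)
        ).fetchone()
        return row[0]

    def has_assistant_after(self, session_id: str, tail: int) -> bool:
        row = self.conn.execute(
            "SELECT 1 FROM messages WHERE session_id = ? AND id > ? AND role = 'assistant'",
            (session_id, tail),
        ).fetchone()
        return row is not None

    def roles(self, session_id: str) -> list[str]:
        cur = self.conn.execute(
            "SELECT role FROM messages WHERE session_id = ? ORDER BY id", (session_id,)
        )
        return [r[0] for r in cur]


def skipping_flush(db, session_id, response):
    """Agent-side flush that silently fails to append the assistant row."""


def working_flush(db, session_id, response):
    db.append(session_id, "assistant", response)


def run_turn(db, session_id, user_text, agent_flush, fixed):
    tail = db.tail(session_id)                # pre-turn tail
    db.append(session_id, "user", user_text)  # early user persistence
    response = f"reply to {user_text}"        # response delivered to the platform
    agent_flush(db, session_id, response)
    agent_persisted = True                    # flag the too-broad check trusts
    if fixed:
        needs_row = not db.has_assistant_after(session_id, tail)
    else:
        needs_row = not (db is not None or agent_persisted)
    if needs_row:
        db.append(session_id, "assistant", response)
    return response


def test_delivered_response_is_backfilled():
    db = FakeSessionDB()
    run_turn(db, "tg:1", "hello", skipping_flush, fixed=True)
    assert db.roles("tg:1") == ["user", "assistant"]


def test_too_broad_check_leaves_user_only_row():
    db = FakeSessionDB()
    run_turn(db, "tg:1", "hello", skipping_flush, fixed=False)
    assert db.roles("tg:1") == ["user"]


def test_no_duplicate_when_agent_flush_succeeds():
    db = FakeSessionDB()
    run_turn(db, "tg:1", "hello", working_flush, fixed=True)
    assert db.roles("tg:1") == ["user", "assistant"]
```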
Impact
High for messaging-platform reliability:
- Stale user messages appear unresolved.
- Agent re-answers already-resolved turns.
- Interrupted-turn/tool-result notes get amplified because the transcript looks unfinished.
- Users see the assistant looping or ignoring corrections even though prior replies were visibly sent.
Environment
- Hermes gateway on macOS via LaunchAgent
- Telegram gateway session
- Observed on current origin/main around commit 4cecb1a13
- Local patch and focused gateway tests validated the persistence invariant
Notes
The interrupted-turn system note wording can make the symptom worse, but it is not the root cause. The root cause is the broken transcript invariant: delivered assistant responses must be durably represented in session history.