Summary
AIAgent._repair_message_sequence(messages) can shorten the in-memory messages list before persistence, but _flush_messages_to_session_db(messages, conversation_history) still uses len(conversation_history) as the skip offset.
If repair removes or merges enough messages from the historical portion, flush_from can become greater than len(messages). Python slicing then returns an empty list, so the current user turn and assistant response are silently not persisted to SessionDB.
Impact
Gateway-style integrations that create a fresh AIAgent per inbound message rely on SessionDB for continuity. Once this happens, the same session keeps loading stale history, causing follow-up messages like "yes", "check again", or "continue" to resolve against old context.
Observed symptom:
- user asks about weather
- assistant asks whether to check again
- user replies "check"
- model answers an unrelated old topic because the recent weather turn was never persisted
Root Cause
The relevant flow is:
-
Gateway loads persisted history from SessionDB.
-
run_conversation() builds:
messages = list(conversation_history)
messages.append(current_user_message)
-
Before the API call, Hermes runs:
repaired_seq = self._repair_message_sequence(messages)
-
That repair can mutate messages in place by:
- dropping stray/orphan tool messages
- merging consecutive user messages
-
On exit, persistence still computes:
start_idx = len(conversation_history)
flush_from = max(start_idx, self._last_flushed_db_idx)
for msg in messages[flush_from:]:
self._session_db.append_message(...)
If conversation_history had 120 entries, but repair shortens messages to 116, then:
No exception is raised, and the current turn is skipped.
Minimal Reproduction Shape
A simplified reproduction:
history_len=120
messages before repair = 122
repair removes/merges 6 historical entries
messages after repair = 116
flush_from = len(conversation_history) = 120
messages[120:] = []
flushed_rows = 0
Observed local reproduction output:
history_len=120 before_repair=122 repairs=6 after_repair=116 flushed_rows=0
Expected Behavior
The current user message and assistant response should always be persisted, even if historical messages are repaired before the model call.
At minimum, _flush_messages_to_session_db() should not silently skip persistence when:
flush_from > len(messages)
Suggested Fix
Do not use the original len(conversation_history) as the only persistence boundary after in-place repair.
Possible approaches:
- Track the current-turn boundary explicitly after repair.
- Adjust the persistence offset when
_repair_message_sequence() mutates messages.
- Persist the current turn separately from historical replay.
- Add a warning/error if
flush_from > len(messages).
Suggested Regression Test
Add a test where:
conversation_history contains malformed historical entries.
messages = conversation_history + [current_user, assistant_reply].
_repair_message_sequence(messages) shortens messages.
_flush_messages_to_session_db(messages, conversation_history) is called.
- Assert the current user and assistant reply are still written to SessionDB.
This should prevent silent context loss in gateway integrations.
Summary
AIAgent._repair_message_sequence(messages)can shorten the in-memorymessageslist before persistence, but_flush_messages_to_session_db(messages, conversation_history)still useslen(conversation_history)as the skip offset.If repair removes or merges enough messages from the historical portion,
flush_fromcan become greater thanlen(messages). Python slicing then returns an empty list, so the current user turn and assistant response are silently not persisted to SessionDB.Impact
Gateway-style integrations that create a fresh
AIAgentper inbound message rely on SessionDB for continuity. Once this happens, the same session keeps loading stale history, causing follow-up messages like "yes", "check again", or "continue" to resolve against old context.Observed symptom:
Root Cause
The relevant flow is:
Gateway loads persisted history from SessionDB.
run_conversation()builds:Before the API call, Hermes runs:
That repair can mutate
messagesin place by:On exit, persistence still computes:
If
conversation_historyhad 120 entries, but repair shortensmessagesto 116, then:No exception is raised, and the current turn is skipped.
Minimal Reproduction Shape
A simplified reproduction:
Observed local reproduction output:
Expected Behavior
The current user message and assistant response should always be persisted, even if historical messages are repaired before the model call.
At minimum,
_flush_messages_to_session_db()should not silently skip persistence when:Suggested Fix
Do not use the original
len(conversation_history)as the only persistence boundary after in-place repair.Possible approaches:
_repair_message_sequence()mutatesmessages.flush_from > len(messages).Suggested Regression Test
Add a test where:
conversation_historycontains malformed historical entries.messages = conversation_history + [current_user, assistant_reply]._repair_message_sequence(messages)shortensmessages._flush_messages_to_session_db(messages, conversation_history)is called.This should prevent silent context loss in gateway integrations.