Bug description
Gateway idle session expiry can mark a session as finalized even though an OpenViking-backed memory session was never committed. This leaves OpenViking with synced turn messages but no session commit/extraction, so memories are never indexed and the gateway will not retry because the local session is already marked finalized.
This looks like a follow-up edge case to #14981 rather than the same bug. #14981 fixed firing on_session_finalize on idle expiry, but this case concerns OpenViking MemoryProvider.on_session_end() / session commit when the cached AIAgent or provider instance is unavailable by the time the expiry watcher runs.
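Related but distinct:
- on_session_finalize hook dispatch on idle expiry.
- shutdown_memory_provider().
- /new and /reset not triggering OpenViking commit.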
Observed behavior
After gateway idle expiry, some OpenViking sessions can be left in this state:
message_count > 0
pending_tokens > 0
commit_count == 0
memories_extracted.total == 0
At the same time, the gateway logs/session store indicate the idle expiry sweep completed/finalized the session. Because the local session is finalized, the watcher does not retry and OpenViking extraction never happens.
In a real local audit, several root Hermes OpenViking sessions had synced messages but commit_count == 0 and memories_extracted.total == 0; manually running ov session commit <session_id> repaired them and produced extracted memories.
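The orphaned state described above can be checked mechanically during an audit. The following is a minimal sketch, assuming the session statistics are available as a dict with the field names listed above; the `is_orphaned` helper and the dict shape are hypothetical and not part of OpenViking or the gateway.

```python
from typing import Any, Mapping


def is_orphaned(stats: Mapping[str, Any]) -> bool:
    """Return True when turns were synced but the session was never committed.

    `stats` is an assumed dict shape mirroring the fields from the audit:
    message_count, pending_tokens, commit_count, memories_extracted.total.
    """
    extracted = stats.get("memories_extracted") or {}
    return (
        stats.get("message_count", 0) > 0
        and stats.get("pending_tokens", 0) > 0
        and stats.get("commit_count", 0) == 0
        and extracted.get("total", 0) == 0
    )
```

Sessions flagged this way are candidates for the manual ov session commit <session_id> repair mentioned above.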
Expected behavior
For an expired gateway session using OpenViking:
- If a cached/running AIAgent exists, the existing cleanup path should call shutdown_memory_provider() / MemoryProvider.on_session_end() and commit the OpenViking session.
- If no cached/running AIAgent exists but OpenViking has been configured and turns may already have been synced, the gateway should still commit the OpenViking session directly by session_id, or otherwise leave the local session unfinalized so a later retry/repair path can handle it.
- The session should only be marked expiry_finalized=True after the OpenViking commit/finalization step succeeds.
Actual behavior
The idle expiry watcher can reach a no-agent path. If the cached/running AIAgent is gone, _cleanup_agent_resources(agent) is not called, so the OpenViking provider's on_session_end() path does not run. The session can still be marked finalized locally, leaving the synced OpenViking session uncommitted forever.
Why this matters
OpenViking per-turn sync can succeed while long-term memory extraction silently fails. That is particularly confusing because message_count > 0 makes the session look stored, but the useful memory artifacts are never created because commit never ran.
This causes real memory loss for Telegram/gateway sessions that expire while idle or after process/cache lifecycle changes.
Suggested minimal fix
In gateway/run.py, inside _session_expiry_watcher:
- Preserve the existing behavior when a cached/running agent exists.
- If no cached/running agent exists and OpenViking is configured, perform a tiny provider-specific fallback commit:
  client.post(f"/api/v1/sessions/{session_id}/commit")
  using the configured OpenViking endpoint/account/user/agent.
- Only set entry.expiry_finalized = True if cleanup or fallback commit succeeds.
- If fallback commit fails, do not save the session as finalized, so the next watcher sweep can retry.
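The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual code in gateway/run.py: `SessionEntry`, `commit_openviking_session`, and `finalize_expired_session` are hypothetical names, and `client` is assumed to be an httpx/requests-style client already configured with the OpenViking endpoint and credentials.

```python
import logging
from dataclasses import dataclass
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


@dataclass
class SessionEntry:
    """Hypothetical stand-in for a gateway session store entry."""
    session_id: str
    expiry_finalized: bool = False


def commit_openviking_session(client: Any, session_id: str) -> None:
    """Fallback commit by session id; raises if the request fails.

    Assumes the response object exposes raise_for_status(), as httpx and
    requests responses do.
    """
    response = client.post(f"/api/v1/sessions/{session_id}/commit")
    response.raise_for_status()


def finalize_expired_session(
    entry: SessionEntry,
    agent: Optional[Any],
    cleanup_agent: Callable[[Any], None],
    fallback_commit: Optional[Callable[[str], None]],
) -> bool:
    """Mark the entry finalized only if cleanup or fallback commit succeeds."""
    try:
        if agent is not None:
            # Existing path: cleanup runs the provider's on_session_end().
            cleanup_agent(agent)
        elif fallback_commit is not None:
            # No agent, OpenViking configured: commit directly by session id.
            fallback_commit(entry.session_id)
    except Exception:
        logger.warning("commit failed for %s; will retry",
                       entry.session_id, exc_info=True)
        return False  # expiry_finalized stays False so the next sweep retries
    entry.expiry_finalized = True
    return True
```

Passing `fallback_commit=None` models a gateway without OpenViking configured, in which case the entry is finalized as before. The caller would save the session store only when the function returns True.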
Regression tests to add
Add coverage in tests/gateway/test_session_boundary_hooks.py:
- Expired gateway session, OpenViking configured, no cached/running AIAgent, fallback commit succeeds:
  - fallback commit called once with the expired session id
  - expiry_finalized becomes True
- Same setup but fallback commit fails:
  - expiry_finalized remains False
  - session store is not saved as finalized
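The two cases above might look roughly like this in pytest style. It is only a sketch: `sweep_entry` is a hypothetical stand-in for the no-agent branch of the expiry watcher, and real tests would exercise the watcher in gateway/run.py with its own fixtures instead.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def sweep_entry(entry, fallback_commit, save_entry):
    """Hypothetical stand-in for the no-agent branch of the expiry watcher."""
    try:
        fallback_commit(entry.session_id)
    except Exception:
        return  # not finalized, not saved; the next sweep retries
    entry.expiry_finalized = True
    save_entry(entry)


def test_fallback_commit_success_finalizes():
    entry = SimpleNamespace(session_id="sess-1", expiry_finalized=False)
    commit, save = MagicMock(), MagicMock()
    sweep_entry(entry, commit, save)
    commit.assert_called_once_with("sess-1")
    assert entry.expiry_finalized is True


def test_fallback_commit_failure_leaves_session_retryable():
    entry = SimpleNamespace(session_id="sess-2", expiry_finalized=False)
    commit = MagicMock(side_effect=RuntimeError("commit failed"))
    save = MagicMock()
    sweep_entry(entry, commit, save)
    assert entry.expiry_finalized is False
    save.assert_not_called()
```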
Notes
This is not meant to replace the generic lifecycle fixes from #14981 or #15165. It is a defensive OpenViking-specific fallback for orphaned gateway sessions where turns were already synced but the provider object is gone before idle-expiry finalization runs.