[Bug]: Gateway idle expiry can finalize OpenViking sessions without commit when cached AIAgent is unavailable #19831

@elokus

Bug description

Gateway idle session expiry can mark a session as finalized even though an OpenViking-backed memory session was never committed. This leaves OpenViking with synced turn messages but no session commit/extraction, so memories are never indexed and the gateway will not retry because the local session is already marked finalized.

This looks like a follow-up edge case to #14981 rather than the same bug. #14981 fixed firing on_session_finalize on idle expiry, but this case concerns OpenViking MemoryProvider.on_session_end() / session commit when the cached AIAgent or provider instance is unavailable by the time the expiry watcher runs.

Related but distinct:

Observed behavior

After gateway idle expiry, some OpenViking sessions can be left in this state:

message_count > 0
pending_tokens > 0
commit_count == 0
memories_extracted.total == 0

At the same time, the gateway logs/session store indicate the idle expiry sweep completed/finalized the session. Because the local session is finalized, the watcher does not retry and OpenViking extraction never happens.

In a real local audit, several root Hermes OpenViking sessions had synced messages but commit_count == 0 and memories_extracted.total == 0; manually running ov session commit <session_id> repaired them and produced extracted memories.
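
The orphaned state described above can be checked mechanically during an audit. This is a minimal sketch, assuming the session statistics are available as a plain dict whose keys mirror the fields listed above. The function name, the dict shape, and the nested `memories_extracted["total"]` layout are assumptions for illustration, not OpenViking's actual API.

```python
from typing import Any, Mapping


def looks_uncommitted(stats: Mapping[str, Any]) -> bool:
    """Return True when a session has synced turns but was never committed.

    `stats` is a hypothetical dict; the key names mirror the fields in this
    report, and the nested `memories_extracted["total"]` shape is assumed.
    """
    extracted = stats.get("memories_extracted") or {}
    return (
        stats.get("message_count", 0) > 0
        and stats.get("pending_tokens", 0) > 0
        and stats.get("commit_count", 0) == 0
        and extracted.get("total", 0) == 0
    )


# Illustrative values only, not taken from the real audit.
orphaned = {
    "message_count": 12,
    "pending_tokens": 3400,
    "commit_count": 0,
    "memories_extracted": {"total": 0},
}
committed = {
    "message_count": 12,
    "pending_tokens": 0,
    "commit_count": 1,
    "memories_extracted": {"total": 5},
}
```

Sessions flagged this way would be candidates for the manual `ov session commit <session_id>` repair mentioned above.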

Expected behavior

For an expired gateway session using OpenViking:

  1. If a cached/running AIAgent exists, the existing cleanup path should call shutdown_memory_provider() / MemoryProvider.on_session_end() and commit the OpenViking session.
  2. If no cached/running AIAgent exists but OpenViking has been configured and turns may already have been synced, the gateway should still commit the OpenViking session directly by session_id, or otherwise leave the local session unfinalized so a later retry/repair path can handle it.
  3. The session should only be marked expiry_finalized=True after the OpenViking commit/finalization step succeeds.
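
The ordering in the three points above could be sketched roughly as follows. This is not the gateway's code: `SessionEntry`, `finalize_expired`, and both callables are hypothetical stand-ins, and the sketch assumes each commit path can report success as a boolean, which the real `_cleanup_agent_resources` may not do today.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SessionEntry:
    """Hypothetical stand-in for the gateway's session store entry."""
    session_id: str
    expiry_finalized: bool = False


def finalize_expired(
    entry: SessionEntry,
    agent: Optional[object],
    cleanup_agent: Callable[[object], bool],
    commit_by_id: Callable[[str], bool],
) -> bool:
    """Mark the entry finalized only after a commit path reports success."""
    if agent is not None:
        # Case 1: a cached/running agent exists, use the existing cleanup path.
        ok = cleanup_agent(agent)
    else:
        # Case 2: no agent, commit the OpenViking session directly by id.
        ok = commit_by_id(entry.session_id)
    if ok:
        # Case 3: the flag is set only after the commit step succeeded.
        entry.expiry_finalized = True
    return ok
```

When the commit step fails, the entry stays unfinalized, so a later sweep or repair path still sees it.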

Actual behavior

The idle expiry watcher can reach a no-agent path. If the cached/running AIAgent is gone, _cleanup_agent_resources(agent) is not called, so the OpenViking provider's on_session_end() path does not run. The session can still be marked finalized locally, leaving the synced OpenViking session uncommitted forever.

Why this matters

OpenViking per-turn sync can succeed while long-term memory extraction silently fails. That is particularly confusing because message_count > 0 makes the session look stored, but the useful memory artifacts are never created because commit never ran.

This causes real memory loss for Telegram/gateway sessions that expire while idle or after process/cache lifecycle changes.

Suggested minimal fix

In gateway/run.py, inside _session_expiry_watcher:

  • Preserve the existing behavior when a cached/running agent exists.
  • If no cached/running agent exists and OpenViking is configured, perform a tiny provider-specific fallback commit using the configured OpenViking endpoint/account/user/agent:
client.post(f"/api/v1/sessions/{session_id}/commit")

  • Only set entry.expiry_finalized = True if cleanup or fallback commit succeeds.
  • If fallback commit fails, do not save the session as finalized, so the next watcher sweep can retry.
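
A fallback commit helper along these lines might look like the sketch below. It assumes `client` is an HTTP client already configured with the OpenViking endpoint and credentials, that its `post()` returns an object with a `status_code` attribute, and that any 2xx status means the commit succeeded. `fallback_commit` and `HttpClient` are hypothetical names; only the commit path comes from the suggestion above.

```python
import logging
from typing import Any, Protocol

logger = logging.getLogger(__name__)


class HttpClient(Protocol):
    """Assumed minimal client interface: post(url) returns a response."""

    def post(self, url: str) -> Any: ...


def fallback_commit(client: HttpClient, session_id: str) -> bool:
    """Commit a session by id; return True only on a 2xx response."""
    try:
        response = client.post(f"/api/v1/sessions/{session_id}/commit")
    except Exception as exc:  # network errors, timeouts, and similar
        logger.warning("fallback commit failed for %s: %s", session_id, exc)
        return False
    # Assumption: the response exposes an integer `status_code`.
    status = getattr(response, "status_code", None)
    ok = isinstance(status, int) and 200 <= status < 300
    if not ok:
        logger.warning("fallback commit for %s returned status %r", session_id, status)
    return ok
```

Returning a boolean rather than raising lets the watcher skip setting `entry.expiry_finalized` on failure and retry on the next sweep.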

Regression tests to add

Add coverage in tests/gateway/test_session_boundary_hooks.py:

  1. Expired gateway session, OpenViking configured, no cached/running AIAgent, fallback commit succeeds:

    • fallback commit called once with the expired session id
    • expiry_finalized becomes True
  2. Same setup but fallback commit fails:

    • expiry_finalized remains False
    • session store is not saved as finalized
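
The two cases above could be shaped roughly like this. The sketch tests a hypothetical stand-in, `expire_without_agent`, rather than the real `_session_expiry_watcher`, whose fixtures and signatures are not shown in this report. The entry and store objects are mocks with assumed attribute names.

```python
from types import SimpleNamespace
from unittest.mock import Mock


def expire_without_agent(entry, store, commit):
    """Hypothetical stand-in for the watcher's no-agent branch."""
    if commit(entry.session_id):
        entry.expiry_finalized = True
        store.save(entry)  # persist only after a successful commit


def test_fallback_commit_success_finalizes():
    entry = SimpleNamespace(session_id="sess-1", expiry_finalized=False)
    store, commit = Mock(), Mock(return_value=True)
    expire_without_agent(entry, store, commit)
    commit.assert_called_once_with("sess-1")
    assert entry.expiry_finalized is True
    store.save.assert_called_once_with(entry)


def test_fallback_commit_failure_leaves_unfinalized():
    entry = SimpleNamespace(session_id="sess-1", expiry_finalized=False)
    store, commit = Mock(), Mock(return_value=False)
    expire_without_agent(entry, store, commit)
    commit.assert_called_once_with("sess-1")
    assert entry.expiry_finalized is False
    store.save.assert_not_called()
```

In the real test module, the stand-in would be replaced by the watcher itself with the OpenViking commit call patched.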

Notes

This is not meant to replace the generic lifecycle fixes from #14981 or #15165. It is a defensive OpenViking-specific fallback for orphaned gateway sessions where turns were already synced but the provider object is gone before idle-expiry finalization runs.

Labels: P3 (Low: cosmetic, nice to have); comp/gateway (Gateway runner, session dispatch, delivery); comp/plugins (Plugin system and bundled plugins); type/bug (Something isn't working)
