Skip to content

fix(gateway): fire MemoryProvider.on_session_end on session expiry#11410

Closed
zerone0x wants to merge 1 commit into
NousResearch:mainfrom
zerone0x:fix/gateway-on-session-end-expiry
Closed

fix(gateway): fire MemoryProvider.on_session_end on session expiry#11410
zerone0x wants to merge 1 commit into
NousResearch:mainfrom
zerone0x:fix/gateway-on-session-end-expiry

Conversation

@zerone0x

Copy link
Copy Markdown
Contributor

Fixes #11205

The gateway session-expiry code path shut sessions down without invoking MemoryProvider.on_session_end(), so memory providers never got a chance to flush state for expired sessions. This patch invokes the hook on the expiry path, mirroring the graceful-end behavior.

What

_flush_memories_for_session (gateway/run.py:743) is the convergence point for all three gateway session-end paths:

  1. Idle-timeout expiry from _session_expiry_watcher (:2046)
  2. Daily scheduled reset (same watcher)
  3. Explicit /reset from a platform handler (:4343, :6422)

Until now it only spawned a separate flush AIAgent to nudge the builtin memory tool — it never called memory_manager.on_session_end(history) on the live cached agent's memory manager. As a result, plugin MemoryProviders that implement on_session_end as their documented final-pass extraction hook fired exactly zero times on gateway platforms (per the bug report log: many periodic / pre_compress triggers, zero session_end).

This patch dispatches the hook before the bespoke flush runs, looking up the live agent in _agent_cache first (where idle agents live) and falling back to _running_agents (in case the agent is still mid-turn when expiry fires). The dispatch is wrapped in try/except so a misbehaving provider can't block the rest of the flush sequence — matching the per-provider error tolerance that already exists inside MemoryManager.on_session_end.

This is the same structural class of bug as #7193 / #7192: the hook contract exists, the dispatch layer works, but one caller was missing.

How

# gateway/run.py — _flush_memories_for_session, after the cron-skip guard
try:
    cached_agent = ...  # _agent_cache[session_key] -> _running_agents[session_key]
    if cached_agent is not None:
        _mm = getattr(cached_agent, "_memory_manager", None)
        if _mm is not None:
            _msgs = [...]  # cleaned transcript, role+content only
            _mm.on_session_end(_msgs)
except Exception as exc:
    logger.warning("MemoryProvider.on_session_end dispatch failed for session %s: %s", ...)

The bespoke flush-agent path is left fully intact (it addresses an orthogonal concern tracked in #6157).

Testing

New tests/gateway/test_flush_memory_session_end.py (6 tests, all passing) covers:

  • Hook fires when the agent is in _agent_cache (idle path)
  • Hook fires when the agent is in _running_agents (mid-turn fallback)
  • No-op when no agent is cached for the session_key
  • Provider failure does not break the bespoke flush downstream
  • Cron sessions still bypass the entire flush (including the new hook)
  • No session_key passed -> no cached-agent lookup, hook not called

Existing related tests (test_async_memory_flush.py, test_flush_memory_stale_guard.py, test_memory_provider.py) continue to pass — 82/82 green.

Test plan

  • New unit tests for the dispatch + edge cases
  • Existing memory-flush + memory-provider tests still pass
  • Manual verification on a live gateway: install a plugin that logs from on_session_end, let an idle session expire, confirm the log line now appears

🤖 Generated with Claude Code

The gateway's session-end paths (idle expiry, scheduled reset, /reset)
all converge on _flush_memories_for_session, which spawned a bespoke
flush AIAgent but never invoked MemoryProvider.on_session_end on the
live cached agent's memory manager. Plugin providers that implement
on_session_end as their final-pass extraction hook (per the ABC
docstring) therefore never fired on gateway platforms.

Dispatch the hook on the cached agent (looked up in _agent_cache, with
a fallback to _running_agents when the agent is mid-turn) before the
bespoke flush runs, mirroring the contract honored by the CLI graceful
shutdown path in run_agent.shutdown_memory_provider. Wrapped in
try/except so a misbehaving provider can't block the rest of the flush
sequence.

Fixes #11205
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/gateway Gateway runner, session dispatch, delivery tool/memory Memory tool and memory providers labels Apr 25, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Likely duplicate of #11304 — both fire memory session-end hooks on gateway expiry. Note: merged #15132 already addresses the on_session_finalize lifecycle gap; check if this is fully superseded.

@teknium1

Copy link
Copy Markdown
Contributor

Thanks for this fix, @zerone0x — the root cause you identified was real.

This PR is superseded by the flush_memories refactor that landed in the interim:

  • _flush_memories_for_session was removed entirely in commit ea01bdc (refactor(memory): remove flush_memories entirely #15696, merged 2026-04-25). The bespoke flush-agent path this PR targeted no longer exists on main.
  • on_session_end already fires on expiry via the current _session_expiry_watcher (gateway/run.py:2235): it calls _cleanup_agent_resources(cached_agent)agent.shutdown_memory_provider() (run.py:1666-1667) → _memory_manager.on_session_end() (run_agent.py:4140). The call chain you intended to add is already present.
  • Your test baseline files (test_async_memory_flush.py, test_flush_memory_stale_guard.py) were also deleted in the same refactor, so the PR's tests would need to be rebuilt against the new architecture.
  • Separately, PR fix(gateway): fire on_session_finalize on idle expiry (salvage #13756) #15132 (merged 2026-04-24) closed the on_session_finalize plugin-hook gap on the same expiry path.

The PR's head is now dirty against main for these reasons. Closing as implemented. If you spot a remaining gap (e.g. on_session_end receiving an empty message list rather than the actual transcript), that would be worth a fresh, narrowly-scoped follow-up.

This is an automated hermes-sweeper review.

@teknium1 teknium1 closed this Apr 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/gateway Gateway runner, session dispatch, delivery P2 Medium — degraded but workaround exists tool/memory Memory tool and memory providers type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: MemoryProvider.on_session_end() never called on gateway session expiry

3 participants