
[Bug]: gateway session:end event not emitted from idle-expiry watcher or auto-reset path #28746

@mbs-vhs


Bug Description

The gateway-level session:end hook event is emitted from exactly one call site (gateway/run.py:7667, inside _handle_reset_command). Two other paths that terminate a gateway session both close the session locally and fire the plugin-level on_session_finalize hook, but neither emits the gateway-level session:end event:

  1. Idle-expiry watcher: gateway/run.py:_session_expiry_watcher (line 3666). Runs every 5 min by default. Fires the on_session_finalize plugin hook (the fix for #14981, "The on_session_finalize hook is not being fired when gateway sessions expire due to configured idle time"), evicts the cached agent, and sets entry.expiry_finalized = True. Does not emit gateway session:end.
  2. Auto-reset branch: gateway/session.py:get_or_create_session (lines 893-905, where _should_reset is consulted and a stale session is closed before a new one is created for the same session_key). Calls self._db.end_session(...) against local SQLite. Does not emit gateway session:end.

So session:end only fires for explicit /new / /reset. Any session closed by idle-expiry, suspended-state reset, or daily-reset-policy turnover silently disappears from the perspective of any gateway hook subscriber.

Documentation states (gateway/hooks.py line 12):

session:end -- Session ends (user ran /new or /reset)

That reads as if the limitation were intentional, but in practice external observers expect session:end to fire whenever a session ends, symmetric with session:start.

Why this matters / impact

Any hook in ~/.hermes/hooks/ that subscribes to session:end (for example, to close mirror rows in an external DB, finalize logging, persist a transcript checkpoint, or notify a remote observer) silently misses every close driven by idle-expiry or auto-reset. The hook keeps cached state for those sessions indefinitely.

In the specific install this was caught in, an external substrate-ingest hook subscribing to session:start / agent:start / agent:end / session:end left orphaned rows stuck in the running state in a downstream session-tracking DB. Three cases were confirmed across 3 days of operation, all on the idle-expiry path: sessions.json shows expiry_finalized: true while the hook's local state.json still holds the substrate row id, indicating that session:end never fired. Cleaning up on the substrate side required an out-of-band SQL UPDATE.

The asymmetry is also confusing for hook authors. Per its documentation, the plugin-level on_session_finalize hook correctly fires on idle-expiry (PR #1725 / issue #14981), but the gateway-level session:end event, documented as the gateway-side counterpart, does not. Hook authors who subscribe only to session:end will be surprised.

Reproduction steps

  1. Install a gateway-level hook in ~/.hermes/hooks/<name>/HOOK.yaml subscribing to session:start + session:end. Handler logs both events with the session_id.
  2. Start a gateway session (e.g., DM the agent on Telegram).
  3. Path A — idle-expiry: stop messaging. Wait for the configured reset window (default ~30 min for gateway.session.reset_after) PLUS one watcher tick (5 min). Observe: the watcher loop logs Session expiry finalized for <session_id>. ~/.hermes/sessions/sessions.json shows expiry_finalized: true for the session_key. The hook log shows the session:start from earlier but no session:end.
  4. Path B — auto-reset: start a session, let it idle past the reset window, then send a new message that triggers auto-reset (different from /new). The new turn produces a new session_id for the same session_key. The hook log shows a new session:start for the new session_id but no session:end for the OLD session_id.
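For step 1, a handler along these lines would be enough. This is only a sketch: the async handle(event_type, context) signature, the context keys, and the "telegram:dm:42" session_key format are assumptions about the hook contract (check gateway/hooks.py for the real one), and an in-memory list stands in for a log file.

```python
import asyncio
import json
import time

# Assumption: the gateway awaits `handle(event_type, context)` for each event
# the hook subscribes to, and `context` is a plain dict.
LOG_LINES: list[str] = []  # in-memory stand-in for a real log file


async def handle(event_type: str, context: dict) -> None:
    """Record session:start and session:end together with the session_id."""
    if event_type not in ("session:start", "session:end"):
        return
    LOG_LINES.append(json.dumps({
        "ts": time.time(),
        "event": event_type,
        # .get() because the payload may not carry these keys on every path
        "session_id": context.get("session_id"),
        "session_key": context.get("session_key"),
    }))


# Local smoke run that mimics what a subscriber sees on Path A today:
# a session:start arrives, other events are ignored, no session:end follows.
asyncio.run(handle("session:start", {"session_id": "s1", "session_key": "telegram:dm:42"}))
asyncio.run(handle("agent:end", {"session_id": "s1"}))
```

After either reproduction path, the log should contain a session:start line for the session with no matching session:end line.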

Expected behavior

session:end fires whenever a session ends, regardless of which path closed it. Specifically:

  • Idle-expiry watcher: after entry.expiry_finalized = True is set, emit session:end with the now-closed session_id.
  • Auto-reset in get_or_create_session: before (or as part of) emitting session:start for the new session_id, emit session:end for the OLD session_id that was just closed.

This makes session:end symmetric with session:start and unblocks external observers from tracking session lifecycle correctly.

Actual behavior

session:end is silent on both paths. External observers see open-ended session:start events that never close.
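To make the symptom concrete, here is a small sketch of how an observer could list sessions that started but never closed. The function name and the (event, session_id) tuple format are hypothetical, and it assumes the subscriber can recover a session_id for both events.

```python
from typing import Iterable


def find_unclosed_sessions(events: Iterable[tuple[str, str]]) -> list[str]:
    """Return session_ids that saw session:start but no session:end, in start order."""
    open_ids: dict[str, None] = {}  # dict used as an insertion-ordered set
    for event_type, session_id in events:
        if event_type == "session:start":
            open_ids[session_id] = None
        elif event_type == "session:end":
            open_ids.pop(session_id, None)
    return list(open_ids)


events = [
    ("session:start", "s1"),  # later closed by idle-expiry: no session:end arrives
    ("session:start", "s2"),  # closed by /new
    ("session:end", "s2"),
    ("session:start", "s3"),  # replaced by auto-reset: no session:end arrives
]
print(find_unclosed_sessions(events))  # ['s1', 's3']
```

With the current behavior, sessions closed by idle-expiry or auto-reset stay in this list indefinitely, which matches the orphan rows described above.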

Affected code

  • gateway/run.py ≈ line 3666 — _session_expiry_watcher (loop body around line 3706-3743)
  • gateway/run.py ≈ line 6525-6541 — _handle_message_with_agent, the _is_new_session branch (where session:start fires for auto-reset; this is also where session:end for the OLD session_id should fire on auto-reset)
  • gateway/session.py ≈ line 893-905, 914-926 — get_or_create_session auto-reset branch + SessionEntry construction (needs to plumb the prior session_id so the caller in run.py knows what to emit)
  • gateway/hooks.py line 12 — docstring update to reflect the new contract (session:end fires on all session ends, not just /new / /reset)

Hermes version observed

hermes-agent 0.13.0 (per ~/.hermes/hermes-agent/pyproject.toml).

Related history

Suggested fix (sketch)

Three small edits, all additive (no existing call sites change behavior; the explicit /new path remains as-is):

  1. gateway/run.py:_session_expiry_watcher — after entry.expiry_finalized = True (line 3742), emit session:end with session_id, session_key, derived platform, and reason: "idle_expiry". Wrap in try/except so a misbehaving subscriber can't break the watcher (mirrors the existing try/except pass around the on_session_finalize plugin invoke at line 3712).
  2. gateway/session.py:get_or_create_session — when the auto-reset branch fires (line 901, was_auto_reset = True), capture entry.session_id (already done as db_end_session_id) and store it on the new SessionEntry as a transient field auto_reset_prior_session_id. Add this field to the SessionEntry dataclass (after auto_reset_reason at line 459). No persistence needed — it's consumed once by the next caller.
  3. gateway/run.py:_handle_message_with_agent — in the _is_new_session branch (line 6535), before emitting session:start, check getattr(session_entry, "auto_reset_prior_session_id", None) — if set, emit session:end for that prior session_id with reason: "auto_reset", then clear the field to make it idempotent.
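The three edits can be sketched against a toy model. Nothing here is the real gateway code: the Emit callable, the trimmed SessionEntry, and the helper names are all hypothetical stand-ins, and the real emitter is likely async. The sketch only shows the intended control flow.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Emit = Callable[[str, dict], None]  # hypothetical stand-in for the gateway's hook emitter


@dataclass
class SessionEntry:  # toy subset of the real dataclass
    session_key: str
    session_id: str
    expiry_finalized: bool = False
    auto_reset_prior_session_id: Optional[str] = None  # transient, never persisted


def safe_emit_end(emit: Emit, session_key: str, session_id: str, reason: str) -> None:
    """Emit session:end; a failing subscriber must not break the caller."""
    try:
        emit("session:end", {"session_key": session_key,
                             "session_id": session_id, "reason": reason})
    except Exception:
        pass  # same policy as the existing guard around on_session_finalize


def finalize_expired(entry: SessionEntry, emit: Emit) -> None:
    """Edit 1: the watcher emits after marking the entry finalized."""
    entry.expiry_finalized = True
    safe_emit_end(emit, entry.session_key, entry.session_id, "idle_expiry")


def auto_reset(old: SessionEntry, new_session_id: str) -> SessionEntry:
    """Edit 2: carry the closed session_id on the replacement entry."""
    return SessionEntry(old.session_key, new_session_id,
                        auto_reset_prior_session_id=old.session_id)


def on_new_session(entry: SessionEntry, emit: Emit) -> None:
    """Edit 3: close the prior session before announcing the new one."""
    prior = entry.auto_reset_prior_session_id
    if prior:
        safe_emit_end(emit, entry.session_key, prior, "auto_reset")
        entry.auto_reset_prior_session_id = None  # consumed once, so repeat calls are no-ops
    emit("session:start", {"session_key": entry.session_key,
                           "session_id": entry.session_id})


seen: list[tuple[str, str, Optional[str]]] = []


def record(event: str, payload: dict) -> None:
    seen.append((event, payload["session_id"], payload.get("reason")))


entry = auto_reset(SessionEntry("telegram:dm:42", "old-1"), "new-2")
on_new_session(entry, record)
on_new_session(entry, record)  # second call emits session:start only
```

Clearing the transient field after the first emit is what keeps the auto-reset close idempotent if the new-session branch is ever entered twice for the same entry.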

The emit payload shape should match the existing session:end shape at line 7667 (platform, user_id, session_key), plus session_id so subscribers can disambiguate close events for the same session_key over time. Also recommend adding reason so subscribers can distinguish idle_expiry from auto_reset from manual_reset (the existing /new emit could add "reason": "manual_reset" for symmetry).
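As an illustration of the proposed payload, the following sketch builds the dict from the three existing fields plus the two proposed ones. The helper name and the example values are hypothetical; only the field names and reason strings come from the description above.

```python
from typing import Literal, get_args

EndReason = Literal["manual_reset", "idle_expiry", "auto_reset"]


def build_session_end_payload(platform: str, user_id: str, session_key: str,
                              session_id: str, reason: EndReason) -> dict:
    """Return a session:end payload: existing fields plus session_id and reason."""
    if reason not in get_args(EndReason):
        raise ValueError(f"unknown session:end reason: {reason!r}")
    return {
        # fields already present in the existing /new emit
        "platform": platform,
        "user_id": user_id,
        "session_key": session_key,
        # proposed additions
        "session_id": session_id,
        "reason": reason,
    }


payload = build_session_end_payload("telegram", "42", "telegram:dm:42", "old-1", "idle_expiry")
```

Keeping the existing keys unchanged means current subscribers would keep working, while new subscribers can key on session_id and branch on reason.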

PR

Happy to submit a PR — opening one in parallel with this issue.

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

Labels

P2 (Medium: degraded but workaround exists), comp/gateway (Gateway runner, session dispatch, delivery), type/bug (Something isn't working)
