[Bug]: Dashboard/TUI "N sessions" footer count leaks — _sessions never pruned on WebSocket disconnect #38950

@aimodelprint-sudo

Bug Description

On a long-lived gateway the TUI/dashboard footer's "N sessions" count grows
without bound and only resets when the gateway process restarts. The count is
the number of rows returned by the session.active_list RPC, which is len() of
the in-memory _sessions registry. Entries are added on every session
create/resume, but the only removal path is the explicit session.close RPC, and
a WebSocket disconnect never sends it. On disconnect the gateway only rebinds
the dead transport to stdio; the entry survives until the process exits. Each
phantom entry also keeps a live AIAgent, slash-command worker, and
notification-poller thread, so it's a real memory/thread leak, not just a
misleading number.

Affects setups with a long-lived gateway + repeated dashboard/desktop reconnects
(every browser tab/refresh spawns a PTY that registers a session). Pure stdio-CLI
users won't see it.

Steps to Reproduce

  1. Start a long-lived gateway: hermes gateway run (or the supervised gateway / Hermes Desktop).
  2. Open the dashboard chat: run hermes dashboard --tui, then open http://127.0.0.1:9119/chat.
  3. Hard-refresh the chat tab a few times (or open several tabs, or let a flaky connection reconnect).
  4. Watch the "N sessions" count in the footer.

Expected Behavior

The count reflects genuinely live/attachable sessions (≈ the number of open
clients). Closing/refreshing a tab should not permanently increment it.

Actual Behavior

The count increments by 1 on every reconnect and never decreases, even though
only one client is connected (lsof -nP -iTCP:9119 shows a single ESTABLISHED
socket; gateway_state.json shows active_agents: 0). It only resets when the
gateway restarts. Memory and thread count climb in lockstep (visible in the
[MEMORY] ... threads=N monitor line).

Affected Component

Gateway (the tui_gateway JSON-RPC server backing the TUI / dashboard / desktop chat)

Root Cause Analysis

  • session.active_list (tui_gateway/server.py) returns one row per entry in the
    module-global _sessions: dict with no liveness filter.
  • The only _sessions.pop is the session.close RPC handler.
  • On WS disconnect, tui_gateway/ws.py's finally rebinds the dead transport to
    _stdio_transport (so later emits don't crash) but never removes the entry.
  • The client does not send session.close on a transient disconnect — and
    intentionally so: prompt.submit re-binds the transport on the next prompt
    ("Re-bind to the current client transport…"), so a session is meant to survive
    a transient drop and re-attach.
  • Net: every disconnect that isn't an explicit close leaks a permanent _sessions
    entry → active_list over-counts and the agent/worker/poller-thread are never
    reclaimed.

Proposed Fix

Because warm re-attach across a transient disconnect is intentional, popping on
disconnect isn't safe. Instead:

  • Stamp a detached_at timestamp on the session on disconnect.
  • Drop disconnected-idle sessions from active_list after a short grace window
    (keeping the focused session).
  • Add a lazily-started daemon reaper (mirroring gateway/memory_monitor) that
    finalizes sessions that are not running and have been detached and idle past
    the window. It runs the existing session.close teardown and is guarded by
    history_lock plus a last_active check so it can't race the notification
    poller or the /goal loop.

PR attached.

Operating System

macOS (Darwin) — but the bug is platform-independent (pure gateway-side logic).

Are you willing to submit a PR?

Yes — PR ready.

Metadata

    Labels

    P2 (Medium, degraded but workaround exists)
    comp/gateway (Gateway runner, session dispatch, delivery)
    comp/tui (Terminal UI: ui-tui/ + tui_gateway/)
    type/bug (Something isn't working)
