Bug Description
On a long-lived gateway the TUI/dashboard footer's "N sessions" count grows
without bound and only resets when the gateway process restarts. The count is
len() of the in-memory _sessions registry returned by the session.active_list
RPC. Entries are added on every session create/resume, but the only removal is the
explicit session.close RPC — and a WebSocket disconnect never sends it. On
disconnect the gateway only rebinds the dead transport to stdio; the entry
survives forever. Each phantom entry also keeps a live AIAgent, slash-command
worker, and notification-poller thread, so it's a real memory/thread leak, not
just a misleading number.
Affects setups with a long-lived gateway + repeated dashboard/desktop reconnects
(every browser tab/refresh spawns a PTY that registers a session). Pure stdio-CLI
users won't see it.
Steps to Reproduce
- Start a long-lived gateway: hermes gateway run (or the supervised gateway / Hermes Desktop).
- Open the dashboard chat: hermes dashboard --tui → http://127.0.0.1:9119/chat.
- Hard-refresh the chat tab a few times (or open several tabs, or let a flaky connection reconnect).
- Watch the "N sessions" count in the footer.
Expected Behavior
The count reflects genuinely live/attachable sessions (≈ the number of open
clients). Closing/refreshing a tab should not permanently increment it.
Actual Behavior
The count increments by 1 on every reconnect and never decreases, even though
only one client is connected (lsof -nP -iTCP:9119 shows a single ESTABLISHED
socket; gateway_state.json shows active_agents: 0). It only resets when the
gateway restarts. Memory and thread count climb in lockstep (visible in the
[MEMORY] ... threads=N monitor line).
Affected Component
Gateway (the tui_gateway JSON-RPC server backing the TUI / dashboard / desktop chat)
Root Cause Analysis
session.active_list (tui_gateway/server.py) returns one row per entry in the
module-global _sessions: dict with no liveness filter.
- The only _sessions.pop is the session.close RPC handler.
- On WS disconnect, tui_gateway/ws.py's finally rebinds the dead transport to
  _stdio_transport (so later emits don't crash) but never removes the entry.
- The client does not send session.close on a transient disconnect, and
  intentionally so: prompt.submit re-binds the transport on the next prompt
  ("Re-bind to the current client transport…"), so a session is meant to survive
  a transient drop and re-attach.
- Net: every disconnect that isn't an explicit close leaks a permanent _sessions
  entry → active_list over-counts and the agent/worker/poller-thread are never
  reclaimed.
Proposed Fix
Because warm re-attach across a transient disconnect is intentional, popping on
disconnect isn't safe. Instead:
- Stamp a detached_at timestamp on disconnect.
- Drop disconnected-idle sessions from active_list after a short grace window
  (keeping the focused session).
- Add a lazily-started daemon reaper (mirroring gateway/memory_monitor) that
  finalizes sessions that are not running and have been detached + idle past the
  window. It runs the existing session.close teardown and is guarded by
  history_lock + a last_active check so it can't race the notification poller /
  /goal loop.
PR attached.
Operating System
macOS (Darwin) — but the bug is platform-independent (pure gateway-side logic).
Are you willing to submit a PR?
Yes — PR ready.