fix(tui-gateway): close sidecar sessions on disconnect#21401
fix(tui-gateway): close sidecar sessions on disconnect#21401qWaitCrypto wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes a sidecar slash-worker process leak (#21370) by allowing session.create callers to opt into closing the session (and its slash_worker) when the websocket transport disconnects. Normal TUI sessions retain reconnect semantics.
Changes:
- Add
close_on_disconnectflag, coercion helper, and a shared_close_session_by_id/_close_sessions_for_transportteardown path intui_gateway/server.py; refactorsession.closeand_shutdown_sessionsto use it. - Wire WS disconnect cleanup in
tui_gateway/ws.pyto the new helper. - Have the dashboard
ChatSidebaropt intoclose_on_disconnect: truewhen creating its sidecar session, and add regression tests for the new helpers.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tui_gateway/server.py | Adds flag plumbing, shared close helper, and transport-scoped close that distinguishes sidecar vs. persistent sessions. |
| tui_gateway/ws.py | Replaces inline transport-detach loop with the new _close_sessions_for_transport helper. |
| web/src/components/ChatSidebar.tsx | Marks dashboard sidecar sessions as close_on_disconnect: true. |
| tests/test_tui_gateway_server.py | Adds tests for flag storage and disconnect handling (sidecar closed, normal session preserved). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Three-layer defence-in-depth against orphaned slash_worker subprocesses: 1. **Server-side cleanup (P0)** — when a WebSocket disconnects, sessions marked close_on_disconnect=true (sidecar/short-lived) are finalised and their worker is killed. Normal TUI sessions still fall back to the stdio transport for historical reconnect compatibility. (Design from NousResearch#21401, adapted with permission.) 2. **Parent watchdog (P1, psutil)** — a daemon thread monitors the parent's PID + create_time fingerprint every 10 s and exits if the parent disappears. Handles crashes, SIGKILL, and PID reuse (critical on Windows). (Design from NousResearch#22863, adapted with permission.) 3. **Idle timeout + getppid() poll (P1, no deps)** — the main stdin loop uses select.select() with a 60 s timeout so it can periodically check os.getppid() and a 30-minute idle deadline. Works without psutil, adding defence even when the watchdog thread is not available. Also refactors session teardown into a shared _close_session_by_id() helper so that session.close RPC, WebSocket-disconnect cleanup, and server shutdown all use the identical code path. Co-authored-by: Hermes Agent
|
Superseded by
That closes the |
What does this PR do?
Fixes the dashboard sidecar session lifecycle leak behind
#21370.The underlying issue is not that
tui_gateway.slash_workerforgets to exit on its own. That subprocess is intentionally persistent for real TUI sessions so slash commands can reuse the same worker. The leak happens because dashboard sidecar sessions created over/api/wscan outlive their websocket connection: on disconnect, the server previously only detached the transport and kept the session record alive intui_gateway.server._sessions, which also kept the session'sslash_workeralive.This PR adds an explicit
close_on_disconnectsession mode for short-lived sidecar sessions, closes those sessions server-side when their websocket disconnects, and leaves the historical reconnect behavior unchanged for normal TUI sessions.Related Issue
Fixes #21370
Type of Change
Changes Made
close_on_disconnectparsing and storage intui_gateway/server.pyforsession.create._close_session_by_id()helper sosession.close, websocket disconnect cleanup, and process shutdown all use the same teardown path for finalize hooks, agent cleanup, notify unregistration, and slash-worker shutdown._close_sessions_for_transport()so websocket disconnect cleanup only closes explicitly marked sidecar sessions and still falls back to stdio for normal sessions.tui_gateway/ws.pyto use the shared transport/session cleanup helper.web/src/components/ChatSidebar.tsxto create its sidecar session withclose_on_disconnect: true.session.createstoring the close-on-disconnect flagHow to Test
python -m pytest tests/test_tui_gateway_server.py -q -k "close_on_disconnect or close_sessions_for_transport or session_close_commits_memory_and_fires_finalize_hook or session_create_close_race_does_not_orphan_worker or session_create_no_race_keeps_worker_alive"cd web && npm run buildstart
hermes dashboard, open the dashboard chat/sidebar, disconnect/close the websocket client, and confirm sidecarpython -m tui_gateway.slash_workerprocesses do not accumulate for the disconnected sidecar session.Checklist
Code
fix(scope):,feat(scope):, etc.)pytest tests/ -qand all tests passDocumentation & Housekeeping
docs/, docstrings) — or N/Acli-config.yaml.exampleif I added/changed config keys — or N/ACONTRIBUTING.mdorAGENTS.mdif I changed architecture or workflows — or N/AFor New Skills
N/A
Screenshots / Logs
6 passed in 20.47scd web && npm run build✓ built in 51.47s