[Bug]: TUI slash worker subprocesses leak after session close, accumulating zombie processes #38095

@ttxtq1014

Bug Description

When a TUI session is closed (user exits Hermes or closes the TUI tab), the _SlashWorker subprocess spawned during _init_session() is not cleaned up. These orphaned processes accumulate every time the TUI is opened and closed while the dashboard process remains running.

Observed on a user machine: 5 leaked slash workers, still running at ~13MB RSS each. Two share the same session-key, which suggests _restart_slash_worker also leaks (the old process is not killed before a new one is started).

Leaked processes found:

PID    RSS    %MEM  ELAPSED   COMMAND
83740  12896  0.1   01:02:27  python -m tui_gateway.slash_worker --session-key 20260602_182842_28ddd6
83743  12896  0.1   01:02:27  python -m tui_gateway.slash_worker --session-key 20260602_173931_fb4c4970
83752  12880  0.1   01:02:12  python -m tui_gateway.slash_worker --session-key 20260602_182842_28ddd6
83758  12896  0.1   01:02:07  python -m tui_gateway.slash_worker --session-key 20260602_172321_13f22a4e
83764  12896  0.1   01:02:04  python -m tui_gateway.slash_worker --session-key 20260602_170252_75c29b
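
To pick out duplicate session keys without reading the listing by eye, the rows can be grouped by `--session-key`. This is only a rough triage sketch, not part of Hermes: the helper names `workers_by_session` and `duplicated_sessions` are made up for illustration, and the parser assumes the PID is the first column, as in the listing above.

```python
from collections import defaultdict

# Rows copied from the listing above (header omitted).
PS_OUTPUT = """\
83740  12896  0.1 01:02:27 python -m tui_gateway.slash_worker --session-key 20260602_182842_28ddd6
83743  12896  0.1 01:02:27 python -m tui_gateway.slash_worker --session-key 20260602_173931_fb4c4970
83752  12880  0.1 01:02:12 python -m tui_gateway.slash_worker --session-key 20260602_182842_28ddd6
83758  12896  0.1 01:02:07 python -m tui_gateway.slash_worker --session-key 20260602_172321_13f22a4e
83764  12896  0.1 01:02:04 python -m tui_gateway.slash_worker --session-key 20260602_170252_75c29b
"""


def workers_by_session(ps_text):
    """Map each session key to the PIDs of slash workers started with it."""
    groups = defaultdict(list)
    for line in ps_text.splitlines():
        parts = line.split()
        # Skip headers, blank lines, and unrelated processes.
        if "tui_gateway.slash_worker" not in parts or "--session-key" not in parts:
            continue
        key_index = parts.index("--session-key") + 1
        if key_index >= len(parts):
            continue  # flag present but value missing
        groups[parts[key_index]].append(int(parts[0]))  # assumes PID is column 1
    return dict(groups)


def duplicated_sessions(groups):
    """Keep only session keys that have more than one live worker."""
    return {key: pids for key, pids in groups.items() if len(pids) > 1}


if __name__ == "__main__":
    print(duplicated_sessions(workers_by_session(PS_OUTPUT)))
```

Any key reported by `duplicated_sessions` points at the restart path rather than the close path, since a single session should only ever own one worker.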

Steps to Reproduce

  1. Run hermes (opens TUI) — a _SlashWorker subprocess is spawned
  2. Use any / slash command
  3. Close the TUI (Ctrl+C, or close the terminal tab)
  4. Repeat steps 1-3 several times
  5. Run ps aux | grep slash_worker — orphaned workers remain

Root Cause

_finalize_session() in tui_gateway/server.py:287 does not close the slash_worker subprocess:

def _finalize_session(session, end_reason="tui_close"):
    ...
    # Commits memory and updates DB, but does NOT close session["slash_worker"]

The slash worker IS properly closed in:

  • session.close JSON-RPC handler (lines 3502-3527) ✅, which calls _finalize_session() first, then closes the worker separately
  • _shutdown_sessions() (lines 326-334) ✅, but this only runs when the dashboard exits

But _finalize_session() itself does NOT close the worker — so any code path that calls _finalize_session() without separately handling the worker will leak.

Additionally, two workers with the same session-key (20260602_182842_28ddd6) were observed alive simultaneously, suggesting _restart_slash_worker() may also leak under certain conditions.
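
One way two workers could end up alive for the same session is a restart that replaces `session["slash_worker"]` without closing the old process, or two restarts racing each other. The actual body of `_restart_slash_worker()` is not shown in this issue, so the following is only a sketch of the guard, assuming the session is dict-like and the worker exposes `close()`. `restart_worker`, `spawn`, and `_restart_lock` are hypothetical names.

```python
import threading

# Hypothetical module-level lock; a per-session lock would also work.
_restart_lock = threading.Lock()


def restart_worker(session, spawn):
    """Replace session["slash_worker"], closing the old worker first.

    `spawn` is a zero-argument callable that returns a new worker object.
    The lock serialises concurrent restarts so that every worker except
    the most recent one has been closed.
    """
    with _restart_lock:
        old = session.get("slash_worker")
        if old is not None:
            try:
                old.close()
            except Exception:
                pass  # a failed close should not prevent the restart
        session["slash_worker"] = spawn()
        return session["slash_worker"]
```

Holding the lock across both the close and the spawn is the important part: locking only the spawn would still let two callers read the same `old` worker and each start a replacement.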

Expected Behavior

When a TUI session ends, the associated _SlashWorker subprocess should be reliably terminated.

Proposed Fix

In _finalize_session(), add slash worker cleanup:

def _finalize_session(session, end_reason="tui_close"):
    ...
    worker = session.get("slash_worker")
    if worker:
        try:
            worker.close()
        except Exception:
            pass
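
One caveat with the change above: the session.close handler already closes the worker after calling _finalize_session(), so the same worker could be closed twice. A possible variant, sketched here under the assumption that the session is a plain dict, removes the worker from the session before closing it. `finalize_session` is a stand-in name and the existing memory/DB steps are left out.

```python
def finalize_session(session, end_reason="tui_close"):
    """Stand-in for _finalize_session(); existing memory/DB work omitted."""
    # Pop instead of get, so a later cleanup pass finds nothing to close.
    worker = session.pop("slash_worker", None)
    if worker is not None:
        try:
            worker.close()
        except Exception:
            pass  # worker cleanup must not abort the rest of the teardown
```

With this shape, a caller that later does `session.get("slash_worker")` gets `None` and skips its own close, so the existing handlers would not need to change in lockstep.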

Also consider making _SlashWorker.close() more robust (longer wait() timeout, fallback kill()) and adding a guard against concurrent _restart_slash_worker calls.
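
For the more robust close(), the usual pattern with `subprocess.Popen` is terminate, wait with a timeout, then kill as a fallback. The current `_SlashWorker.close()` is not shown in this issue, so this is a generic sketch rather than a patch: `close_process` and its 5-second default are assumptions, and a real implementation would probably also close the worker's pipes.

```python
import subprocess
import sys


def close_process(proc, timeout=5.0):
    """Stop a child process and return its exit code.

    Sends a polite terminate first; if the child has not exited within
    `timeout` seconds, falls back to kill and waits so it is reaped.
    """
    if proc.poll() is not None:
        return proc.returncode  # already exited and reaped
    proc.terminate()
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        return proc.wait()  # reap the child so no defunct entry lingers


if __name__ == "__main__":
    # Demo with a throwaway child that would otherwise sleep for a minute.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    close_process(child)
    print("child still running:", child.poll() is None)
```

The final `wait()` after `kill()` matters as much as the kill itself: without it the process is dead but never reaped by the parent.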

Environment

  • Hermes: v0.15.1 / v0.15.2
  • OS: macOS
  • Python: 3.11.15

Metadata

Assignees

No one assigned

    Labels

    P2: Medium, degraded but workaround exists
    comp/tui: Terminal UI (ui-tui/ + tui_gateway/)
    type/bug: Something isn't working
