Dashboard: leaked tui_gateway.slash_worker processes accumulate, exhaust file descriptors on macOS launchd #24775

@paulb26

Summary

The dashboard (hermes dashboard --tui) leaks tui_gateway.slash_worker subprocesses when chat sessions end. Over hours or days these accumulate until the launchd-default file-descriptor cap (256 on macOS) is exhausted. After that, every new chat session spawn fails with OSError: [Errno 24] Too many open files, raised inside subprocess.Popen (reached via the subprocess.run call in hermes_cli/main.py:_make_tui_argv), and the user sees [session ended] immediately upon opening /chat.

Environment

  • Hermes v0.13.0
  • macOS Darwin 24.6.0 (arm64)
  • Python 3.11.15
  • Launched via launchd user agent (ai.hermes.dashboard.plist installed by install-hermes / setup wizard)
  • launchctl limit maxfiles = 256 unlimited (the macOS default for launchd-spawned user agents; note that this is far lower than the interactive-shell ulimit -n of 1048576)
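
To confirm which limit a launchd-spawned process actually sees (as opposed to an interactive shell), the limit can be read from inside Python. This is a minimal sketch using only the standard library; fd_headroom is a hypothetical helper name, not part of Hermes.

```python
import os
import resource


def fd_headroom() -> dict:
    """Report the soft/hard RLIMIT_NOFILE and the number of open descriptors."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        # /dev/fd lists this process's open descriptors on macOS and Linux.
        # Listing the directory briefly uses a descriptor itself, so the
        # count is approximate.
        open_fds = len(os.listdir("/dev/fd"))
    except OSError:
        open_fds = None  # /dev/fd is not available on this platform
    return {"soft": soft, "hard": hard, "open": open_fds}


if __name__ == "__main__":
    print(fd_headroom())
```

Run from a shell, this reports the shell's limits; to see the launchd value it would have to run inside the dashboard process or another job started by the same agent.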

Reproduction

  1. Install Hermes on macOS with the bundled launchd setup, no explicit SoftResourceLimits.NumberOfFiles in the dashboard plist.
  2. Open localhost:9119/chat in a browser, exchange a few turns, close the tab.
  3. Repeat over several hours / days (or many short sessions in succession).
  4. Observe: tui_gateway.slash_worker processes persist after the parent session/tab ends — ps -eo pid,etime,command | grep tui_gateway shows accumulating PIDs.
  5. Once the per-process file-descriptor budget is exhausted, every new /chat open shows [session ended] instantly, and the dashboard error log shows the OSError below.
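
The check in step 4 can be scripted so the orphan count is easy to track over time. The sketch below parses the output of the same ps invocation; find_workers and list_workers are hypothetical helper names, and matching on the substring "tui_gateway" is an assumption about how the workers appear in the command column.

```python
import subprocess


def find_workers(ps_output: str, needle: str = "tui_gateway") -> list[tuple[int, str, str]]:
    """Return (pid, etime, command) for each ps row whose command contains needle."""
    rows = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)  # pid, etime, rest of the command line
        # The header row is skipped because its first field is not numeric.
        if len(parts) == 3 and parts[0].isdigit() and needle in parts[2]:
            rows.append((int(parts[0]), parts[1], parts[2]))
    return rows


def list_workers() -> list[tuple[int, str, str]]:
    """Run ps with the columns used in step 4 and filter its output."""
    out = subprocess.run(
        ["ps", "-eo", "pid,etime,command"],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_workers(out)
```

Calling len(list_workers()) before and after closing a /chat tab should show whether the count drops back or keeps growing.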

Evidence

In my install I just observed 75+ orphan tui_gateway.slash_worker processes, the oldest with etime over 27 hours, and corresponding subprocess.Popen → os.pipe() → OSError: [Errno 24] entries in dashboard-launchd.err. Cleaning the orphans + restarting the dashboard restored /chat immediately. Kanban DB writes were also failing (hermes_dashboard_plugin_kanban: unable to open database file), which is the same FD-exhaustion symptom, not a DB-corruption issue.

Stack trace from dashboard-launchd.err:

File ".../hermes_cli/web_server.py", line 3146, in pty_ws
  argv, cwd, env = _resolve_chat_argv(resume=resume, sidecar_url=sidecar_url)
File ".../hermes_cli/web_server.py", line 3051, in _resolve_chat_argv
  argv, cwd = _make_tui_argv(PROJECT_ROOT / "ui-tui", tui_dev=False)
File ".../hermes_cli/main.py", line 1092, in _make_tui_argv
  result = subprocess.run(...)
File ".../subprocess.py", line 1715, in _get_handles
  c2pread, c2pwrite = os.pipe()
OSError: [Errno 24] Too many open files

Suggested fix

Two-part fix, both needed:

  1. Reap tui_gateway.slash_worker (and tui_gateway.entry) processes when their session ends. The dashboard's WebSocket-close / session-end handler should track the spawned worker PID, send SIGTERM, wait briefly, then SIGKILL if it does not exit. Right now they are leaking silently. This is the root cause.
  2. Set SoftResourceLimits.NumberOfFiles (with a matching HardResourceLimits.NumberOfFiles for the hard cap) in the bundled ai.hermes.dashboard.plist template (and gateway plists, for parity) so freshly-installed dashboards have headroom even if any residual leak path remains. Suggested: 8192 soft / 16384 hard. This is a defense-in-depth measure.
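
The reaping sequence in item 1 (SIGTERM, short wait, then SIGKILL) can be sketched as below. This is an illustration under the assumption that the session handler holds the worker's subprocess.Popen object; reap and its grace parameter are hypothetical names, not existing Hermes code.

```python
import subprocess


def reap(proc: subprocess.Popen, grace: float = 3.0) -> int:
    """Stop a child process: SIGTERM, wait up to `grace` seconds, then SIGKILL."""
    if proc.poll() is None:  # child is still running
        proc.terminate()  # sends SIGTERM on POSIX
        try:
            proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()  # sends SIGKILL on POSIX
            proc.wait()  # collect the exit status so no zombie remains
    # Close the parent's ends of any pipes so their descriptors are released.
    for stream in (proc.stdin, proc.stdout, proc.stderr):
        if stream is not None:
            try:
                stream.close()
            except OSError:
                pass  # e.g. a broken pipe while flushing to a dead child
    return proc.returncode
```

Calling a helper like this from the WebSocket-close / session-end path, inside a finally block, would make the cleanup run even when the session ends with an error.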

Workaround (for users hitting this now)

pkill -KILL -f "tui_gateway\.slash_worker"
pkill -KILL -f "tui_gateway\.entry"
launchctl kickstart -k "gui/$(id -u)/ai.hermes.dashboard"

Optionally add SoftResourceLimits.NumberOfFiles to ~/Library/LaunchAgents/ai.hermes.dashboard.plist and reload the agent via launchctl bootout followed by launchctl load -w (launchctl bootstrap is the modern counterpart of load).
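
Editing the plist by hand is error-prone, so the change can also be made with Python's plistlib. This is a sketch, not the installer's method: with_fd_limits and patch_plist_file are hypothetical helpers, the 8192 / 16384 values are the ones suggested above, and pairing the soft limit with HardResourceLimits.NumberOfFiles is an assumption about how the hard value would be set.

```python
import plistlib
from pathlib import Path


def with_fd_limits(job: dict, soft: int = 8192, hard: int = 16384) -> dict:
    """Return a copy of a launchd job dict with NumberOfFiles limits set."""
    updated = dict(job)
    # Merge into any existing limit dicts instead of overwriting other keys.
    updated["SoftResourceLimits"] = {**job.get("SoftResourceLimits", {}), "NumberOfFiles": soft}
    updated["HardResourceLimits"] = {**job.get("HardResourceLimits", {}), "NumberOfFiles": hard}
    return updated


def patch_plist_file(path: Path) -> None:
    """Read a plist, add the limits, and write it back in place."""
    with path.open("rb") as fh:
        job = plistlib.load(fh)
    with path.open("wb") as fh:
        plistlib.dump(with_fd_limits(job), fh)
```

For example, patch_plist_file(Path.home() / "Library/LaunchAgents/ai.hermes.dashboard.plist") would update the agent's plist; the agent still has to be reloaded for the new limits to apply.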

    Labels

    P2 (Medium: degraded but workaround exists)
    comp/tui (Terminal UI: ui-tui/ + tui_gateway/)
    duplicate (This issue or pull request already exists)
    type/bug (Something isn't working)