Summary
The dashboard (hermes dashboard --tui) leaks tui_gateway.slash_worker subprocesses when chat sessions end. Over hours or days these accumulate until the launchd-default file-descriptor cap (256 on macOS) is exhausted. After that, every new chat session spawn fails with OSError: [Errno 24] Too many open files, raised from subprocess.Popen in hermes_cli/main.py:_make_tui_argv, and the user sees [session ended] immediately upon opening /chat.
Environment
- Hermes v0.13.0
- macOS Darwin 24.6.0 (arm64)
- Python 3.11.15
- Launched via launchd user agent (ai.hermes.dashboard.plist installed by install-hermes / setup wizard)
- launchctl limit maxfiles = 256 unlimited (the macOS default for launchd-spawned user agents; note this is much lower than the interactive-shell ulimit -n of 1048576)
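To confirm which limit a given process actually runs under, the soft and hard caps can be read from inside that process. The following is a minimal sketch, not Hermes code: fd_usage is a hypothetical helper name, and it assumes /dev/fd is available (true on macOS and Linux). The numbers are only meaningful for the dashboard if the check runs inside the launchd-spawned dashboard process rather than an interactive shell.

```python
import os
import resource


def fd_usage():
    """Return (open_fds, soft_limit, hard_limit) for the current process.

    Hypothetical diagnostic helper; not part of Hermes.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # /dev/fd lists this process's open descriptors on macOS and Linux.
    # Listing the directory briefly opens one extra descriptor itself.
    open_fds = len(os.listdir("/dev/fd"))
    return open_fds, soft, hard


open_fds, soft, hard = fd_usage()
print(f"open={open_fds} soft={soft} hard={hard}")
```

A soft value of 256 alongside a steadily rising open count would match the failure mode described in this report.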
Reproduction
- Install Hermes on macOS with the bundled launchd setup, no explicit SoftResourceLimits.NumberOfFiles in the dashboard plist.
- Open localhost:9119/chat in a browser, exchange a few turns, close the tab.
- Repeat over several hours / days (or many short sessions in succession).
- Observe: tui_gateway.slash_worker processes persist after the parent session/tab ends; ps -eo pid,etime,command | grep tui_gateway shows accumulating PIDs.
- Once the per-process file-descriptor budget is exhausted, every new /chat open shows [session ended] instantly, and the dashboard error log shows the OSError below.
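The "Observe" step can be scripted so that counts are comparable between runs. This is a rough sketch: find_workers and list_workers are hypothetical helper names, and the parsing assumes the three-column pid,etime,command layout produced by the ps invocation above.

```python
import subprocess


def find_workers(ps_output, needle="tui_gateway.slash_worker"):
    """Return (pid, etime) pairs for ps rows whose command contains needle."""
    workers = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)          # pid, etime, command
        if len(parts) == 3 and needle in parts[2]:
            workers.append((int(parts[0]), parts[1]))
    return workers


def list_workers():
    """Run the ps command from the reproduction steps and parse its output."""
    out = subprocess.run(
        ["ps", "-eo", "pid,etime,command"],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_workers(out)
```

Calling list_workers() before and after closing a /chat tab should show whether the count drops back; if the leak is present, the list keeps growing and the etime values keep aging.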
Evidence
In my install I just observed 75+ orphan tui_gateway.slash_worker processes, the oldest with etime over 27 hours, and corresponding subprocess.Popen → os.pipe() → OSError: [Errno 24] in dashboard-launchd.err. Cleaning the orphans + restarting the dashboard restored /chat immediately. Kanban DB writes were also failing (hermes_dashboard_plugin_kanban: unable to open database file) — same FD-exhaustion symptom, not a DB-corruption issue.
Stack trace from dashboard-launchd.err:
File ".../hermes_cli/web_server.py", line 3146, in pty_ws
argv, cwd, env = _resolve_chat_argv(resume=resume, sidecar_url=sidecar_url)
File ".../hermes_cli/web_server.py", line 3051, in _resolve_chat_argv
argv, cwd = _make_tui_argv(PROJECT_ROOT / "ui-tui", tui_dev=False)
File ".../hermes_cli/main.py", line 1092, in _make_tui_argv
result = subprocess.run(...)
File ".../subprocess.py", line 1715, in _get_handles
c2pread, c2pwrite = os.pipe()
OSError: [Errno 24] Too many open files
Suggested fix
Two-part fix, both needed:
- Reap tui_gateway.slash_worker (and tui_gateway.entry) processes when their session ends. The dashboard's WebSocket-close / session-end handler should track the spawned worker PID, send SIGTERM, wait briefly, then SIGKILL if it does not exit. Right now they are leaking silently. This is the root cause.
- Set SoftResourceLimits.NumberOfFiles in the bundled ai.hermes.dashboard.plist template (and gateway plists, for parity) so freshly-installed dashboards have headroom even if any residual leak path remains. Suggested: 8192 soft / 16384 hard. This is a defense-in-depth measure.
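The SIGTERM, wait, SIGKILL sequence from the first item could look roughly like the sketch below. It assumes the session handler holds on to the subprocess.Popen object for each worker it spawns. The reap function and the grace parameter are illustrative names, and the sleeping child is only a stand-in, since the real spawn site is not shown in this report.

```python
import subprocess
import sys


def reap(proc, grace=5.0):
    """Stop a child process: SIGTERM, wait up to `grace` seconds, then SIGKILL."""
    if proc.poll() is not None:      # already exited; poll() also reaps it
        return proc.returncode
    proc.terminate()                 # SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()                  # SIGKILL on POSIX
        return proc.wait()           # collect the exit status so no zombie remains


# Stand-in for a worker process (assumption, not the Hermes spawn code).
worker = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = reap(worker, grace=2.0)
print("worker exited with", exit_code)
```

If workers are spawned with pipes, the parent-side pipe ends also hold descriptors and need closing once the worker is gone; terminating the process alone does not release them.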
Workaround (for users hitting this now)
pkill -KILL -f "tui_gateway\.slash_worker"
pkill -KILL -f "tui_gateway\.entry"
launchctl kickstart -k "gui/$(id -u)/ai.hermes.dashboard"
Optionally add SoftResourceLimits.NumberOfFiles to ~/Library/LaunchAgents/ai.hermes.dashboard.plist and reload via launchctl bootout + launchctl load -w.
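For the optional plist change, one way to add the limits without hand-editing XML is Python's plistlib. This is a sketch under assumptions: with_fd_limits is a hypothetical helper, the 8192 / 16384 defaults are the values suggested above, and the one-key example plist stands in for the real file, which has more keys. SoftResourceLimits and HardResourceLimits with a NumberOfFiles entry are the launchd.plist keys involved.

```python
import plistlib


def with_fd_limits(plist_bytes, soft=8192, hard=16384):
    """Return plist bytes with launchd NumberOfFiles limits set."""
    data = plistlib.loads(plist_bytes)
    # setdefault keeps any other resource limits already in the plist.
    data.setdefault("SoftResourceLimits", {})["NumberOfFiles"] = soft
    data.setdefault("HardResourceLimits", {})["NumberOfFiles"] = hard
    return plistlib.dumps(data)


# Minimal stand-in plist (assumption); the installed file contains more keys.
example = plistlib.dumps({"Label": "ai.hermes.dashboard"})
patched = plistlib.loads(with_fd_limits(example))
print(patched["SoftResourceLimits"], patched["HardResourceLimits"])
```

To apply it, read the bytes from ~/Library/LaunchAgents/ai.hermes.dashboard.plist, write the result back, and reload the agent as described above. Keeping a copy of the original file first is prudent.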