Summary
Every time a user opens the /chat tab, _make_tui_argv() in hermes_cli/main.py runs a synchronous 15-second npm run build inside FastAPI's async event handler before spawning the PTY. This blocks the entire event loop, preventing WebSocket keepalives from being processed. Reverse proxies with WebSocket idle timeouts (Cloudflare, nginx, Traefik defaults) kill the connection before the build finishes, and the user sees [session ended] with no error message.
Root Cause
_hermes_ink_bundle_stale() checks for packages/hermes-ink/dist/ink-bundle.js:
```python
# hermes_cli/main.py
def _hermes_ink_bundle_stale(tui_dir: Path) -> bool:
    ink_root = tui_dir / "packages" / "hermes-ink"
    bundle = ink_root / "dist" / "ink-bundle.js"
    if not bundle.exists():
        return True  # <-- always True
    ...
```
But the @hermes/ink build script outputs dist/entry-exports.js, not dist/ink-bundle.js:
```
> @hermes/ink@0.0.1 build
> esbuild src/entry-exports.ts ... --outdir=dist
  dist/entry-exports.js  418.8kb
```
ink-bundle.js is never created, so _hermes_ink_bundle_stale() always returns True → _tui_build_needed() always returns True → npm run build runs on every single chat session start.
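For illustration, here is a minimal sketch of the stale check pointed at the file the build actually emits. The mtime comparison against packages/hermes-ink/src is an assumption standing in for the elided part of the real function, and the name hermes_ink_bundle_stale (no leading underscore) marks it as a hypothetical rewrite, not the shipped code.

```python
import os
import tempfile
from pathlib import Path


def hermes_ink_bundle_stale(tui_dir: Path) -> bool:
    """Hypothetical corrected check (not the shipped function).

    Assumption: "stale" means the bundle is missing or older than some file
    under packages/hermes-ink/src; the real function's remaining logic is
    elided in this report and may differ.
    """
    ink_root = tui_dir / "packages" / "hermes-ink"
    # Point at the file esbuild actually emits (was dist/ink-bundle.js).
    bundle = ink_root / "dist" / "entry-exports.js"
    if not bundle.exists():
        return True
    bundle_mtime = bundle.stat().st_mtime
    # Rebuild only if any source file is newer than the bundle.
    return any(
        p.stat().st_mtime > bundle_mtime
        for p in (ink_root / "src").rglob("*")
        if p.is_file()
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ink = Path(d) / "packages" / "hermes-ink"
        (ink / "src").mkdir(parents=True)
        (ink / "dist").mkdir()
        src = ink / "src" / "entry-exports.ts"
        src.write_text("export {}\n")
        print(hermes_ink_bundle_stale(Path(d)))  # True: nothing built yet
        (ink / "dist" / "entry-exports.js").write_text("// built\n")
        os.utime(src, (1_000_000, 1_000_000))  # backdate the source
        print(hermes_ink_bundle_stale(Path(d)))  # False: bundle is newer
```

Once dist/entry-exports.js exists and is newer than every source file, the check returns False and the build is skipped on later session starts.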
Impact
- Chat is completely broken for anyone running the dashboard behind a reverse proxy whose WebSocket idle timeout is shorter than the ~15s build (Cloudflare Tunnel's default is ~100s, but the practical idle timeout is lower; nginx's default proxy_read_timeout is 60s)
- It fails silently: no error is surfaced to the user, and the PTY never starts
- The build also runs synchronously inside FastAPI's async event loop (subprocess.run with capture_output=True), blocking all other WebSocket/HTTP traffic while it runs
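The event-loop blocking can be reproduced without npm or FastAPI. In this sketch (an illustration, not Hermes code) a heartbeat task stands in for WebSocket keepalive handling, and a child Python process that sleeps 0.5s stands in for npm run build; all helper names are hypothetical. With subprocess.run the heartbeat stalls for the whole build, while asyncio.create_subprocess_exec lets it keep ticking.

```python
import asyncio
import subprocess
import sys

# Stand-in for `npm run build`: a child process that sleeps for 0.5 s.
BUILD_CMD = [sys.executable, "-c", "import time; time.sleep(0.5)"]


async def heartbeat(stop: asyncio.Event, interval: float = 0.05) -> int:
    """Count how many 'keepalive' ticks the event loop manages to run."""
    ticks = 0
    while not stop.is_set():
        ticks += 1
        await asyncio.sleep(interval)
    return ticks


async def blocking_build() -> None:
    # The pattern described above: a synchronous subprocess.run inside a coroutine.
    subprocess.run(BUILD_CMD, capture_output=True, check=True)


async def nonblocking_build() -> None:
    # The child runs while the event loop keeps serving other tasks.
    proc = await asyncio.create_subprocess_exec(
        *BUILD_CMD,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    await proc.communicate()


async def measure(build) -> int:
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(stop))
    await asyncio.sleep(0)  # let the heartbeat take its first tick
    await build()
    stop.set()
    return await hb


if __name__ == "__main__":
    print("blocking ticks:", asyncio.run(measure(blocking_build)))
    print("non-blocking ticks:", asyncio.run(measure(nonblocking_build)))
```

Run as a script, the blocking variant reports a single tick (the one taken before the build starts), and the non-blocking variant reports roughly one tick per 50 ms of build time.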
Reproduction
```python
from pathlib import Path
from hermes_cli.main import _tui_build_needed
print(_tui_build_needed(Path("/opt/hermes/ui-tui")))  # True, always
```
Confirmed on a fresh container start — packages/hermes-ink/dist/ink-bundle.js does not exist in the shipped image.
Workaround
Set HERMES_TUI_DIR=/opt/hermes/ui-tui in the dashboard container's environment. This activates a fast path in _make_tui_argv() that checks only that dist/entry.js exists and that no npm install is needed, and skips _tui_build_needed() entirely:
```python
# excerpt from _make_tui_argv() in hermes_cli/main.py
if not tui_dev:
    ext_dir = os.environ.get("HERMES_TUI_DIR")
    if ext_dir:
        p = Path(ext_dir)
        if (p / "dist" / "entry.js").exists() and not _tui_need_npm_install(p):
            return [node, str(p / "dist" / "entry.js")], p  # no build check
```
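To confirm inside the container that the workaround's file check will pass, a small preflight like the one below can help. It is a hypothetical helper that mirrors only the HERMES_TUI_DIR and dist/entry.js conditions from the excerpt; the tui_dev flag and _tui_need_npm_install() are not reproduced, so a positive result does not guarantee the fast path is taken.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def tui_fast_path_entry(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Hypothetical preflight: return dist/entry.js if the HERMES_TUI_DIR
    file check would pass, else None.

    Only the env-var and entry.js checks are mirrored here; tui_dev and
    _tui_need_npm_install() are deliberately left out.
    """
    ext_dir = env.get("HERMES_TUI_DIR")
    if not ext_dir:
        return None  # variable unset: the slow path with the build check runs
    entry = Path(ext_dir) / "dist" / "entry.js"
    return entry if entry.exists() else None


if __name__ == "__main__":
    entry = tui_fast_path_entry()
    if entry:
        print(f"fast path entry: {entry}")
    else:
        print("HERMES_TUI_DIR fast path not available")
```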
Suggested Fix
One or more of the following:
1. Fix _hermes_ink_bundle_stale() to check for dist/entry-exports.js instead of dist/ink-bundle.js
2. Pre-build the ink bundle in the Docker image so the file is present on startup
3. Run the build asynchronously (via asyncio.create_subprocess_exec) so it doesn't block the event loop

Option 3 is the most defensive regardless of the stale-check fix.
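Option 3 could look roughly like the sketch below. run_build_async and TuiBuildError are hypothetical names, the 120s timeout is an arbitrary choice, and the command is parameterized so the example runs without npm. Raising with the captured build output, instead of discarding it, would also give the dashboard something to show the user rather than a bare [session ended].

```python
import asyncio
import sys
from pathlib import Path
from typing import Sequence


class TuiBuildError(RuntimeError):
    """Hypothetical error type so a failed build can be reported to the user."""


async def run_build_async(
    cwd: Path,
    cmd: Sequence[str] = ("npm", "run", "build"),
    timeout: float = 120.0,
) -> str:
    """Run the TUI build without blocking the event loop (hypothetical helper).

    Returns the combined stdout/stderr on success; raises TuiBuildError on a
    non-zero exit or a timeout instead of failing silently.
    """
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        cwd=str(cwd),
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,  # merge stderr into stdout
    )
    try:
        out, _ = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        proc.kill()
        await proc.wait()  # reap the killed child
        raise TuiBuildError(f"build timed out after {timeout}s")
    text = out.decode(errors="replace")
    if proc.returncode != 0:
        raise TuiBuildError(f"build exited with {proc.returncode}:\n{text}")
    return text


if __name__ == "__main__":
    # Stand-in command so the sketch runs without npm installed.
    demo = (sys.executable, "-c", "print('built')")
    print(asyncio.run(run_build_async(Path("."), demo)).strip())  # built
```

In the WebSocket handler the call would become await run_build_async(tui_dir) before the PTY is spawned, so keepalives keep flowing during the ~15s build. The session still waits for the build, so this complements rather than replaces the stale-check fix. asyncio.to_thread(subprocess.run, ...) (Python 3.9+) is a smaller change with a similar effect on the event loop. If several users can open /chat at once, wrapping the call in an asyncio.Lock would avoid overlapping builds in the same directory.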