Dashboard Chat tab unusable in Docker image — four compounding bugs (ui-tui chown, /opt/data file chown, HOME inheritance, ink-bundle staleness loop) #20739

@jbellsolutions

Upstream issue draft — Hermes Docker image: Chat tab unusable due to four bugs

File this against NousResearch/hermes-agent once reviewed. Run from this directory:

gh issue create --repo NousResearch/hermes-agent \
  --title "Dashboard Chat tab unusable in Docker image — four compounding bugs (ui-tui chown, /opt/data file chown, HOME inheritance, ink-bundle staleness loop)" \
  --body-file docs/upstream-issue-chat-tab-eacces.md

Summary

In the official nousresearch/hermes-agent:latest Docker image (verified on v0.12.0 (2026.4.30)), the dashboard's embedded Chat tab (hermes dashboard --tui) is unusable out of the box due to four compounding bugs. Each one alone is enough to break the chat; fixing them in isolation produces a different failure mode each time, which made this take hours to diagnose.

TL;DR of the four bugs, in the order they surface as you fix them:

  1. /opt/hermes/ui-tui/ is root-owned → TUI rebuild fails with EACCES → banner reads "Chat unavailable: 1"
  2. Files inside /opt/data/ (notably auth.json mode 0600) are root-owned → auth store unreadable → Models/Providers picker is empty and WARNING hermes_cli.auth: failed to parse /opt/data/auth.json [Errno 13] spams the logs
  3. gosu preserves HOME=/root from Docker's default root start → TUI subprocess (node ui-tui/dist/entry.js) inherits HOME=/root, can't write to ~/.hermes, and silently produces no output — the Chat tab renders as a blank dark canvas with just a cursor blinking in the top-left corner
  4. _hermes_ink_bundle_stale looks for ink-bundle.js but the build produces entry-exports.js → bundle is always considered stale → every Chat tab connection triggers a synchronous npm run build && tsc && npm run build:compile chain on the asyncio event loop, blocking the dashboard for minutes per connection, while the rebuild never produces the file the check is looking for, so the next connection does the same thing again. WebSocket handshake to /api/pty times out before accept() is reached.

After all four are worked around, chat works correctly (WebSocket connects in < 1s, slash workers spawn, OpenRouter/DeepSeek calls succeed, ink TUI renders).


Bug #1: /opt/hermes/ui-tui/ is root-owned

The dashboard runs as the unprivileged hermes user (after the entrypoint drops privileges via gosu), but /opt/hermes/ui-tui/ and its dist/ directories are baked into the image as root:root. When _make_tui_argv() tries to rebuild the TUI bundle on first run, esbuild fails to write to /opt/hermes/ui-tui/packages/hermes-ink/dist/entry-exports.js with EACCES, the build aborts, _make_tui_argv calls sys.exit(1), and web_server.pty_ws reports Chat unavailable: {SystemExit(1)} over the WebSocket.
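The failure described above comes down to whether the current user can create files in the build output directories. A minimal sketch of a fail-fast probe follows; `unwritable_dirs` is a hypothetical helper name, not something in hermes_cli, and the two paths are the ones named in this report.

```python
import os
from pathlib import Path


def unwritable_dirs(dirs):
    """Return the subset of `dirs` the current user cannot create files in."""
    blocked = []
    for d in map(Path, dirs):
        # Creating a file needs write + search (execute) permission on the directory.
        if not (d.is_dir() and os.access(d, os.W_OK | os.X_OK)):
            blocked.append(d)
    return blocked


if __name__ == "__main__":
    # Paths taken from the report; adjust for other layouts.
    targets = [
        "/opt/hermes/ui-tui/dist",
        "/opt/hermes/ui-tui/packages/hermes-ink/dist",
    ]
    for path in unwritable_dirs(targets):
        print(f"TUI build would fail: {path} is not writable by uid {os.getuid()}")
```

Run inside the container as the hermes user, this should list both directories on an unpatched image and nothing once the ownership is fixed.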

Bug #2: /opt/data files are root-owned (entrypoint chown is non-recursive)

The entrypoint's needs_chown check only fires when the top-level $HERMES_HOME directory has the wrong owner — root-owned files inside /opt/data are left untouched. We hit this on /opt/data/auth.json (mode 0600), which made the auth store unreadable for the hermes user:

WARNING hermes_cli.auth: auth: failed to parse /opt/data/auth.json
([Errno 13] Permission denied: '/opt/data/auth.json') —
starting with empty store. Corrupt file preserved at /opt/data/auth.json.corrupt

This warning repeats roughly once per second forever. Symptom in the dashboard: the Models / Providers picker is empty because the auth store has no readable entries (even though OPENROUTER_API_KEY is set in the environment and the credential is reachable via env-pool fallback for direct CLI calls).
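To see which files a top-level-only ownership check would miss, a recursive scan along these lines can help. This is a sketch only: `paths_not_owned_by` is a hypothetical name, and the real entrypoint is a shell script whose exact logic is not reproduced here.

```python
import os
from pathlib import Path


def paths_not_owned_by(root, uid):
    """Yield every path under `root` (inclusive) whose owner is not `uid`."""
    root = Path(root)
    if root.lstat().st_uid != uid:
        yield root
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            # lstat so symlinks are judged by their own owner, not their target's.
            if path.lstat().st_uid != uid:
                yield path


if __name__ == "__main__":
    # /opt/data is the HERMES_HOME from this report.
    for path in paths_not_owned_by("/opt/data", os.getuid()):
        print(f"not owned by uid {os.getuid()}: {path}")
```

On the affected image, running this as the hermes user would be expected to print /opt/data/auth.json even though /opt/data itself passes the check.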

Bug #3: gosu preserves Docker's default HOME=/root

This was the killer. After fixing #1 and #2, the Chat tab still rendered blank — a dark canvas with just a cursor in the top-left, no spinner, no banner, no error.

Root cause: Docker starts the container as root by default, with HOME=/root from the env. When the entrypoint runs exec gosu hermes "$0" "$@", gosu by default preserves the parent's environment, including HOME. So the gateway and dashboard processes run as uid=hermes but with HOME=/root — a directory the hermes user has no read or write access to.

The TUI subprocess (node /opt/hermes/ui-tui/dist/entry.js) writes to ~/.hermes/ for skin caches, history, ink-cli state, etc. Every write fails silently, ink renders nothing, the WebSocket pumps an empty PTY stream, and the Chat tab shows a blank xterm.

Verified by inspecting /proc/<dashboard-pid>/environ:

HOME=/root
HERMES_HOME=/opt/data
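
The same inspection can be scripted. /proc/<pid>/environ is a sequence of NUL-separated KEY=VALUE entries, so a small parser is enough; `parse_environ` is a hypothetical helper written for this example.

```python
from pathlib import Path


def parse_environ(raw: bytes) -> dict:
    """Parse the NUL-separated KEY=VALUE bytes of /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # the data ends with a trailing NUL
        key, _, value = entry.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def environ_of(pid: int) -> dict:
    """Read another process's startup environment (needs permission to read it)."""
    return parse_environ(Path(f"/proc/{pid}/environ").read_bytes())
```

Calling `environ_of(<dashboard-pid>)["HOME"]` from a shell inside the container is one way to confirm whether the workaround took effect.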

Workaround: set HOME=/opt/data (the hermes user's actual passwd-listed homedir) in the container env. Then the chat works end-to-end.

Bug #4: _hermes_ink_bundle_stale looks for a file that never exists, triggering a synchronous rebuild on every chat connection

This was the killer of killers. After working around bugs #1 through #3, the Chat tab still hangs: the WebSocket to /api/pty opens a TCP connection, sends the upgrade request, and the server returns zero bytes for minutes, until the browser times out.

Root cause: _make_tui_argv calls _tui_build_needed → _hermes_ink_bundle_stale, which checks for /opt/hermes/ui-tui/packages/hermes-ink/dist/ink-bundle.js. That file does not ship in the image (only entry-exports.js does, after bug #1 is worked around). So _hermes_ink_bundle_stale always returns True → _tui_build_needed always returns True → _make_tui_argv always runs subprocess.run([npm, "run", "build"], capture_output=True) synchronously.

That subprocess.run blocks the asyncio event loop for the duration of the build (npm + tsc + esbuild bundling — multiple minutes). During that time, every HTTP request and WebSocket upgrade on the dashboard hangs. The browser hits its 10-second WebSocket open timeout long before the build finishes.

To make matters worse: when the build does eventually complete, it still doesn't produce ink-bundle.js (the build script targets entry-exports.js). So the next connection starts the same rebuild from scratch.
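
The loop is easy to reproduce in miniature. The sketch below is not the actual _hermes_ink_bundle_stale implementation (which is not quoted in this report); `bundle_stale` is a hypothetical mtime-based stand-in that shows why a check pointed at a file the build never writes can never turn False.

```python
from pathlib import Path


def bundle_stale(output, sources):
    """True when `output` is missing or older than any existing file in `sources`."""
    output = Path(output)
    if not output.exists():
        # A check aimed at a name the build never produces always ends up here.
        return True
    built_at = output.stat().st_mtime
    return any(
        Path(src).stat().st_mtime > built_at
        for src in sources
        if Path(src).exists()
    )
```

Pointing such a check at entry-exports.js (the file the build writes) lets it settle after one build, while pointing it at ink-bundle.js keeps it stale no matter how many builds run.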

Verified by capturing the dashboard's child process tree mid-hang:

hermes      40   /opt/hermes/.venv/bin/hermes dashboard --host 0.0.0.0 --tui
hermes     955   sh -c npm run build --prefix packages/hermes-ink && tsc -p tsconfig.build.json && npm run build:compile && chmod +x dist/entry.js

And by checking the file the staleness check is looking for:

$ ls -la /opt/hermes/ui-tui/packages/hermes-ink/dist/
-rw-r--r-- entry-exports.js     # <- exists
                                # <- ink-bundle.js missing!

Workaround: setting HERMES_TUI_DIR=/opt/hermes/ui-tui in the container env makes _make_tui_argv take its prebuilt-bundle shortcut (which checks dist/entry.js, not ink-bundle.js) and skip the rebuild path entirely. The chat then connects in < 1s.

Reproduction

  1. Pull and run the official image with the dashboard in --tui mode:
    services:
      hermes:
        image: nousresearch/hermes-agent:latest
        command: ["/bin/bash", "-c", "hermes dashboard --host 0.0.0.0 --port 9119 --no-open --insecure --tui & exec hermes gateway run --accept-hooks"]
        ports: ["127.0.0.1:18789:9119"]
        environment:
          HERMES_INFERENCE_PROVIDER: openrouter
          HERMES_INFERENCE_MODEL: deepseek/deepseek-v4-flash
          OPENROUTER_API_KEY: <key>
  2. Open http://localhost:18789 → Chat tab.
  3. Banner shows: Chat unavailable: 1

Confirmed root cause

Run _make_tui_argv directly as the hermes user:

docker exec -u hermes <container> bash -c '\
  export HOME=/opt/data && \
  /opt/hermes/.venv/bin/python3 -c \
  "from hermes_cli.main import _make_tui_argv, PROJECT_ROOT; \
   print(_make_tui_argv(PROJECT_ROOT / \"ui-tui\", tui_dev=False))"'

Output (abridged):

TUI build failed.
> hermes-tui@0.0.1 build
> npm run build --prefix packages/hermes-ink && tsc -p tsconfig.build.json && ...

✘ [ERROR] Failed to write to output file:
   open /opt/hermes/ui-tui/packages/hermes-ink/dist/entry-exports.js: permission denied

ls -la /opt/hermes/ui-tui/dist /opt/hermes/ui-tui/packages/hermes-ink/dist shows both directories owned by root:root in the shipped image, while the running process is uid=hermes.

Combined workaround (compose-level)

All four bugs can be papered over without touching the image, by:

  1. Mounting an entrypoint wrapper that chowns both paths as root before chaining to the upstream entrypoint (bugs #1 and #2)
  2. Setting HOME=/opt/data in the container env (bug #3)
  3. Setting HERMES_TUI_DIR=/opt/hermes/ui-tui in the container env (bug #4)

Wrapper script (mounted at /init-chown.sh):

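#!/bin/bash
set -e
# Runs as root: fix ownership for bug #1 (TUI build output) and bug #2 (data files).
# Failures are ignored (|| true) so the container still starts.
chown -R hermes:hermes /opt/hermes/ui-tui 2>/dev/null || true
chown -R hermes:hermes /opt/data 2>/dev/null || true
# Hand off to the upstream entrypoint, which drops privileges via gosu.
exec /opt/hermes/docker/entrypoint.sh "$@"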

Compose snippet:

services:
  hermes:
    image: nousresearch/hermes-agent:latest
    environment:
      HOME: /opt/data                          # <-- bug #3 workaround
      HERMES_TUI_DIR: /opt/hermes/ui-tui       # <-- bug #4 workaround
      # ... other env vars
    volumes:
      - ./bin/init-chown.sh:/init-chown.sh:ro
    entrypoint: ["/init-chown.sh"]
    command: ["/bin/bash", "/start-hermes.sh"]

After all four workarounds are in place, the chat works end-to-end. Without all of them, you get one of the four failure modes above (banner, empty picker, blank canvas, or hung WebSocket handshake).

Suggested upstream fixes

Any of these would resolve the bugs they correspond to. Ideally all four areas get patched:

For bug #1 (ui-tui chown):

  • Dockerfile: RUN chown -R hermes:hermes /opt/hermes/ui-tui after the build stage that produces dist/.
  • Or: tighten _tui_need_npm_install / _tui_build_needed so a fresh image with a complete prebuilt dist/ doesn't trigger a rebuild attempt at all. The entry.js is already present and runnable; the rebuild is only needed in dev mode.

For bug #2 (/opt/data file chown):

  • Entrypoint: change the needs_chown check from a top-level stat to a recursive ownership check, OR just always run chown -R hermes:hermes "$HERMES_HOME" (cheap on a small data dir).
  • Or: tighten the Dockerfile so anything baked into /opt/data ships with the right ownership.

For bug #3 (HOME inheritance):

  • Entrypoint: keep the rest of the environment, but explicitly reset HOME before the gosu drop:
    export HOME="$(getent passwd hermes | cut -d: -f6)"
    exec gosu hermes "$0" "$@"
    Alternatively, use runuser -l hermes, which starts a login shell and re-derives HOME from passwd, but note that a login shell also resets most other environment variables. (gosu takes only a user spec and a command, so there is no -H-style flag to lean on.)
  • Or: set HOME=/opt/data in the Dockerfile via ENV HOME=/opt/data (simple, image-level fix).
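
The same HOME reset could also be applied on the Python side when spawning the TUI subprocess. The sketch below uses the standard `pwd` module to do what the `getent passwd` line does; `passwd_home` and `env_with_passwd_home` are hypothetical names, and whether hermes_cli should do this itself is a design question for the maintainers.

```python
import os
import pwd


def passwd_home(uid=None):
    """Home directory recorded in the passwd database for `uid` (default: current user)."""
    return pwd.getpwuid(os.getuid() if uid is None else uid).pw_dir


def env_with_passwd_home(env, uid=None):
    """Return a copy of `env` whose HOME matches the passwd entry for `uid`."""
    fixed = dict(env)
    fixed["HOME"] = passwd_home(uid)
    return fixed


if __name__ == "__main__":
    inherited = os.environ.get("HOME")
    expected = passwd_home()
    if inherited != expected:
        print(f"HOME={inherited!r} does not match passwd home {expected!r}")
```

Passing `env=env_with_passwd_home(os.environ)` to the subprocess call would keep every other variable intact while replacing only HOME.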

For bug #4 (ink-bundle staleness loop):

  • Pick one of these — there are several reasonable fixes:
    • Make the package's build target produce ink-bundle.js (or whatever name _hermes_ink_bundle_stale looks for). The mismatch between the build output (entry-exports.js) and the staleness check looks like a rename that wasn't propagated.
    • Or: change _hermes_ink_bundle_stale to look at the file the build actually produces.
    • Or: set ENV HERMES_TUI_DIR=/opt/hermes/ui-tui in the Dockerfile so the prebuilt-bundle shortcut is used by default in the official image.
  • Independent of the above, the rebuild path should not block the asyncio event loop. _resolve_chat_argv is called from inside pty_ws before await ws.accept(). If a build is genuinely needed, it should run via asyncio.to_thread / loop.run_in_executor, with a "Building TUI bundle…" message streamed to the client. Today, every connection that triggers it freezes the entire dashboard process for the build's duration.
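
As a rough illustration of the last point, a blocking build can be moved to a worker thread with asyncio.to_thread (Python 3.9+). Everything here is a stand-in: `run_build`, `build_off_loop`, and `demo` are hypothetical names, and the "build" is a short sleep rather than the real npm chain.

```python
import asyncio
import subprocess
import sys


def run_build(argv):
    """Blocking build step; stands in for the report's synchronous `npm run build`."""
    return subprocess.run(argv, capture_output=True, text=True)


async def build_off_loop(argv):
    """Run the blocking build in a worker thread so the event loop keeps serving."""
    return await asyncio.to_thread(run_build, argv)


async def demo():
    ticks = 0

    async def heartbeat():
        # Stands in for other dashboard traffic that must stay responsive.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.05)
            ticks += 1

    beat = asyncio.create_task(heartbeat())
    # Placeholder command that just sleeps; a real caller would pass the build argv.
    result = await build_off_loop([sys.executable, "-c", "import time; time.sleep(0.5)"])
    beat.cancel()
    return result.returncode, ticks


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The heartbeat keeps ticking while the placeholder build runs, which is the behavior the dashboard loses when subprocess.run is called directly on the event loop.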

Bonus: better error surfacing

Two of the four bugs were essentially silent, which made this take far longer to diagnose than it should have:

  1. pty_ws in hermes_cli/web_server.py catches SystemExit and renders Chat unavailable: {exc} — which becomes the unhelpful Chat unavailable: 1. Surfacing the underlying npm/esbuild stderr (or at least a hint like "TUI build failed; see container logs") would have shaved hours off debugging.

  2. The blank-canvas mode (bug #3) is the worst: the WebSocket connects, the PTY spawns, the node process runs, and nothing complains. There's no banner, no log line, no DevTools error. Adding a startup log from the TUI on whether ~/.hermes is writable, or a one-shot probe in _resolve_chat_argv that fails fast if HOME isn't writable for the current user, would catch this immediately.
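
For the first item, one possible shape for a more useful banner is sketched below. `describe_tui_failure` is a hypothetical helper; it assumes the caller has access to the captured build stderr and that the decisive error is the last non-empty line, neither of which is confirmed for the current pty_ws code.

```python
def describe_tui_failure(exc, stderr=""):
    """Build a banner message from a SystemExit and any captured build stderr."""
    # Keep only the last non-empty stderr line: a banner has little room, and
    # this sketch assumes the decisive error is printed last.
    lines = [line.strip() for line in stderr.splitlines() if line.strip()]
    detail = lines[-1] if lines else "no build output captured"
    return f"Chat unavailable: TUI build failed (exit code {exc.code}): {detail}"
```

With the esbuild output from this report, the banner would carry the "permission denied" line instead of a bare exit code.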

Environment

  • Image: nousresearch/hermes-agent:latest (v0.12.0 (2026.4.30))
  • Host: Debian 12 / Docker 26.x on a DigitalOcean droplet
  • Compose: docker compose v2
  • Started via hermes dashboard --tui in foreground+background pattern (dashboard bg, hermes gateway run fg)

Related code paths

  • _DASHBOARD_EMBEDDED_CHAT_ENABLED gates /api/pty (works correctly)
  • _make_tui_argv in hermes_cli/main.py (the failing path)
  • pty_ws in hermes_cli/web_server.py (catches SystemExit and returns the unhelpful : 1 banner)


Labels

  • P2: Medium (degraded but workaround exists)
  • area/docker: Docker image, Compose, packaging
  • comp/tui: Terminal UI (ui-tui/ + tui_gateway/)
  • sweeper:implemented-on-main: Sweeper: behavior already present on current main
  • type/bug: Something isn't working
