Skip to content

fix(dashboard): resolve chat TUI argv off event loop#26124

Open
0xdany wants to merge 2 commits into
NousResearch:mainfrom
0xdany:fix/issue-25351-skip-current-tui-build
Open

fix(dashboard): resolve chat TUI argv off event loop#26124
0xdany wants to merge 2 commits into
NousResearch:mainfrom
0xdany:fix/issue-25351-skip-current-tui-build

Conversation

@0xdany

@0xdany 0xdany commented May 15, 2026

Copy link
Copy Markdown

What does this PR do?

Dashboard chat now resolves its TUI launch command off the FastAPI/WebSocket event loop. The resolver can run npm install / npm run build through _make_tui_argv(), and doing that synchronously in /api/pty can block proxy keepalives and other dashboard WebSocket work long enough for reverse-proxy deployments to drop the chat connection.

This keeps the current TUI build policy intact: normal production launches still run the correctness-first npm run build path, while HERMES_TUI_DIR remains the prebuilt/no-build path for distros and containers. The change only moves the potentially slow resolver work to a worker thread for the dashboard chat path.

I initially looked at skipping no-op TUI rebuilds, but main intentionally simplified that logic in c6ca11618 to avoid fragile mtime-based staleness checks:

  1. HERMES_TUI_DIR set: use prebuilt dist/entry.js, no build.
  2. --dev: run TypeScript sources directly, no production build.
  3. Normal path: always npm run build for correctness.

This PR does not reintroduce bundle staleness caching. It addresses the dashboard-specific failure mode instead: blocking work inside the WebSocket handler's event loop.

Related Issue

Fixes #25351

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • hermes_cli/web_server.py: add _resolve_chat_argv_async() and call it from the /api/pty WebSocket handler.
  • hermes_cli/web_server.py: run the synchronous resolver via asyncio.to_thread() so the dashboard event loop stays responsive while Node/npm work completes.
  • hermes_cli/web_server.py: serialize resolver calls with an asyncio.Lock before dispatching to the worker thread. This preserves one-build-at-a-time behavior when multiple chat tabs connect, without occupying default threadpool workers while queued connections wait.
  • tests/hermes_cli/test_web_server.py: add coverage that the resolver is dispatched through the async wrapper and that /api/pty uses that wrapper.

How to Test

  1. Confirm formatting:
    • git diff --check
  2. Run focused regression coverage:
    • uv run --with pytest --with pytest-xdist pytest tests/hermes_cli/test_web_server.py::TestPtyWebSocket::test_resolve_chat_argv_async_uses_worker_thread tests/hermes_cli/test_web_server.py::TestPtyWebSocket::test_pty_ws_resolves_argv_through_async_wrapper -q
  3. Run the broader dashboard/TUI launcher slice:
    • uv run --with pytest --with pytest-xdist --with ptyprocess pytest tests/hermes_cli/test_web_server.py tests/hermes_cli/test_tui_npm_install.py -q

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: macOS 26.3.1

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Screenshots / Logs

Validation results:

Check Result
git diff --check pass
uv run --with pytest --with pytest-xdist pytest tests/hermes_cli/test_web_server.py::TestPtyWebSocket::test_resolve_chat_argv_async_uses_worker_thread tests/hermes_cli/test_web_server.py::TestPtyWebSocket::test_pty_ws_resolves_argv_through_async_wrapper -q 2 passed
uv run --with pytest --with pytest-xdist --with ptyprocess pytest tests/hermes_cli/test_web_server.py tests/hermes_cli/test_tui_npm_install.py -q 156 passed

SystemExit behavior is preserved: if _make_tui_argv() exits because Node/npm is unavailable or the TUI build fails, asyncio.to_thread() propagates that back to the WebSocket handler and the existing error response path still runs.

@alt-glitch alt-glitch added type/perf Performance improvement or optimization P2 Medium — degraded but workaround exists comp/tui Terminal UI (ui-tui/ + tui_gateway/) labels May 15, 2026
@0xdany 0xdany force-pushed the fix/issue-25351-skip-current-tui-build branch from d39f954 to dadfd2c Compare May 15, 2026 15:22
@0xdany 0xdany changed the title fix: skip TUI rebuild when bundle is current fix(dashboard): resolve chat TUI argv off event loop May 15, 2026
@austinpickett austinpickett requested a review from Copilot May 18, 2026 14:19

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Moves potentially slow chat TUI argv resolution (including any npm install / npm run build triggered by _make_tui_argv()) off the FastAPI /api/pty WebSocket event loop to prevent dashboard WebSocket stalls and reverse-proxy disconnects.

Changes:

  • Added an async _resolve_chat_argv_async() wrapper that dispatches synchronous argv resolution via asyncio.to_thread().
  • Updated /api/pty WebSocket handler to await the async resolver instead of calling the synchronous resolver directly.
  • Added tests asserting that argv resolution is routed through the async wrapper and uses asyncio.to_thread().

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
hermes_cli/web_server.py Adds async argv resolver and uses it in /api/pty to keep the event loop responsive during Node/npm work.
tests/hermes_cli/test_web_server.py Adds unit coverage ensuring the websocket path uses the async resolver and that it dispatches via asyncio.to_thread().

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread hermes_cli/web_server.py Outdated

@austinpickett austinpickett left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use PULL_REQUEST_TEMPLATE.md, address copilot comments.

@0xdany 0xdany force-pushed the fix/issue-25351-skip-current-tui-build branch from dadfd2c to 7aa57db Compare May 19, 2026 06:11
@0xdany

0xdany commented May 19, 2026

Copy link
Copy Markdown
Author

Addressed the requested changes:

- Updated the PR body to follow .github/PULL_REQUEST_TEMPLATE.md.
- Addressed Copilot's threadpool concern by replacing the worker-thread-held threading.Lock with an asyncio.Lock around the await asyncio.to_thread(...) call. Queued chat tabs now wait in async context instead of occupying default ThreadPoolExecutor workers while waiting for the build/argv resolver slot.
- Re-ran validation:
  - git diff --check
  - uv run --with pytest --with pytest-xdist pytest tests/hermes_cli/test_web_server.py::TestPtyWebSocket::test_resolve_chat_argv_async_uses_worker_thread tests/hermes_cli/test_web_server.py::TestPtyWebSocket::test_pty_ws_resolves_argv_through_async_wrapper -q -> 2 passed
  - uv run --with pytest --with pytest-xdist --with ptyprocess pytest tests/hermes_cli/test_web_server.py tests/hermes_cli/test_tui_npm_install.py -q -> 156 passed

Ready for re-review.

@austinpickett

austinpickett commented May 19, 2026

Copy link
Copy Markdown
Collaborator

Addressed the requested changes:

- Updated the PR body to follow .github/PULL_REQUEST_TEMPLATE.md.
- Addressed Copilot's threadpool concern by replacing the worker-thread-held threading.Lock with an asyncio.Lock around the await asyncio.to_thread(...) call. Queued chat tabs now wait in async context instead of occupying default ThreadPoolExecutor workers while waiting for the build/argv resolver slot.
- Re-ran validation:
  - git diff --check
  - uv run --with pytest --with pytest-xdist pytest tests/hermes_cli/test_web_server.py::TestPtyWebSocket::test_resolve_chat_argv_async_uses_worker_thread tests/hermes_cli/test_web_server.py::TestPtyWebSocket::test_pty_ws_resolves_argv_through_async_wrapper -q -> 2 passed
  - uv run --with pytest --with pytest-xdist --with ptyprocess pytest tests/hermes_cli/test_web_server.py tests/hermes_cli/test_tui_npm_install.py -q -> 156 passed

Ready for re-review.

Beautiful, let's wait for the build to pass and I'll approve. Also please resolve any addressed conversations

@0xdany 0xdany force-pushed the fix/issue-25351-skip-current-tui-build branch from 7aa57db to 71cbab5 Compare May 19, 2026 15:46
@0xdany

0xdany commented May 19, 2026

Copy link
Copy Markdown
Author

@austinpickett quick update: the addressed Copilot conversation is resolved, and I rebased the PR onto current main to pick up the web i18n type fix that was causing the unrelated Docker/Nix failures.

Current PR head is now 71cbab50a414f08d15baf1b4bcf34a53716f849c.

Validation re-run locally after the rebase:

  • git diff --check
  • uv run --with pytest --with pytest-xdist --with ptyprocess pytest tests/hermes_cli/test_web_server.py tests/hermes_cli/test_tui_npm_install.py -q -> 156 passed

CI should be rerunning on the updated head now. Thanks!

@0xdany

0xdany commented May 19, 2026

Copy link
Copy Markdown
Author

@austinpickett quick follow-up: the remaining attribution failure was due to my commit email not being present in scripts/release.py AUTHOR_MAP.

I added the mapping:

New PR head: 2196c00f2

Validation re-run locally:

  • git diff --check
  • python3 -m py_compile scripts/release.py
  • uv run --with pytest --with pytest-xdist --with ptyprocess pytest tests/hermes_cli/test_web_server.py tests/hermes_cli/test_tui_npm_install.py -q -> 156 passed

The earlier supply-chain failure appears to be a stale-base false positive from the workflow comparing 7552e0f..71cbab50, which included an unrelated main-branch change to skills/productivity/google-workspace/scripts/setup.py. The actual PR diff before this attribution mapping was only hermes_cli/web_server.py and tests/hermes_cli/test_web_server.py; after this push it additionally includes only this AUTHOR_MAP line. CI should rerun on the new head now.

@0xdany

0xdany commented May 19, 2026

Copy link
Copy Markdown
Author

@austinpickett one more CI note: the latest push (2196c00f2) appears to be in GitHub Actions action_required state for the fork PR, so gh pr checks reports no checks on the current head yet.

The visible supply-chain failure is from the previous head (71cbab50), not the latest head. I reproduced that old scanner locally: the only high-signal match was skills/productivity/google-workspace/scripts/setup.py, which was outside this PR’s intended diff and came from the stale comparison range used by that old run.

Could you approve/run the workflows for the latest head when you get a chance? The current PR diff is limited to:

  • hermes_cli/web_server.py
  • tests/hermes_cli/test_web_server.py
  • scripts/release.py (AUTHOR_MAP entry only)

@0xdany 0xdany requested a review from austinpickett May 20, 2026 04:40
austinpickett
austinpickett previously approved these changes May 20, 2026

@austinpickett austinpickett left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: fix(dashboard): resolve chat TUI argv off event loop

3 files, +80/-1. Fixes blocking subprocess calls (npm install/npm run build) inside an async WebSocket handler that froze the event loop.

(item 1) Fix is correct — asyncio.Lock + asyncio.to_thread() moves blocking work off the event loop while serializing concurrent builds. Matches the right asyncio pattern.

(item 2) Tests are good — one verifies the async wrapper delegates via to_thread, one verifies pty_ws calls the async path instead of the old sync one.

(item 3) Minor scope creep: author-map entry in release.py — one line, zero risk.

Verdict: Approve ✅ — Real bug, correct fix, good tests.

Reviewed by Hermes Agent

@0xdany

0xdany commented May 20, 2026

Copy link
Copy Markdown
Author

@austinpickett thanks for the approval. Could you please rerun/approve the checks when you get a chance?

The remaining blockers appear to be CI state rather than code review:

  • Scan PR for critical supply chain risks is failing from the stale-base comparison; the flagged install-hook file is outside this PR's actual diff.
  • test was cancelled before completion.

The current PR diff is still limited to:
hermes_cli/web_server.py

  • tests/hermes_cli/test_web_server.py
  • scripts/release.py (AUTHOR_MAP entry only)

Local focused validation is still green: 162 passed for tests/hermes_cli/test_web_server.py and tests/hermes_cli/test_tui_npm_install.py.

@0xdany 0xdany force-pushed the fix/issue-25351-skip-current-tui-build branch from 2196c00 to 9dfea8d Compare May 22, 2026 00:13
@0xdany 0xdany force-pushed the fix/issue-25351-skip-current-tui-build branch from 9dfea8d to 49def88 Compare May 22, 2026 16:43
@0xdany 0xdany requested a review from austinpickett May 22, 2026 23:19

@austinpickett austinpickett left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — Approved ✅

Tight, well-reasoned fix. _resolve_chat_argv() can run npm install / npm run build via _make_tui_argv(), and calling it synchronously inside the /api/pty WebSocket handler blocked the dashboard event loop long enough for reverse proxies to drop the chat connection. Moving that to a worker thread is the right call.

What I verified:

  • SystemExit propagation — confirmed sys.exit(1) from _make_tui_argv (raised when node/npm is missing) still propagates cleanly through asyncio.to_thread() and is caught by the existing except SystemExit block. No regression to the error path.
  • Lock placement_chat_argv_lock is acquired before dispatching to the threadpool, so queued chat tabs preserve one-build-at-a-time behavior without occupying worker threads while waiting. Correct ordering.
  • Scope — does not reintroduce mtime/staleness caching removed in c6ca11618; only relocates where the resolver runs. Build policy intact.
  • Tests — both new tests are meaningful (dispatch-through-to_thread + /api/pty routes through the async wrapper).

Local results:

pytest tests/hermes_cli/test_web_server.py tests/hermes_cli/test_tui_npm_install.py -q
→ 146 passed, 16 skipped (PTY/platform-gated)
git diff --check → clean

Nothing critical or warning-level. Narrowly scoped, accurate description, good coverage. LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/tui Terminal UI (ui-tui/ + tui_gateway/) P2 Medium — degraded but workaround exists type/perf Performance improvement or optimization

Projects

None yet

4 participants