fix(dashboard): authenticate server-spawned PTY child WS with a process-internal credential (gated mode) by benbarclay · Pull Request #37892 · NousResearch/hermes-agent

benbarclay · 2026-06-03T05:00:40Z

Problem

On every OAuth-gated hosted agent (auth_required=true — all Fly-provisioned agents with HERMES_DASHBOARD_TUI=1), the embedded dashboard Chat tab loads the TUI but the TUI immediately fails to reach its backend: the browser console spams gateway websocket connection failed and the status line shows gateway startup timeout. The dashboard otherwise reports healthy.

Root cause (verified live on staging)

The embedded-TUI PTY child attaches to two server-internal WebSockets, with URLs built server-side and handed to the child via its env:

/api/ws — its primary JSON-RPC gateway backend (_build_gateway_ws_url)
/api/pub — the event sidecar (_build_sidecar_url)

In gated mode, _ws_auth_ok unconditionally rejects the legacy ?token=<_SESSION_TOKEN> path (a leaked session token must not grant WS access once the gate is engaged — this is a deliberate security property). But _build_gateway_ws_url() had no gated-mode branch — it always emitted ?token=. Its sibling _build_sidecar_url had been given a ticket branch; the gateway-URL builder was simply missed.

Result: the TUI child's /api/ws upgrade is rejected 4401 → gateway websocket connection failed → after retries, gateway startup timeout. Live crash log captured the exact URL:

[tui-parent] [startup] timed out waiting for gateway.ready (python=websocket, cwd=ws://0.0.0.0:9119/api/ws?***)

Why not just mint a single-use ticket like the sidecar did?

A single-use, 30s-TTL browser ticket is the wrong shape for this link:

The Ink child reads HERMES_TUI_GATEWAY_URL once at startup and reuses it on every reconnect → a single-use ticket dies on the first reconnect.
On a slow cold boot (hosted shared-CPU agents take ~2.5 min), the ticket can expire before the child even dials.

_build_sidecar_url's own docstring already flagged this: "if reconnect semantics ever become important, this should be upgraded to a long-lived process-scoped token."

Fix

Add a process-lifetime, multi-use internal credential to dashboard_auth/ws_tickets.py:

internal_ws_credential() — minted once per process, stable, never expires, multi-use.
consume_internal_credential() — constant-time compare, not single-use (so the child can reconnect), returns a server-internal identity for audit logs.
Never injected into the SPA HTML or any REST response — it only leaves the process via a spawned child's env, so browser-side XSS can't read it. A leak grants no more than a single-use ticket already does (the same two internal WS endpoints), and the Origin/host guards still apply downstream.

_ws_auth_ok accepts it via ?internal= in gated mode only. Both _build_gateway_ws_url (the bug) and _build_sidecar_url (moved off its fragile single-use ticket) now use it. Loopback / --insecure behavior is unchanged (?token=).

Impact case

Concrete: the dashboard Chat tab is completely unusable on every OAuth-gated deployment — i.e. every hosted Fly agent. This isn't an exotic config edge; it's the default hosted path the moment embedded chat is enabled. Not a security regression (the gate still rejects the legacy token); it's a connectivity bug that left the primary /api/ws transport unauthenticatable for server-spawned children.

Verification

Unit: tests/hermes_cli/test_dashboard_auth_ws_auth.py — 45 pass (+8 new: internal-credential accepted, multi-use, wrong-value rejected, not-honored-in-loopback, gateway-url gated/loopback, gateway+sidecar share one credential). Full dashboard-auth + web_server suites: 300 pass, 1 skip.
Live e2e on staging (hermes-agent-stg-cmpxglzuq0003ju0cwxnpucq0): applied the change in-place, restarted the dashboard, and Ben confirmed the Chat tab now connects — no gateway websocket connection failed / gateway startup timeout. Crash log stayed empty; zero internal-credential rejections in the audit log. (Container then reverted to release code; the durable fix ships via this PR's image build.)

Note: a separate first-bug — the container doing a runtime npm install inside /api/pty (→ 502 → [session ended]) — is fixed by setting HERMES_TUI_DIR=/opt/hermes/ui-tui in the image env (Docker-image change, tracked separately).

Session-token audit (full sweep, per request)

Traced all 120 _SESSION_TOKEN / __HERMES_SESSION_TOKEN__ references. Everything else in the gated path is correct (REST middleware bypassed for cookie auth, api.ts fetchJSON falls back to cookies, the SPA's own gatewayClient.ts mints tickets, _serve_index omits the token). The TUI gateway WS was the only thing in the reported path still hard-depending on the session token.

Latent, out-of-scope (flagged for follow-up): the bundled dashboard plugins (kanban, hermes-achievements) read __HERMES_SESSION_TOKEN__ directly and don't set credentials:'include', so their /api/plugins/... REST calls would 401 in gated mode. Independent of this bug; likely not loaded on hosted agents.

Review note

Touches _ws_auth_ok + dashboard_auth (core auth surface) → needs Teknium review.

…ss-internal credential The embedded-TUI PTY child attaches to two server-internal WebSockets: /api/ws (its primary JSON-RPC gateway backend) and /api/pub (the event sidecar). Both URLs are built server-side in web_server.py and handed to the child via its environment. In OAuth-gated mode (auth_required=true, every hosted Fly agent), _ws_auth_ok unconditionally rejects the legacy ?token=<_SESSION_TOKEN> path — a leaked session token must not grant WS access once the gate is engaged. But _build_gateway_ws_url() still only emitted ?token=, with no gated-mode branch (its sibling _build_sidecar_url had been given a ticket branch; the gateway-url builder was missed). So the TUI child's /api/ws upgrade was rejected 4401 -> 'gateway websocket connection failed' -> 'gateway startup timeout', leaving the embedded chat unusable on every gated deployment. A single-use 30s browser ticket is the wrong shape for this link: the child reads its attach URL once at startup and reuses it on every reconnect, and on a slow cold boot it may not dial within the TTL. (_build_sidecar_url's own docstring already flagged this fragility.) Fix: add a process-lifetime, multi-use internal credential to dashboard_auth.ws_tickets (internal_ws_credential / consume_internal_credential), minted once per process and NEVER injected into the SPA — it only leaves the process via a spawned child's env, so browser-side XSS can't read it, and a leak grants no more than a ticket already does. _ws_auth_ok accepts it via ?internal= in gated mode only. Both _build_gateway_ws_url and _build_sidecar_url now use it, so the child can reconnect both sockets. Loopback / --insecure behavior is unchanged (still ?token=). Needs review: touches _ws_auth_ok + dashboard_auth (core auth surface).

github-actions · 2026-06-03T05:01:37Z

🔎 Lint report: `dashboard-internal-ws-credential` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9707 on HEAD, 9707 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 5029 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

…cstring fix Follow-up to Ben's PR #37892. Adds a TestInternalCredential block to test_dashboard_auth_ws_tickets.py exercising the mint-once stability, multi-use, unminted-rejection, empty-value, wrong-value, reset-and-remint, and ticket-store-independence branches directly (previously only covered indirectly via _ws_auth_ok, which left the unminted and empty-value branches unexercised). Also corrects the consume_internal_credential docstring: the returned identity dict is discarded by the current _ws_auth_ok caller (which only needs the boolean outcome), so the prior 'carry it into its session log' wording over-promised.

…cstring fix Follow-up to Ben's PR NousResearch#37892. Adds a TestInternalCredential block to test_dashboard_auth_ws_tickets.py exercising the mint-once stability, multi-use, unminted-rejection, empty-value, wrong-value, reset-and-remint, and ticket-store-independence branches directly (previously only covered indirectly via _ws_auth_ok, which left the unminted and empty-value branches unexercised). Also corrects the consume_internal_credential docstring: the returned identity dict is discarded by the current _ws_auth_ok caller (which only needs the boolean outcome), so the prior 'carry it into its session log' wording over-promised.

#69) * fix(tui): clear selection on right-click copy + group transcript blocks Two TUI polish fixes. (1) Right-click copy now clears the highlight. The right-click handler copied an active selection via onCopySelectionNoClear (the copy-on-select variant that keeps the highlight during a drag) and never cleared it, so after right-click-to-copy the selection stayed lit with no confirmation and a follow-up right-click re-copied the stale range instead of pasting. A successful right-click copy now clears the selection and notifies; if the copy fails (no clipboard path) the highlight survives and we fall back to the right-click paste handler, exactly as before. (2) Group transcript blocks so boundaries read clearly. Model replies, reasoning/tool trails, and system/error notes rendered with no vertical separation, so distinct block types butted together and were hard to scan. Group adjacent blocks by kind: one blank line opens only where the visual group changes (model prose <-> reasoning/tool trails <-> notes), while a run of same-kind blocks renders flush. The rule lives in domain/blockLayout.ts (messageGroup + hasLeadGap) and is applied intrinsically in MessageLine via a `prev` prop, which fixes the things ad-hoc per-block margins kept breaking: - Streaming stability: the gap is derived from the stable predecessor, never the live block's own changing text, so the actively-streaming reply computes the same gap while it streams as the settled segment does once it flushes. No reflow/jump. - Transparent empty trails: a trail hidden by /details, or one carrying only a token tally (the finalDetails segment message.complete appends), renders nothing and is transparent to grouping (prevRenderedMsg skips it), so there are no floating gaps, no doubled gap after a prompt, and no padded space above the final reply. In the default/collapsed modes content-bearing trails always render, so the grouping is a no-op there. The virtual-height estimator counts the group-boundary line so scroll math stays accurate before Yoga remeasures. ui-tui/src/domain/blockLayout.ts (new), components/messageLine.tsx, components/streamingAssistant.tsx, components/appLayout.tsx, lib/virtualHeights.ts, app/useMainApp.ts. Tests: blockLayout.test.ts (grouping + hidden/empty-trail visibility), virtualHeights leadGap, app-mouse.test.ts copy behavior. Full ui-tui suite green apart from 3 pre-existing local/env failures (cursorDrift, ink-resize, virtualHeights user-prompt-width) unchanged from main. * fix(desktop): stop chat scroll jumping by disabling native scroll anchoring The thread renders virtualized turns in natural document flow with padding spacers, and @tanstack/react-virtual already adjusts scrollTop itself when an off-screen turn is measured and its real height differs from the 220px estimate. With the browser default `overflow-anchor: auto`, native scroll anchoring corrects that SAME size delta too, so the two double-correct and the view lurches — most visibly with Windows mouse wheels, whose coarse notches mount/measure several under-estimated turns per tick (Mac trackpads scroll ~1-3px/frame, keeping it sub-perceptual). Set `overflow-anchor: none` on the thread viewport so only the virtualizer compensates. Also adds `diag-scroll-reset.mjs`, a CDP wheel-up repro that A/B tests the anchor behavior at runtime to confirm the fix. * feat(desktop): clamp sticky human messages to ~2 lines until hover/focus Long user prompts stick to the top of the thread while the response streams beneath them, so a multi-line prompt could eat most of the viewport. Clamp the read-only human bubble's text to ~2 lines with a soft bottom fade; the clamp lifts on hover or keyboard focus, and clicking the bubble still opens the edit composer (which shows the full text). Short messages are untouched — no clamp, no fade. Overflow is measured on an unclamped inner wrapper so the ResizeObserver only fires on real content/width changes, not every frame while the outer max-height animates open; the measured height feeds --human-msg-full so expand/collapse animate to the true height instead of overshooting the cap. * fix(dashboard): trust non-web WS origins on OAuth-gated binds after ticket auth (#37870) Generalises #37747. The WS Origin guard (_ws_host_origin_is_allowed) only trusted the packaged Electron app's non-web origin (file:// / null / app://) when the bind was NOT OAuth-gated. The packaged Hermes Desktop renderer loads over file://, so when it drives a remote OAuth-gated gateway its /api/ws upgrade was rejected with HTTP 403 even though _ws_auth_ok had already validated the single-use ?ticket= one line earlier. This guard runs only AFTER _ws_auth_ok has accepted the WS credential, which is the real auth boundary in every mode: * loopback bind -> legacy dashboard session token * non-loopback --insecure -> legacy session token (Tailscale / LAN, #37747) * OAuth-gated public bind -> single-use, 30s-TTL, identity-bound ?ticket= A non-web origin can only come from a native client; a DNS-rebinding attack always arrives from an http(s) origin and is still match-checked against the bound host. So once the upstream credential check has passed, the Origin guard adds nothing for a non-web origin. Collapsed the loopback/non-gated special cases to 'return True' for non-web origins. http(s) origins keep the strict same-host check, so browser DNS-rebinding defence is unchanged. Tests: gated file:///null/app:// now asserted ALLOWED; cross-site http(s) still rejected on gated and loopback binds; #37747's loopback and non-loopback-insecure cases retained. 37/37 test_dashboard_auth_ws_auth + test_web_server_host_header pass. * fix(desktop): inset sticky human messages with --sticky-human-top Pin user bubbles 0.75rem below the scroll top via a single token instead of flush top-0, so the sticky header doesn't sit hard against the thread edge. * chore: uptick * fix(desktop): drop sticky human clamp max-height transition * fix(desktop): restore sticky human clamp transition at 0.75s * chore: uptick * feat(desktop): custom zoom shortcuts at half default step Replace Electron's built-in zoomIn/zoomOut/resetZoom menu roles with custom implementations that use a 0.1 zoom-level step instead of Chromium's default 0.2. This makes Ctrl/Cmd + +/-0 zoom feel more granular and less jumpy. Also adds installZoomShortcuts() which intercepts the keyboard shortcuts via before-input-event. This is necessary on Linux/Windows where the application menu is set to null, so Chromium's default handler would otherwise apply the full 0.2 step. * fix(docker): seed gateway_state.json from HERMES_GATEWAY_BOOTSTRAP_STATE on first boot (#37896) On a fresh volume there is no gateway_state.json, so the boot reconciler (cont-init.d/02-reconcile-profiles) registers the gateway-default s6 slot but leaves it down — it only auto-starts when the last recorded state was "running". A freshly-provisioned container therefore comes up with the gateway down until something starts it (e.g. the dashboard's start button). Add a generic, first-boot-only env-seed in stage2-hook.sh (which runs before 02-reconcile-profiles): when HERMES_GATEWAY_BOOTSTRAP_STATE=running and no gateway_state.json exists yet, seed {"gateway_state":"running"} so the reconciler brings the supervised slot up on the very first boot. This mirrors the existing HERMES_AUTH_JSON_BOOTSTRAP pattern: it seeds the same state file the reconciler already consults, guarded by [ ! -f ] so persisted runtime state always wins on later boots (a deliberately-stopped gateway stays stopped across restarts). Only the literal "running" is honoured (the sole value in the reconciler's _AUTOSTART_STATES). Generic container contract — no host-specific code. Useful to any orchestrator that provisions a blank volume and wants the gateway up from first boot (the supervised gateway/dashboard already work on such hosts; only the first-boot autostart was missing because the CLI lifecycle commands can't drive the s6 layer when container self-detection misses). Adds a shell-level contract test and documents the env var. * fix(desktop): keep in-flight new chats from vanishing on refresh Creating several sessions in a row (Ctrl-N, type, send, repeat) and waiting for one to finish made the other still-running chats disappear from the sidebar. Root cause: a new session's first user message isn't flushed to the SessionDB until its turn is persisted, so the row's message_count stays 0 mid-response. `refreshSessions()` lists with min_messages=1 and then hard-replaces $sessions. Because every message.complete triggers a refresh, the moment one session finished, the others (still at message_count 0) were filtered out of the server page and dropped from the list. Fix: merge instead of replace. `mergeWorkingSessions()` preserves any session that is still in $workingSessionIds but absent from the server page, so concurrent new chats stay visible until their own turn persists. Optimistic deletes/archives already remove the row from the previous list, so a removed session can't be resurrected by the merge. * fix(desktop): label in-flight new chats with the first message The send path created the optimistic sidebar row with a null preview, so a new chat read "Untitled session" until its turn persisted and auto-title ran. With concurrent new chats now preserved across refreshes, several "Untitled session" rows could show at once. Seed the optimistic preview with the user's first message (the branch path already does this) so each in-flight row is labeled immediately. The server's own preview/title supersedes it once the turn persists. * fix(docker): point TUI launcher at prebuilt bundle via HERMES_TUI_DIR (#37923) The embedded dashboard Chat tab dies on hosted images with a 502 / "[session ended]": the PTY child's `hermes --tui` spawn runs a runtime `npm install` that fails. Root cause: the root package-lock.json describes the WHOLE npm monorepo workspace set (root + web + ui-tui + apps/*), but the image only installs root/web/ui-tui — apps/* (the desktop app) is never `npm install`ed here, and its deps hoist into the shared root node_modules. So the actualized node_modules permanently disagrees with the canonical lock, `_tui_need_npm_install()` returns True on every launch, and the runtime `npm install` it triggers (a) can never converge against the partial monorepo and (b) races itself across concurrent /api/pty connections -> ENOTEMPTY -> the launcher `sys.exit(1)`s, the slow install blows past Fly's WS-upgrade window -> 502 -> the browser shows "[session ended]". Fix: set `ENV HERMES_TUI_DIR=/opt/hermes/ui-tui` so `_make_tui_argv` takes the prebuilt-bundle fast path (`node --expose-gc /opt/hermes/ui-tui/dist/entry.js`) and never reaches the install check — exactly the nix/packaged-release path the launcher was designed for. The bundle is already built at Layer 8 (`ui-tui && npm run build`); this just tells the launcher to use it. Verified on a freshly-built image: HERMES_TUI_DIR is set, the prebuilt dist/entry.js is present, `_make_tui_argv` resolves to the prebuilt node invocation (no npm), and `docker run ... --tui` no longer prints "npm install failed". New regression guard: tests/docker/test_tui_prebuilt_bundle.py. A separate launcher hardening (make _tui_need_npm_install tolerant of partial-monorepo installs) is tracked independently; this Docker-side fix resolves the hosted-chat symptom on its own. Area: docker (Dockerfile + tests/docker). * fix(desktop): disable GPU acceleration on remote displays to stop flicker Users on remote/forwarded displays (SSH X11 forwarding, VNC, RDP, WSLg) reported the window flickering during scroll/streaming; nobody on native Windows/macOS ever saw it. Root cause: the app shipped with Chromium's default GPU hardware acceleration and no remote-display handling. Over a remote connection the GPU compositor can't present accelerated layers cleanly across the wire, so the surface flashes on repaint. Local sessions composite on the GPU and never hit it. Detect a remote display before app `ready` (detectRemoteDisplay in bootstrap-platform.cjs) and fall back to software rendering via app.disableHardwareAcceleration() + --disable-gpu-compositing. Software compositing is rock-steady over the wire and the CPU cost is negligible next to the connection's latency. HERMES_DESKTOP_DISABLE_GPU overrides detection both ways for VNC/screen-sharing setups we can't sniff or remote hosts that do have working acceleration. * fix(desktop): don't treat WSLg as a remote display WSLg renders Linux GUIs locally through a vGPU surface rather than shipping frames over the wire, so it doesn't show the remote-compositor flicker — confirmed by a WSL user seeing zero flickering. Drop the WSL branch from detectRemoteDisplay so WSLg keeps hardware acceleration; detection now covers only genuinely-remote displays (SSH X11 forwarding, VNC, RDP). The HERMES_DESKTOP_DISABLE_GPU override still works for anyone who does hit it. * fix(desktop): keep slash/@ completion menu navigable and Esc-dismissable The desktop composer's `onKeyUp` handler unconditionally re-ran `refreshTrigger` on every keyup, including the Arrow/Enter/Tab/Escape keys the open-trigger `onKeyDown` branch had already fully handled. Because `refreshTrigger` re-detects the trigger and resets the active index to 0, this produced two bugs in the `/` (and `@`) completion popover: - ArrowDown/ArrowUp moved the highlight on keydown, then keyup snapped it straight back to the top — so the user could never cycle past the first couple of items. - Escape closed the menu on keydown, then keyup re-detected the still-present `/` and immediately reopened it — so Esc appeared to do nothing. Fix: skip the keyup-driven refresh for the navigation/control keys while a trigger menu is open (they never edit text, so refreshing is pointless), and only reset the highlight in `refreshTrigger` when the detected trigger query actually changed. Applied to both the main composer (chat/composer/index.tsx) and the message-edit composer (assistant-ui/thread.tsx), which shared the same bug. New `shouldSkipTriggerRefreshOnKeyUp` helper is unit-tested. * fix(desktop): make Stop button actually interrupt when a turn is queued When a follow-up message is queued during a busy turn, the composer clears and the primary button switches back to the Stop affordance. But clicking Stop ran interruptAndSendNextQueued(), which cancelled the turn and *immediately* re-sent the head of the queue. The auto-drain effect (busy true to false) compounded this: any explicit cancel flipped busy false and re-fired the queue. The net effect was that Stop appeared to never interrupt -- the agent kept running on the queued prompt. Fix: - Stop button (busy + empty composer) now always performs a pure interrupt via onCancel(); it no longer hijacks the queue. - An explicit interrupt latches userInterruptedRef so the busy to false auto-drain skips exactly one drain. Queued turns are preserved and the user resumes them deliberately (Cmd/Ctrl+K, Enter, or the per-row send-now arrow), matching the documented Esc=cancel / Cmd+K=send-next affordances. - Extracted the settle decision into shouldAutoDrainOnSettle() with unit tests covering natural completion vs. explicit interrupt. * fix(desktop): stop background session messages bleeding into the active transcript A still-busy background session (one the user toggled away from) keeps emitting updateSessionState() heartbeats — stream deltas, and especially the 'session busy' prompt-rejection errors from auto-drained queued turns. Each call invoked syncSessionStateToView() unconditionally, staging that session's messages into the shared $messages view. flushPendingViewState() guarded against the wrong session reaching the view, but only one requestAnimationFrame is scheduled per frame and pendingViewStateRef holds just the latest writer. So within a single frame a background write could overwrite an already-pending foreground write, and the stale background transcript (e.g. the red 'session busy' rows) would render on top of whatever session the user switched to — appearing to 'bleed' into every session. Guard at the staging site: a session may only stage into the view when it is the currently-active session. Background sessions still update their own cache entry; they just never touch $messages. Pure render fix, no behavior change to queuing, interrupt, or drain. * fix(dashboard): authenticate server-spawned PTY child WS with a process-internal credential The embedded-TUI PTY child attaches to two server-internal WebSockets: /api/ws (its primary JSON-RPC gateway backend) and /api/pub (the event sidecar). Both URLs are built server-side in web_server.py and handed to the child via its environment. In OAuth-gated mode (auth_required=true, every hosted Fly agent), _ws_auth_ok unconditionally rejects the legacy ?token=<_SESSION_TOKEN> path — a leaked session token must not grant WS access once the gate is engaged. But _build_gateway_ws_url() still only emitted ?token=, with no gated-mode branch (its sibling _build_sidecar_url had been given a ticket branch; the gateway-url builder was missed). So the TUI child's /api/ws upgrade was rejected 4401 -> 'gateway websocket connection failed' -> 'gateway startup timeout', leaving the embedded chat unusable on every gated deployment. A single-use 30s browser ticket is the wrong shape for this link: the child reads its attach URL once at startup and reuses it on every reconnect, and on a slow cold boot it may not dial within the TTL. (_build_sidecar_url's own docstring already flagged this fragility.) Fix: add a process-lifetime, multi-use internal credential to dashboard_auth.ws_tickets (internal_ws_credential / consume_internal_credential), minted once per process and NEVER injected into the SPA — it only leaves the process via a spawned child's env, so browser-side XSS can't read it, and a leak grants no more than a ticket already does. _ws_auth_ok accepts it via ?internal= in gated mode only. Both _build_gateway_ws_url and _build_sidecar_url now use it, so the child can reconnect both sockets. Loopback / --insecure behavior is unchanged (still ?token=). Needs review: touches _ws_auth_ok + dashboard_auth (core auth surface). * test(dashboard): direct unit coverage for internal WS credential + docstring fix Follow-up to Ben's PR #37892. Adds a TestInternalCredential block to test_dashboard_auth_ws_tickets.py exercising the mint-once stability, multi-use, unminted-rejection, empty-value, wrong-value, reset-and-remint, and ticket-store-independence branches directly (previously only covered indirectly via _ws_auth_ok, which left the unminted and empty-value branches unexercised). Also corrects the consume_internal_credential docstring: the returned identity dict is discarded by the current _ws_auth_ok caller (which only needs the boolean outcome), so the prior 'carry it into its session log' wording over-promised. * test(desktop): real-DOM regression for slash/@ menu keyboard nav The existing slash-menu fix (PR #37937) shipped a unit test that drove the keydown reducer directly. It did not exercise the actual DOM event path — specifically the keyup-driven `refreshTrigger` that was the root cause — so it would not have caught a regression in that path. This adds a faithful @testing-library reproduction that mounts the real `useLiveCompletionAdapter` plus the index.tsx trigger wiring and fires real `keyDown` + `keyUp` event pairs on a contentEditable. It asserts: - ArrowDown cycles through ALL items (0,1,2,3,4,0,1), not just the first two - Escape closes the menu and keyup does not reopen it Reverting the fix (always-refresh keyup + unconditional setTriggerActive(0)) makes this test fail with the highlight stuck at the top — confirming it guards the real bug. * fix(desktop): stop Esc reopening the slash/@ menu; harden keyup guard Follow-up to #37937. That fix guarded the composer's keyup with `shouldSkipTriggerRefreshOnKeyUp(key, trigger !== null)`. The `trigger !== null` check is timing-fragile for Escape: Escape's *keydown* sets `trigger = null` and closes the menu, but in a real browser the *keyup* fires after a re-render, so the handler closure sees `trigger === null`, the guard returns false, `refreshTrigger` runs, re-detects the still-present `/` in the input, and instantly reopens the menu. (jsdom batches state synchronously so a unit test could not observe this -- only the running app does.) Replace the value-based guard with a `triggerKeyConsumedRef` set synchronously in keydown whenever the open popover consumes a nav/control key (Arrow/Enter/Tab/Escape). keyup consults and clears that ref, so it is immune to the keydown->re-render->keyup timing. Applied to both the main composer (chat/composer/index.tsx) and the message-edit composer (assistant-ui/thread.tsx). Removes the now-unused `shouldSkipTriggerRefreshOnKeyUp` helper and its unit test. The real-DOM regression test now fires keydown+keyup pairs through the ref-based handlers and asserts Esc closes and stays closed. Verified by running a production renderer build (Vite v8) under Electron against a local backend: ArrowDown/ArrowUp cycle the full list and Esc dismisses the menu without reopening. * chore: remove committed RELEASE_v*.md changelogs from repo root (#37855) These per-release changelog files are transient working files used only to feed `gh release create --notes-file` at release time; the GitHub Release itself permanently stores the published notes. They were never a build artifact (no package-data glob, no MANIFEST.in include, no CI reference) and don't belong in the tracked tree. - Delete all 15 (v0.2.0 through v0.15.1) - Add RELEASE_v*.md to .gitignore so an accidental `git add -A` can't recommit them The hermes-release skill is updated separately to write the changelog to /tmp/ for the whole release process and never stage it. * fix(windows): rip out unused submodule support in installer & docker & docs we have no submodules anymore, so #37702 was kinda right, but we can just delete it entirely. * fix(docs): remove remaining stale submodule references missed by #38089 (#38105) Follow-up to #38089. The merged PR removed --recurse-submodules from the installer, CI, and getting-started docs, but missed the same stale clause in: - CONTRIBUTING.md (Prerequisites table) - website/docs/developer-guide/contributing.md (table + clone command) - zh-Hans mirror of the developer-guide contributing doc git-lfs is kept in the Git requirement rows since it's a separate, real prerequisite. No .gitmodules has existed since the Atropos RL submodule was removed in #26106. * docs: explain remote-gateway session token for Hermes Desktop (#38144) The desktop Remote gateway field asks for a session token that Hermes never surfaces — by default web_server.py mints an ephemeral token per boot and injects it into the served HTML, so there is nothing in config.yaml, /gateway, or env to copy. Document that you pin it yourself via HERMES_DASHBOARD_SESSION_TOKEN, run the backend with --insecure (keeps the legacy token auth path instead of engaging the OAuth gate), then paste that value into the desktop app. - web-dashboard.md: new 'Connecting Hermes Desktop to a remote backend' section (backend + desktop steps, --insecure vs OAuth-gate nuance, HERMES_DESKTOP_* env override, Tailscale guidance, troubleshooting). - environment-variables.md: new 'Web Dashboard & Hermes Desktop' env-var table (HERMES_DASHBOARD_SESSION_TOKEN, HERMES_DESKTOP_REMOTE_URL/TOKEN, the OAuth and public-url vars) — none were previously documented. * feat(matrix): support bang command aliases * fix(matrix): make bang-command resolution robust + fix dead skill-command branch Follow-up to the salvaged contributor commit: - Underscore→hyphen tolerance now emits a resolvable token. Previously the detect set accepted the hyphenated variant but emit returned the raw token, so '!set_home' produced '/set_home' which the dispatcher could not resolve. Now emits '/set-home'. Aliases are left as-is — the gateway dispatcher canonicalizes them itself. - Fix dead skill-command branch: skill command keys are stored slash-prefixed (e.g. '/arxiv') in get_skill_commands(), but the check compared the bare token, so '!arxiv' never normalized. Now compares the '/candidate' form, making skill aliases (e.g. !gif-search) work. - Re-run bang normalization after Matrix reply-fallback stripping so a quoted reply whose content is a bang command reaches command parity with the slash form. - Replace silent 'except Exception: pass' with logger.debug(exc_info=True). - Add AUTHOR_MAP entry for @nepenth. Tests: +5 (underscore-alias, skill-command branch, quoted-reply bang + slash parity). 162 Matrix tests pass. * docs: add remote-backend section to the Desktop App page (#38180) The Desktop App page covered install, settings, and chat but not how to connect the app to a backend on another machine — the exact thing @PedjaDrazic asked about. Add a 'Connecting to a remote backend' section that explains the Session token is the dashboard token Hermes never surfaces (pin it via HERMES_DASHBOARD_SESSION_TOKEN + run --insecure), and link to the web-dashboard page for the full backend setup rather than duplicating it. Add a reciprocal link from the web-dashboard remote section back to the Desktop App page. * fix(cli): exclude desktop-managed backend from stale-dashboard kill Fixes #37532 * fix(desktop): pass live backend PID to in-app update so its own dashboard is spared The Python half (#37538) reads HERMES_DESKTOP_CHILD_PID to exclude the desktop-managed backend from _kill_stale_dashboard_processes, but nothing set it. applyUpdatesPosixInApp now passes the live backend PID in the `hermes update` env, completing the #37532 fix end-to-end. * chore: add bbednarski9 to AUTHOR_MAP for #29722 salvage (#38189) Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com> * docs: make the Desktop App remote-backend section self-contained (#38194) The section explained why the Session token is hidden but punted the actual setup steps to the web-dashboard page via a link — a bounce for someone on the Desktop App page trying to connect. Inline the concrete steps instead: backend command block (mint token -> .env -> hermes dashboard --insecure), the in-app Remote gateway steps, the env-var override, Tailscale guidance, and a troubleshooting list. Keep a short pointer to the web-dashboard page for the same setup from that angle. * fix(mcp): banner shows 'disabled' not 'failed' for enabled:false servers (#38204) get_mcp_status() treated every non-connected server as a failure, so a server configured with enabled: false rendered as red '— failed' in the startup banner even though it was intentionally off. Add a 'disabled' field derived from the enabled flag and render disabled servers dim as '— disabled' instead. * feat(debug): include desktop.log in hermes debug share / /debug / hermes logs (#38203) The Electron desktop app writes boot failures, backend spawn output, and Python tracebacks to HERMES_HOME/logs/desktop.log, but debug-share only captured agent/errors/gateway — so desktop boot issues never made it into shared debug reports. - logs.py: register desktop -> desktop.log (enables 'hermes logs desktop') - debug.py: capture desktop snapshot, add to summary report, upload full desktop.log in 'share', update privacy notice - gateway /debug inherits the desktop tail via collect_debug_report() - main.py + docs: help text and log-name table (also adds missing gui row) - tests: desktop seed in fixture, new report test, three_pastes -> four_pastes * fix(desktop): add @testing-library/dom as explicit dev dependency @testing-library/react@16 declares @testing-library/dom as a peerDependency and re-exports waitFor/fireEvent/screen/within from it. Without dom installed as a direct dependency, tsc -b fails with TS2305 in every test file that imports those names — which breaks the apps/desktop build during installer bootstrap (Hermes Setup → "INSTALL DIDN'T FINISH"). * chore: regenerate lockfile + map vladkvlchk for salvaged #36978 - Add @testing-library/dom to apps/desktop devDeps in package-lock.json so npm ci validates against the manifest change (contributor left the lockfile out of the PR intentionally). - Removes stale 'peer: true' flags now that dom is an explicit devDep. - AUTHOR_MAP: prostoandrei9@gmail.com -> vladkvlchk (CI author gate). * fix(nix): bump npmDepsHash for refreshed lockfile Lockfile regeneration invalidated the flake's pinned npm-deps hash. Hash taken from fetchNpmDeps' authoritative 'got:' line (the prefetch-npm-deps Diagnose helper reports a different, wrong value due to a fetcherVersion normalization discrepancy). * fix(desktop): stop chat scroll backward-jump from content-growth interim scrolls (#37997) The thread scroll-anchor hook in apps/desktop/src/components/assistant-ui/ thread-virtualizer.tsx was disarming sticky-bottom whenever scrollTop decreased by >1px between scroll events. That check was too eager: when content height grows mid-frame (virtualizer measurement of a newly visible turn, streaming token, Streamdown/Shiki re-tokenization, composer chip toggle), the browser emits an interim 'scroll' event whose scrollTop is smaller than the previous frame's because scrollHeight just jumped. The rAF-scheduled pinToBottom hasn't run yet, so programmaticScrollPendingRef is 0 and the disarm fired. With sticky-bottom disarmed the scroller stuck ~50px above bottom — the visible at-rest backward jump that #37997 describes (and the same root cause as the wheel-up variant in #37527). Fix: - Track scrollHeight per frame (lastHeightRef). Disarm on scrollTop decrease ONLY when scrollHeight did not grow this frame. Real upward user intent (scrollbar drag, keyboard PgUp, programmatic scrollIntoView) still disarms because it moves scrollTop without growing the content. Wheel-up and touchmove continue to disarm via their own listeners. - Stop observing the scroller element itself in the ResizeObserver; only observe its content child. Viewport-only resizes (window resize, devtools panel toggle) no longer trigger spurious pins, matching the intent of the auto-stick-to-bottom behavior. Verified: - apps/desktop `tsc -b` clean. - apps/desktop `vitest run src/components/assistant-ui/streaming.test.tsx` passes (9/9), including the existing wheel-up disarm regression test that asserts scrollTop stays at 420 after a wheel-up + content growth. * fix(desktop): honor upward wheel scroll in long threads * feat(dashboard): check-before-update flow on the System page (#38205) The dashboard's update button ran 'hermes update' immediately with no preview. Now the System page shows whether an update is available and asks the user to confirm before applying it. - New GET /api/hermes/update/check: reports install method, current version, and commits-behind (via banner.check_for_updates, 6h-cached; ?force=1 busts the cache). Soft-fails to behind=null on network error; marks docker/nix/homebrew as can_apply=false with the out-of-band cmd. - System page: update-status badge on the Hermes version row (latest / N behind), a Check-for-updates button, and an Update-now button that opens a ConfirmDialog showing the commit count before POST /api/hermes/ update fires. Cached status loads with the rest of the page. - Docs + 5 endpoint tests (git/up-to-date/docker/soft-failure + auth gate). * fix(tui): stop persisting full tool output in trail lines (silent OOM death) A heavy --tui session (browser snapshots, large tool outputs) silently OOM-killed the Node parent within minutes — closing the gateway child's stdin, which the user saw only as a bare "gateway exited" / stdin EOF. CLI was immune. Root cause: each completed tool's verbose trail line embedded up to 16KB of result_text, persisted in transcript Msg.tools[] for the whole session and rendered EXPANDED by default, so an Ink render-node tree was built for every one of up to 800 messages at once. That tree blew past Node's heap at a few hundred MB — far below the 2.5GB memory-monitor exit threshold, so the death was never even attributed. - text.ts: persisted verbose tool-trail blocks now cap to a small preview (VERBOSE_TRAIL_MAX_CHARS=800/12 lines), not the 16KB live-render budget. Retained trail strings drop ~17x (12.2MB -> 0.7MB at 800 msgs); the live streaming tail still uses the larger LIVE_RENDER budget. - tui_gateway/server.py: lower the gateway-side verbose text cap to match (1KB/16 lines) so we stop shipping output the TUI no longer renders. - memoryMonitor.ts: derive critical/high thresholds from the real V8 heap ceiling (~88%/70%) instead of the hardcoded 2.5GB that killed the process at 31% of an 8GB ceiling; add a one-shot onWarn early-warning on fast sub-threshold heap growth so the next such death is diagnosable, not silent. - entry.tsx: wire onWarn to a crash-log breadcrumb + stderr line. Full tool output is unchanged in the agent context and SQLite session — this is display/transport only, no behavior or context change. Fixes #34095. Related #27282. Tests: ui-tui text + new memoryMonitor suites (33 pass), python verbose-cap guard (5 pass); full ui-tui suite shows no new failures vs pristine main. E2E repro confirms the retention drop. * fix(kanban): don't permanently block tasks that hit a provider rate limit (#38223) A kanban worker that exhausted its retries purely on a provider rate limit / quota wall (e.g. opencode-go's 5-hour window) exited with code 1. The dispatcher counted that as a crash, and with DEFAULT_FAILURE_LIMIT=2 two quota-wall hits permanently blocked the card. Fanning out many workers against one shared quota made this routine. Now a rate-limited worker exits with EX_TEMPFAIL (75); the dispatcher classifies that as a 'rate_limited' exit, releases the task back to 'ready' WITHOUT incrementing consecutive_failures (the breaker can't trip on a transient throttle), and the respawn guard defers the next attempt on a cooldown (default 5min, HERMES_KANBAN_RATE_LIMIT_COOLDOWN_SECONDS) until the quota window clears. Genuine crashes still count and trip the breaker as before. The 120s Retry-After cap is unchanged — no worker parks for hours holding a slot. - conversation_loop.py: surface failure_reason in the exhaustion return - cli.py: kanban worker picks exit 75 on rate_limit/billing failure - kanban_db.py: rate_limited exit kind, no-count requeue, cooldown guard * feat(observability): observer-grade telemetry hooks + NeMo-Relay plugin Adds backend-neutral observer hooks for plugins: session, turn, API request, tool, approval, and subagent lifecycle events with stable correlation IDs (session_id, task_id, turn_id, api_request_id, tool_call_id, parent/child subagent ids). Extends VALID_HOOKS with api_request_error and subagent_start. Hot path is zero-cost when no plugin subscribes: has_hook()/presence checks gate all payload construction, request payloads are returned by reference when no middleware rewrites, and the sanitized response payload no longer embeds raw response objects. Bundles the optional NeMo-Relay observability plugin (plugins/observability/nemo_relay) as an in-repo consumer of the new hooks, peer to the existing langfuse plugin. Fails open when the optional nemo-relay package is not installed. Authored-by: Bryan Bednarski <bbednarski@nvidia.com> Salvaged from #29722 onto current main. * test: restore unrelated trailing newlines in cwd/tool-search tests The salvaged PR incidentally stripped a trailing blank line from two unrelated test files (test_file_tools_cwd_resolution.py, test_tool_search.py). Restore them to keep the salvage diff scoped to the observability feature. * perf(observability): gate tool-hook emit on has_hook; slim per-tool footprint The salvaged observer contract gated the API-request hot path on has_hook() but left the per-tool emit ungated: every tool call ran result-field derivation + payload dict build + invoke_hook dispatch even with zero plugins registered. - _emit_post_tool_call_hook now short-circuits on has_hook("post_tool_call") and derives status/error fields lazily (after the gate, only when a listener will consume them). status defaults to None -> derived; explicit blocked/cancelled callers still pass status through. - transform_tool_result emit (pre-existing hook) likewise gated on has_hook(); skips _tool_result_observer_fields when no listener. - Removed the now-redundant _tool_result_observer_fields pre-computation at the three ok-path call sites (model_tools, agent_runtime_helpers, tool_executor) — the helper derives them, so the no-listener path costs one dict lookup and the call sites shrink. - Tests: stub has_hook=True where payload correctness is asserted; add a no-listener regression proving post_tool_call/transform_tool_result emit is skipped when nothing is registered. * test: stub has_hook in transform_tool_result hook tests CI slice 3 caught that tests/test_transform_tool_result_hook.py monkeypatches invoke_hook but not has_hook, so the new has_hook("transform_tool_result") gate skipped the emit and the transform never ran. Stub has_hook=True in the shared _run_handle_function_call helper whenever a custom invoke_hook is supplied (the test intends hooks to fire). The no-hook-registered test keeps the real has_hook=False path — that's the gate's intended behavior. * fix(doctor): detect + repair stale HERMES_MAX_ITERATIONS .env ghost shadowing config.yaml (#38222) * fix(doctor): detect + repair stale HERMES_MAX_ITERATIONS .env ghost shadowing config.yaml hermes doctor now flags when ~/.hermes/.env carries a HERMES_MAX_ITERATIONS value that disagrees with agent.max_turns in config.yaml, and 'hermes doctor --fix' removes the stale .env line so config.yaml is authoritative. 'hermes config show' surfaces the same drift inline under Max turns. The setup wizard stopped dual-writing this value, but users who edited only config.yaml from a pre-fix install keep a .env ghost. The gateway bridge normally overrides it at startup, but if the bridge bails on any earlier config-parse error the ghost silently wins — config says 400 while the gateway activity line reads N/90. The detector reads the .env FILE directly (load_env), not get_env_value/ os.environ, since the startup bridge may already have overwritten os.environ with the config value. Closes #17534. * fix(config): stop offering HERMES_MAX_ITERATIONS as an editable env var Removes HERMES_MAX_ITERATIONS from OPTIONAL_ENV_VARS so the dashboard env editor (PUT /api/env) and any env-var prompt no longer let a user write it to .env — which would recreate the stale ghost that shadows config.yaml's agent.max_turns (issue #17534). The iteration budget is configured only via config.yaml; the env var stays a read-only backward-compat fallback in the gateway/CLI, never a promoted write target. Regression test asserts it is absent from OPTIONAL_ENV_VARS. * fix(install.ps1): handle dirty worktree on Windows update (#38239) Git for Windows defaults to core.autocrlf=true, which renormalizes the repo's LF-only text files to CRLF in the working tree. On a managed, never-user-edited clone this makes tracked files (.envrc, AGENTS.md, agent/*.py, workflows) show as locally modified, so the update path's bare git checkout aborts with 'Your local changes would be overwritten by checkout' and the desktop bootstrap fails at stage=repository. The bash installer already autostashes before checkout; the PowerShell path had no dirty-tree handling at all and never pinned autocrlf. Fix: (1) git reset --hard HEAD before fetch/checkout in the update path to discard any pre-existing dirt, and (2) pin core.autocrlf=false on both the update and fresh-clone paths so the dirt is never created again. * fix(install): require Node >=20.19/22.12 for the desktop build The "Build desktop app" install step failed with an opaque "exit code 1" on machines with an old Node, and nothing in the logs explained it. Reproduced: on Node 20.5.1, `npm run pack`'s `vite build` crashes with You are using Node.js 20.5.1. Vite requires Node.js version 20.19+ or 22.12+. SyntaxError: The requested module 'node:util' does not provide an export named 'styleText' Vite 8 (rolldown) imports node:util.styleText, which doesn't exist before Node 20.12, so the build dies before producing the app. The installer's check_node / Test-Node accepted ANY pre-existing Node with no version floor, so a too-old system Node was used for the build instead of the bundled Node 22. Add a version floor (^20.19 || >=22.12) to check_node (install.sh) and Test-Node (install.ps1): a too-old system Node is replaced with the Hermes-managed Node 22 LTS, and the desktop stage re-resolves Node so the build always runs on a satisfying version. Declare the same range in apps/desktop/package.json engines. Verified: build succeeds on Node 22, fails on 20.5.1 with the error above; the floor logic matches Vite's range across boundary versions (20.18/20.19, 21.x, 22.11/22.12). * feat(dashboard): enrich profiles dashboard and de-dupe channel env vars (#37872) * feat(desktop): enrich profiles dashboard and de-dupe channel env vars Add active-profile switching, role descriptions (manual + auto-generate via the auxiliary LLM), per-profile model selection, and gateway-running / distribution badges to the GUI Profiles page. New profile creation gains clone-all, optional description and model assignment. Hide messaging-platform credentials (channel_managed) from the Keys/Env page since the Channels page is the canonical surface for them, and relabel the trimmed "messaging" category as "Gateway". Co-authored-by: Cursor <cursoragent@cursor.com> * fix(desktop): address review feedback on profiles/env changes - ProfilesPage: scope the action-menu outside-click handler to the menu's own container via a ref so opening one card's menu no longer leaves others open. - EnvPage: route the "Gateway" label and hint through i18n (t.common.gateway / gatewayHint) instead of hard-coded English, with an English fallback for untranslated locales. - web_server: only report description_auto=true when auto-generation actually succeeded. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(desktop): address second-round review on profiles - ProfilesPage: treat describe-auto success by null-checking the description and trust the response's description_auto flag instead of assuming true; disable the model-editor Save button unless the selected choice resolves to a real /api/model/options entry (avoids silent no-op saves). - tests: cover the new profile endpoints (active get/set + 404, description round-trip + 404, model round-trip + 400 validation, and describe-auto success/failure contracts). Co-authored-by: Cursor <cursoragent@cursor.com> * fix(desktop): more profiles review fixes (toggles, races, tests) - ProfilesPage: use the canonical `active` returned by setActiveProfile; make the SOUL/description/model action-menu items toggle their editor closed when already open; guard description save/auto-describe against stale responses via an activeDescRequest ref so a late reply can't clobber a different open editor. - tests: assert /api/env channel_managed classification matches _channel_managed_env_keys(). Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> * fix(tui): save TUI /save snapshots under Hermes home with system prompt (#38251) * fix(tui): save TUI /save snapshots under Hermes home with system prompt The TUI gateway's session.save RPC wrote hermes_conversation_<ts>.json to the workspace/project CWD via os.path.abspath(...) and only exported model and messages. This diverged from the classic CLI /save (which writes under the Hermes profile home) and from the dashboard save (which includes the system prompt). Write the snapshot under get_hermes_home()/sessions/saved/ and include system_prompt, session_id, and session_start so the TUI export matches the CLI and dashboard behavior. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(tui): prefer agent.session_start for /save export; assert it in test Address review feedback: derive session_start from the agent's session_start datetime (matching the classic CLI export) and fall back to the gateway session's created_at only when unavailable. Assert session_start in the regression test. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> * fix(desktop): self-update rebuilds and relaunches cleanly on macOS The macOS DMG / in-app update could leave Hermes unable to relaunch: the staged updater rebuilt the desktop without managed Node on PATH ("npm not found"), never installed the rebuilt bundle over the running app, and could race itself on `git stash`. Child install scripts also inherited a deleted cwd from the .app bundle replaced during self-update. - update.rs: prepend $HERMES_HOME/node/bin + venv bin to the rebuild PATH; read --branch / --target-app from args; add a macOS "install" stage that dittos the rebuilt bundle over the target app, clears quarantine, and relaunches via `open` (rolling back on a failed swap); guard start_update with an AtomicBool so concurrent startUpdate() calls can't race git stash. - main.cjs: pass --branch <configured> and --target-app <running bundle> to the staged updater, and spawn it with HERMES_HOME + managed Node/venv on PATH and cwd=HERMES_HOME. - bootstrap.rs: launch the desktop via `open <App>.app` on macOS instead of exec'ing Contents/MacOS/Hermes, avoiding cwd/quarantine issues post-rebuild. - powershell.rs: pin child install scripts to a stable cwd so they don't emit getcwd errors when the launching .app is replaced mid-install. - failure.tsx: in update mode show "Update didn't finish" / "Retry update" and retry via startUpdate() instead of re-running the installer bootstrap. * fix(desktop): dedupe clipboard image paste Chromium exposes the same pasted image on both DataTransfer.items and .files as distinct Blob objects, which attached twice. Prefer items and skip the files mirror when items already yielded images. * fix(installer): stop mislabeling stdout-style progress as stderr Both installers (Electron bootstrap-runner + Tauri) hardcoded a literal `stderr: ` prefix onto every line that arrived on fd 2. Tools like uv/pip/git/npm write normal progress to stderr by design, so routine install output showed up tagged as "stderr" (and rendered red in the Tauri progress UI), making a healthy install look like it was erroring. Carry the stream as structured metadata (`stream: 'stdout' | 'stderr'`) on the log event instead of mangling the line text. The UI now styles stderr subtly (dimmed) rather than alarmingly, and the persistent forensic logs keep their stdout/stderr distinction. * fix(dashboard): clamp PTY resize dimensions for WSL2 winsize garbage (#38200) * fix(dashboard): clamp PTY resize dimensions for WSL2 winsize garbage WSL2 reports columns=131072, rows=1 from a broken winsize probe. The dashboard /chat tab forwards xterm.js dimensions through PtyBridge.resize(), which packs them as unsigned short via struct.pack. 131072 > 65535 raised struct.error — uncaught (only OSError was handled) — breaking the resize path and leaving the TUI laid out for a one-row, absurdly-wide screen, which surfaces as blank/disappearing text. Clamp cols/rows to a sane [1, 2000]x[1, 1000] range before packing. Non-finite/non-integer probes fall back to the minimum so nothing can reach struct.pack and raise. * test(dashboard): de-flake pub/events broadcast test test_pub_broadcasts_to_events_subscribers round-tripped a frame through two nested Starlette TestClient WebSocket portals within a 10s wall-clock budget. Under heavy parallel CI load a starved ASGI thread occasionally blew that budget even though the server logic is correct, producing intermittent 'broadcast not received within 10s' failures. Drive _broadcast_event directly under asyncio with fake subscribers instead. Same fan-out contract (verbatim delivery to every subscriber on the channel, nothing to other channels), zero scheduling surface. Runs in ~0.3s, deterministic across 10 consecutive runs. * fix(gateway): decode schtasks output with locale encoding on Windows _exec_schtasks ran schtasks.exe with text=True but no encoding/errors, so localized Windows (e.g. Chinese) output in the console code page raised UnicodeDecodeError tracebacks from subprocess' reader threads during `hermes gateway status`. Decode with the locale's preferred encoding and errors="replace" so non-UTF-8 status output is read cleanly. Fixes #38172 * test(gateway): cover schtasks locale-safe decoding on Windows Assert _exec_schtasks passes an explicit encoding and errors="replace" to subprocess.run, and that _schtasks_encoding falls back to utf-8 when the locale lookup is empty or raises (#38172). * docs: remote desktop connect needs --tui on the backend (#38350) The Desktop App and Web Dashboard remote-connect instructions told users to start the backend with `hermes dashboard --no-open --insecure --host 0.0.0.0`, omitting --tui. Without --tui the embedded-chat WebSockets (/api/ws, /api/pty) are refused, so the desktop passes the /api/status health check and reports the backend "ready" — but chat never works because the socket is closed on connect. - Add --tui to both backend command blocks (with an inline why-comment). - Explain that the desktop chat runs over /api/ws + /api/pty and needs the embedded-chat surface enabled; a plain dashboard/gateway is not enough. - Add a troubleshooting entry for the exact symptom (connects, says ready, chat dead) on both pages. * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * fix(installer): pass LogStream to emit_log calls from #38296 PR #38296 added four emit_log() calls using the old 3-arg signature, but main had already changed emit_log to take a `stream: LogStream` argument (#38312, "stop mislabeling stdout-style progress as stderr"). The two PRs touched different lines, so the merge auto-resolved with no conflict and left main unable to compile the bootstrap installer (E0061: 4 args expected, 3 supplied). Supply the missing stream: Stdout for the update/install progress lines and Stderr for the "could not auto-launch desktop" failure, matching the convention from #38312. cargo check passes. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(desktop): persist pins, reconnect after sleep, dedupe session search Four related desktop session-management bugs: - Pins lost until refresh: pinned sessions are joined against the paginated in-memory session list, so a pinned chat that aged off the most-recent page got evicted on the next refresh (every message.complete triggers one) and the Pinned section went empty. mergeWorkingSessions -> mergeSessionPage now also preserves pinned rows (matched by live id or lineage root). Pin id checks in the chat header, command center, and delete/archive are normalized to the durable sessionPinId so pins survive auto-compression. - Stuck on "Starting Hermes" after sleep: macOS sleep drops the renderer WebSocket; nothing reconnected on wake so the composer stayed disabled. The gateway boot hook now auto-reconnects with backoff on close/error and on wake signals (powerMonitor resume/unlock-screen IPC, window online, visibilitychange). connect() gains an open timeout so a hung reconnect can't deadlock in 'connecting'. Composer placeholder distinguishes "Reconnecting to Hermes" from a cold start. - Loses chats from itself: the same hard-replace that dropped pins also dropped loaded sessions; mergeSessionPage keeps them. - Multiple copies/branches in search: /api/sessions/search deduped only by raw session_id, so compression segments and branches surfaced as separate hits. It now dedupes by lineage root and returns the live compression tip, matching the session_search tool's behavior. * fix(desktop): guard reconnect sockets and keep branch search precise Avoid stale WebSocket events from an old reconnect attempt flipping the gateway state after a newer socket opens. Also limit session-search dedupe to compression edges so branch-specific hits still open the branch instead of collapsing to the parent. * fix(packaging): ship locales/ i18n catalogs in wheel, sdist, and Nix (#38383) * fix(packaging): ship locales/ i18n catalogs in wheel, sdist, and Nix locales/ is a bare data dir (no __init__.py), invisible to packages.find and package-data. Sealed installs (pip wheel, Nix store venv) dropped it, so gateway/CLI commands rendered raw i18n keys like gateway.reset.header_default. - pyproject: [tool.setuptools.data-files] locales = ["locales/*.yaml"] (wheel) - MANIFEST.in: graft locales (sdist) - agent/i18n._locales_dir: env override -> source -> sysconfig data scheme - nix/hermes-agent.nix: copy locales into the store + set HERMES_BUNDLED_LOCALES as defense-in-depth. The wheel's data-files already materialize into the uv2nix venv, so resolution works with no env var; the override pins the store path against a future uv2nix change that could drop data-files. - tests: metadata regression, wheel + sdist build-install smoke tests, and a bundled-locales flake check that verifies BOTH the wrapper override and the env-var-less data-files path. Smoke test wired into CI. Closes #23943, #27632, #35374. Supersedes #23966, #27716, #30261, #33841, #35429, #35494, #35735, #36697. * test: cap locale e2e timeout, tighten catalog count guard The two wheel/sdist e2e tests inherit the global --timeout=30 from addopts; a cold-CI run (isolated build env + venv create + network pip install) can plausibly exceed it. Add @pytest.mark.timeout(300) so they don't ride the unit-test budget and flake intermittently. Also assert the shipped catalog count equals len(SUPPORTED_LANGUAGES) instead of a hardcoded >=16 floor, so the guard self-updates and trips on a single dropped catalog (not just a fully-empty graft). * fix(installer): never brick the install when a self-update swap fails The macOS self-update bundle swap (install_macos_app_update, added in #38296) could leave the user with NO app installed. If moving the existing /Applications/Hermes.app aside failed, the code deleted the running app outright and set moved_old=false; if the subsequent move of the freshly built bundle into place then also failed, the rollback was gated on moved_old (now false) and skipped — leaving the target deleted with no replacement. Extract the swap into swap_in_new_bundle() with a strict invariant: on ANY failure path the target is left pointing at a working bundle (either the original, rolled back, or untouched) and is never deleted with no replacement. Also clean up the staged .hermes-update-new copy on the failure paths instead of orphaning it. Add unit tests covering the happy path, the rollback-on-install-failure path, and the catastrophic both-moves-fail path. The catastrophic-path test was verified to FAIL against the old code ("original app must NOT be deleted on failure") and pass against the fix. * test(installer): cover the post-update relaunch/install target derivation The macOS self-update relaunches and installs over the app it derives via resolve_hermes_desktop_app (.../Hermes.app/Contents/MacOS/Hermes -> .../Hermes.app). That derivation is load-bearing for both the ditto install target and the auto-relaunch (open <app>), but had no test. Add unit coverage: - resolve_hermes_desktop_app_finds_built_bundle: a fake built release tree resolves to the .app bundle on macOS (and the exe elsewhere). - resolve_hermes_desktop_app_is_none_without_a_build: no build => None. Verified the positive test FAILS if the .app parent-walk is wrong (e.g. one too few .parent() hops), so it's a real guard against a regression that would break the post-update relaunch target. cargo test -> 17 passed. * feat(cli): make `hermes portal` the human-readable Portal onboarding alias `hermes portal` (no subcommand) now runs the one-shot Nous Portal onboarding — OAuth login, switch provider to Nous, offer Tool Gateway — identical to `hermes setup --portal` and the human-readable alias for `hermes auth add nous --type oauth` (which still works). The prior status default moves to `hermes portal info`; `status` is kept as a hidden back-compat alias. `open`/`tools` subcommands are unchanged. User-facing hints and docs (status.py, conversation_loop 401 guidance, SystemPage, README, website docs + zh-Hans) now point at `hermes portal` / `hermes portal info`. `--manual-paste` references keep the explicit auth command since `hermes portal` does not expose that flag. * fix(setup): point Portal login-failure retry hints at `hermes portal` The two retry hints inside _run_portal_one_shot (shown when the OAuth login fails) still suggested `hermes auth add nous --type oauth`. Since this path backs both `hermes portal` and `hermes setup --portal`, point users at the new human-readable `hermes portal` for consistency. * fix(desktop): prevent IME Enter from splitting messages and viewport resize from disarming scroll anchor (#38333) * fix(desktop): prevent IME Enter from splitting messages and viewport resize from disarming scroll anchor Two fixes for the Hermes Desktop composer: 1. IME composition Enter was treated as message submission. When a Korean/ Japanese/Chinese IME is composing text and the user presses Enter to finalise the preedit, handleEditorKeyDown fired submitDraft() because it did not check event.nativeEvent.isComposing. The assistant-ui hidden textarea already guards this correctly; the custom contentEditable handler was missing it. Added an early return when isComposing is true. 2. Viewport resize (composer expand/collapse, window resize) was disarming the scroll sticky-bottom anchor. When the composer grows, the thread viewport shrinks, the browser adjusts scrollTop down to keep content visible, and the onScroll handler misread this as a user scroll-up. Added lastClientHeightRef tracking so the disarm condition now requires BOTH stable scrollHeight AND stable clientHeight before treating a scrollTop decrease as user intent. Fixes: random mid-message sends during IME typing; scroll jumps when the composer resizes or the window changes size. * fix(desktop): prevent virtualizer measurement adjustments from fighting scroll anchoring The virtualizer's measureElement callbacks trigger scroll adjustments when item sizes differ from estimates. These fight our ResizeObserver + pinToBottom loop, creating visible rubber-banding (view snaps to composer then jumps back up), even during idle. Three changes: 1. React.memo on VirtualizedThread to stop parent re-renders cascading 2. Shared stickyBottomRef so scrollToFn can check bottom state 3. scrollToFn override: skip adjustments when user is at bottom * fix(desktop): use stable useCallback ref instead of inline arrow for onBranchInNewChat The inline arrow `messageId => void branchInNewChat(messageId)` created a new function reference on every render. This cascaded through: desktop-controller → ChatView → Thread → useMemo([...onBranchInNewChat]) → new messageComponents object → VirtualizedThread receives new prop → React.memo overridden → virtualizer recalculates → measurement adjustments trigger scroll jumps at the 15-second useStatusSnapshot interval. Pass the already-useCallback'd branchInNewChat directly. * fix(desktop): use ctrlEnter submitMode on hidden textarea + gate ResizeObserver on isRunning Two root-cause fixes: 1. IME message splitting: The hidden ComposerPrimitive.Input textarea had submitMode='enter' (default), so any Enter keydown it received — even during IME composition — triggered form.requestSubmit(). Changed to submitMode='ctrlEnter' so only the contentEditable div (which correctly checks isComposing) handles plain-Enter submission. 2. Scroll jumps during idle: The ResizeObserver auto-follow loop was active even when the thread wasn't running, causing spurious pinToBottom calls whenever any layout shift occurred (browser reflow, font load, GPU cache eviction). Gated the ResizeObserver on thread.isRunning so auto-scroll only follows during active streaming. User messages still pin via useLayoutEffect, and thread.runStart still calls jumpToBottom. * fix(desktop): keep chat bottom anchor stable through idle layout shifts * fix(desktop): prevent code block shrink scroll bounce * fix(desktop): release bottom height lock on run completion * fix(desktop): keep streaming code blocks rendered * fix(desktop): keep bottom anchored through final render * fix(desktop): render streaming reasoning code blocks * feat(desktop): add subtle streaming block animations * feat(cli): make `hermes portal` run the full quick-setup Nous flow (model picker) `hermes portal` / `hermes setup --portal` previously logged in and set provider=nous but left the model UNSELECTED (blank -> runtime default) and never showed a picker — unlike the first-time quick setup, which runs the model picker. Route `_run_portal_one_shot` through `_model_flow_nous` — the exact same routine quick setup (`_run_first_time_quick_setup`) and `hermes model` -> Nous use. It handles both the logged-out path (device-code OAuth, which picks a model internally) and the logged-in path (curated Nous model picker), then offers the Tool Gateway opt-in and sets provider=nous. Net effect: `hermes portal` now offers a model picker every time and is a true single-command collapse of quick setup's Nous step. Removes the hand-rolled auth_add_command + manual provider write + separate Tool Gateway prompt (now a single source of truth). Re-syncs the in-memory config from disk afterward so a caller's later save_config can't clobber the model/provider written by the login flow. Docs (CLI help, portal_cli docstrings, nous-portal EN + zh-Hans) updated to mention model selection. New regression test asserts `_run_portal_one_shot` delegates to `_model_flow_nous`. Verified live: `hermes portal` now shows the 27-model curated picker, 'Skip (keep current)' preserves prior provider/model. * fix(cli): harden `hermes portal` SystemExit handling + finish model-pick doc sweep Self-review of #38465 surfaced three real items: 1. SystemExit escape (defense): `_login_nous` raises SystemExit(130)/(1) on cancel/failure. The logged-out login path inside `_model_flow_nous` catches it, but the expired-session re-login path (main.py) only catches Exception, so a Ctrl-C during re-auth could propagate past `_run_portal_one_shot` and kill the CLI. Add SystemExit to the portal handler so all cancel/abort cases end with the graceful 'Setup cancelled / retry later' message. 2. Doc sweep: the model-pick step was only added to the bare-`hermes portal` prose. Propagate it to the surfaces describing `hermes setup --portal` behavior that still omitted model selection: - `--portal` argparse help (main.py) - nous-portal.md intro + the numbered 'what it does' step list (EN + zh-Hans) - run-hermes-with-nous-portal.md 'default model after setup --portal' line, which was now contradictory (there's a picker, not a forced default) (EN + zh) 3. Test coverage: add parametrized regression test asserting the portal handler swallows KeyboardInterrupt / EOFError / SystemExit (returns None, no escape). Note on 'Skip (keep current)': delegating to _model_flow_nous means picking Skip preserves the prior provider instead of force-switching to nous — this is intentional and matches quick setup exactly; docs now say 'sets Nous as your provider (when you pick a model)' rather than unconditionally. * fix(packaging): modernize project.license to PEP 639 SPDX string (#38353) * fix(packaging): modernize project.license to PEP 639 SPDX string Drops the SetuptoolsDeprecationWarning ('project.license as a TOML table is deprecated') emitted on every editable build under setuptools>=77 by switching license = { text = "MIT" } to the SPDX string form plus an explicit license-files entry. Bumps build-system requires to setuptools>=77 so an older build backend can't reject the string form. The warning was non-fatal (builds succeed with it) but surfaces prominently in install.ps1 build-failure output, where it gets mistaken for the cause of unrelated Windows build_editable crashes. * fix(packaging): bound setuptools build requirement per supply-chain policy Add the <83 upper bound to setuptools>=77.0 so the dep-bounds supply-chain gate (>=floor,<next_major) passes. * fix(skills): document xurl X Article ingestion * fix(docker): bake hindsight-client into the image (#38128) (#38530) …

…cstring fix Follow-up to Ben's PR NousResearch#37892. Adds a TestInternalCredential block to test_dashboard_auth_ws_tickets.py exercising the mint-once stability, multi-use, unminted-rejection, empty-value, wrong-value, reset-and-remint, and ticket-store-independence branches directly (previously only covered indirectly via _ws_auth_ok, which left the unminted and empty-value branches unexercised). Also corrects the consume_internal_credential docstring: the returned identity dict is discarded by the current _ws_auth_ok caller (which only needs the boolean outcome), so the prior 'carry it into its session log' wording over-promised.

benbarclay requested a review from teknium1 June 3, 2026 05:26

alt-glitch added type/bug Something isn't working comp/cli CLI entry point, hermes_cli/, setup wizard area/auth Authentication, OAuth, credential pools P2 Medium — degraded but workaround exists labels Jun 3, 2026

kshitijk4poor mentioned this pull request Jun 3, 2026

fix(dashboard): authenticate server-spawned PTY child WS with a process-internal credential (gated mode) #37972

Merged

kshitijk4poor closed this in #37972 Jun 3, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(dashboard): authenticate server-spawned PTY child WS with a process-internal credential (gated mode)#37892

fix(dashboard): authenticate server-spawned PTY child WS with a process-internal credential (gated mode)#37892
benbarclay wants to merge 1 commit into
mainfrom
dashboard-internal-ws-credential

benbarclay commented Jun 3, 2026

Uh oh!

github-actions Bot commented Jun 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

benbarclay commented Jun 3, 2026

Problem

Root cause (verified live on staging)

Why not just mint a single-use ticket like the sidecar did?

Fix

Impact case

Verification

Session-token audit (full sweep, per request)

Review note

Uh oh!

github-actions Bot commented Jun 3, 2026

🔎 Lint report: dashboard-internal-ws-credential vs origin/main

ruff

ty (type checker)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

🔎 Lint report: `dashboard-internal-ws-credential` vs `origin/main`