fix(desktop): recover chat after sleep/wake by revalidating a stale remote backend by teknium1 · Pull Request #41350 · NousResearch/hermes-agent

teknium1 · 2026-06-07T15:00:36Z

Summary

Desktop chat now recovers on its own after sleep/wake instead of hanging on "Starting Hermes…" until a quit and reopen. Reimplemented from #40135 on a smaller, more interpretable path (63 added lines vs 555).

Root cause (diagnosed by @AlchemistChaos): a remote/global-remote primary backend has no child process, so the 'exit'/'error' handlers that would clear the main process's cached connectionPromise never fire. Once the remote becomes unreachable across a sleep/wake, the renderer re-dials the same dead descriptor forever.
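To make the root cause concrete, here is a minimal model of the caching behavior described above. It is a sketch, not the code in electron/main.cjs: `ConnectionCache`, `BackendDescriptor`, `builds`, and `onChildExit` are hypothetical names, and only `connectionPromise` and `getConnection()` come from the PR text.

```typescript
// Hypothetical model of the main-process connection cache.
type BackendDescriptor = { kind: "local" | "remote"; baseUrl: string };

class ConnectionCache {
  connectionPromise: Promise<BackendDescriptor> | null = null;
  builds = 0; // counts how often a descriptor was actually (re)built

  // `build` stands in for whatever spawns a local child or resolves a remote.
  getConnection(build: () => Promise<BackendDescriptor>): Promise<BackendDescriptor> {
    if (this.connectionPromise === null) {
      this.builds += 1;
      this.connectionPromise = build();
    }
    return this.connectionPromise;
  }

  // Only a local backend has a child process whose 'exit'/'error' ends up here.
  // A remote backend has no child, so nothing ever calls this for it.
  onChildExit(): void {
    this.connectionPromise = null;
  }
}
```

In this model, every `getConnection()` call for a remote returns the same cached promise, however unreachable the remote has become, because nothing triggers the only code path that clears it.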

Changes

electron/main.cjs: new hermes:connection:revalidate IPC. Liveness-probes the cached remote backend's public /api/status (2.5s); on failure drops the cache via resetHermesConnection() (remote-only — no child to SIGTERM) so the next getConnection() rebuilds a reachable descriptor. Local backends are never touched (they self-heal via the child 'exit' handler).
electron/preload.cjs + src/global.d.ts: expose/type revalidateConnection().
src/app/gateway/hooks/use-gateway-boot.ts: the existing backoff-paced reconnect loop calls revalidateConnection() before re-dialing; dismisses the boot-progress overlay on the post-rebuild 'open' so an in-place rebuild can't leave it stuck at ~94%.
scripts/release.py: map co-author email.
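The revalidation step in electron/main.cjs can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged handler: `revalidate`, `probeStatus`, `CachedBackend`, and the injected `reset` callback are hypothetical names, the "2.5s" is read as a probe timeout, and the sketch takes an already-resolved descriptor rather than the cached promise.

```typescript
type CachedBackend = { kind: "local" | "remote"; baseUrl: string };
type Probe = (url: string, timeoutMs: number) => Promise<boolean>;

const PROBE_TIMEOUT_MS = 2500; // assumption: "2.5s" in the PR text is a timeout

// Default probe: GET the status URL and treat any non-2xx response,
// network error, or timeout as "unreachable".
async function probeStatus(url: string, timeoutMs: number): Promise<boolean> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(url, { signal: controller.signal });
    return res.ok;
  } catch {
    return false;
  } finally {
    clearTimeout(timer);
  }
}

// Returns true when the cached connection was dropped.
async function revalidate(
  cached: CachedBackend | null,
  reset: () => void, // stands in for resetHermesConnection()
  probe: Probe = probeStatus,
): Promise<boolean> {
  // Local backends are never touched; they self-heal via the child 'exit' handler.
  if (cached === null || cached.kind !== "remote") return false;
  const alive = await probe(`${cached.baseUrl}/api/status`, PROBE_TIMEOUT_MS);
  if (alive) return false; // reachable: keep the cache
  reset(); // unreachable: the next getConnection() rebuilds the descriptor
  return true;
}
```

Injecting the probe keeps the decision logic testable without a network; in the real handler the probe and the reset would both live in the main process.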

Why simpler than #40135

The renderer's reconnect loop already provides retry pacing and rides out transient post-wake blips (exponential backoff, fires only on closed/error). So no failure-streak counter, episode-window timestamps, extracted helper module, or module-level state is needed: the loop is the retry mechanism. A transient probe miss just leaves the cache in place, and the next backoff tick re-probes. Same recovery behavior, ~9x less code, far easier to reason about.
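The idea that the backoff loop itself is the retry mechanism can be sketched like this. It is a simplified stand-in for the loop in use-gateway-boot.ts: `backoffDelayMs`, `reconnect`, `ReconnectHooks`, and the base/cap values are assumptions for illustration, and only `revalidateConnection()` is named in the PR.

```typescript
// Exponential backoff with a cap. The base and cap values are assumed.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 15000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

type ReconnectHooks = {
  revalidateConnection: () => Promise<unknown>; // asks main to probe the cached remote
  dial: () => Promise<boolean>; // resolves true once the socket reaches 'open'
  sleep: (ms: number) => Promise<void>;
};

// Returns the number of attempts used, or -1 if every attempt failed.
async function reconnect(hooks: ReconnectHooks, maxAttempts: number): Promise<number> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    await hooks.sleep(backoffDelayMs(attempt));
    // Revalidate before every re-dial. A missed probe needs no bookkeeping:
    // the next iteration simply probes again after a longer delay.
    await hooks.revalidateConnection();
    if (await hooks.dial()) return attempt + 1;
  }
  return -1;
}
```

Because each iteration re-probes, no state survives between attempts other than the attempt counter that drives the delay.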

Validation

| Check | Result |
| --- | --- |
| `npm run type-check` (`tsc -b`) | clean |
| `eslint` (changed files) | clean |
| `npm run test:desktop:platforms` | 102/102 pass |

Reimplements #40135. Original diagnosis and fix by @AlchemistChaos; co-authorship preserved on the fix commit.

Infographic

github-actions · 2026-06-07T15:01:27Z

🔎 Lint report: `hermes/hermes-24f2854b` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 10014 on HEAD, 10014 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 5196 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@AlchemistChaos

…emote backend

After sleep/wake, a remote (global-remote) primary backend can become unreachable, but it has no child process whose 'exit' clears the main process's cached connectionPromise. The renderer then re-dials the same dead remote forever and the composer stays stuck on "Starting Hermes…"; only a quit+reopen recovered.

Fix: the renderer's existing backoff-paced reconnect loop now asks the main process to revalidate the cached connection before re-dialing. The main process liveness-probes the cached REMOTE backend's public /api/status and, if unreachable, drops the cache (resetHermesConnection only nulls connectionPromise for a remote; there is no child to SIGTERM) so the next getConnection() rebuilds a reachable descriptor. Local backends are never touched here; they self-heal via the child 'exit' handler.

The renderer's loop already provides retry pacing and rides out transient blips, so no streak/episode bookkeeping is needed in the main process. The boot hook dismisses the boot-progress overlay on the post-rebuild 'open' so an in-place rebuild can't leave it stuck at ~94%.

Reimplements #40135 by @AlchemistChaos on a smaller, more interpretable path (63 added lines vs 555): no extracted helper module, no failure-streak / episode-window state, the renderer's backoff loop is the retry mechanism. Original diagnosis and fix by @AlchemistChaos.

Co-authored-by: AlchemistChaos <alchemistchaos@protonmail.com>

alt-glitch added the type/bug (Something isn't working) and P3 (Low: cosmetic, nice to have) labels Jun 7, 2026

teknium1 and others added 2 commits June 7, 2026 10:04

chore(release): map AlchemistChaos co-author email for #40135 salvage

a4d843c

teknium1 force-pushed the hermes/hermes-24f2854b branch from 66c6398 to a4d843c Compare June 7, 2026 17:04

teknium1 merged commit 1c7ae46 into main Jun 8, 2026
23 checks passed

teknium1 deleted the hermes/hermes-24f2854b branch June 8, 2026 00:29

teknium1 mentioned this pull request Jun 8, 2026

fix(desktop): recover chat after sleep/wake by revalidating the cached backend #40135

Closed
