
fix(desktop): recoverable boot error after prolonged gateway drop (escape hatch)#147

Merged
OmarB97 merged 4 commits into main from fix/gateway-boot-test-failures on Jun 10, 2026



OmarB97 (Owner) commented Jun 10, 2026

Why

When the gateway dropped after a healthy boot and every reconnect kept failing (the classic case: a remote gateway or tunnel goes away), useGatewayBoot kept looping its reconnect backoff forever while boot.error stayed null, so the CONNECTING screen covered the whole app with no recovery surface. Two FIX:-prefixed specs in use-gateway-boot.test.tsx documented the intended escape hatch and were red on main (part of the baseline-red masking tracked under hermes-desktop-preexisting-test-failures-20260610).

What changed

  • apps/desktop/src/app/gateway/hooks/use-gateway-boot.ts: new RECONNECT_ESCALATION_ATTEMPTS = 6. In scheduleReconnect, once reconnectAttempt passes 6 (~45s of sustained failure, post-boot only — initial-boot failure has its own path) and no error is set yet, raise a recoverable failDesktopBoot(boot.errors.gatewayUnreachable) so BootFailureOverlay (Use local gateway / Sign in / Retry) replaces the dead-end spinner. The backoff loop keeps running; onState('open') calls completeDesktopBoot() to clear the error and hide the overlay when a reconnect finally succeeds. Below the threshold a drop stays transient (no error) — the quiet sleep/wake reconnect is preserved.
  • i18n: boot.errors.gatewayUnreachable in en/zh/ja/zh-hant + types.
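The raise/clear lifecycle described above can be modeled as a small state machine. This is a minimal sketch, not the hook's code: `GatewayBootModel`, the `BootState` shape, and the stand-in methods are hypothetical substitutes for `$desktopBoot`, `failDesktopBoot`, and `completeDesktopBoot`. Only `RECONNECT_ESCALATION_ATTEMPTS = 6`, the two guards, and the clear on `'open'` come from this PR, and "passes 6" is read here as strictly greater than 6. Resetting the counter on success is an assumption.

```typescript
// Hypothetical stand-in for the $desktopBoot store; only the fields this sketch needs.
interface BootState {
  bootCompleted: boolean;
  error: string | null;
}

// From the PR: escalate once the reconnect attempt count passes this value.
const RECONNECT_ESCALATION_ATTEMPTS = 6;

class GatewayBootModel {
  state: BootState = { bootCompleted: false, error: null };
  reconnectAttempt = 0;

  // Stand-in for completeDesktopBoot(): marks boot as done and clears any error.
  completeDesktopBoot(): void {
    this.state = { bootCompleted: true, error: null };
  }

  // Stand-in for failDesktopBoot(): surfaces a (recoverable) error to the overlay.
  failDesktopBoot(errorKey: string): void {
    this.state = { ...this.state, error: errorKey };
  }

  // Called after each failed reconnect; the real backoff timer is omitted.
  scheduleReconnect(): void {
    this.reconnectAttempt += 1;
    const shouldEscalate =
      this.state.bootCompleted && // post-boot drops only
      this.reconnectAttempt > RECONNECT_ESCALATION_ATTEMPTS && // sustained failure
      this.state.error === null; // raise once per episode
    if (shouldEscalate) {
      this.failDesktopBoot("boot.errors.gatewayUnreachable");
    }
  }

  // Connection-state callback; "open" means a reconnect finally succeeded.
  onState(next: "open" | "closed"): void {
    if (next === "open") {
      this.reconnectAttempt = 0; // assumption: the counter resets on success
      this.completeDesktopBoot();
    }
  }
}

// Usage: healthy boot, then a drop that only recovers at the end.
const boot = new GatewayBootModel();
boot.completeDesktopBoot();
for (let i = 0; i < RECONNECT_ESCALATION_ATTEMPTS; i++) boot.scheduleReconnect();
const errorBelowThreshold = boot.state.error; // null: still a transient drop
boot.scheduleReconnect(); // 7th failure crosses the threshold
const errorAfterThreshold = boot.state.error; // "boot.errors.gatewayUnreachable"
boot.onState("open");
const errorAfterReconnect = boot.state.error; // null: overlay hidden again
```

Keeping the backoff running underneath the overlay is what lets the same `onState("open")` path both finish a normal reconnect and dismiss the escape hatch.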

How to review

  1. The three edits in use-gateway-boot.ts: the constant, the raise in scheduleReconnect, the clear in onState('open').
  2. Confirm the raise is guarded on bootCompleted (don't fight the initial-boot error path) and !$desktopBoot.get().error (raise once per episode).
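Review point 2 reduces to a three-input guard. As a reviewer aid, the sketch below restates it as a pure function and enumerates every input combination. The function name and `GuardInput` shape are hypothetical, and `pastThreshold` stands for the reading `reconnectAttempt > RECONNECT_ESCALATION_ATTEMPTS`.

```typescript
// Hypothetical pure restatement of the guard in scheduleReconnect.
interface GuardInput {
  bootCompleted: boolean; // false while the initial boot is still in progress
  pastThreshold: boolean; // reconnectAttempt > RECONNECT_ESCALATION_ATTEMPTS
  errorAlreadySet: boolean; // $desktopBoot.get().error is non-null
}

function shouldRaiseGatewayUnreachable(i: GuardInput): boolean {
  // Post-boot only, sustained failure only, and at most once per episode.
  return i.bootCompleted && i.pastThreshold && !i.errorAlreadySet;
}

// Enumerate all 8 combinations of the three inputs.
const rows: Array<[GuardInput, boolean]> = [];
for (const bootCompleted of [false, true]) {
  for (const pastThreshold of [false, true]) {
    for (const errorAlreadySet of [false, true]) {
      const input = { bootCompleted, pastThreshold, errorAlreadySet };
      rows.push([input, shouldRaiseGatewayUnreachable(input)]);
    }
  }
}

// Exactly one combination raises: boot done, past threshold, no error yet.
const raising = rows.filter(([, raise]) => raise); // raising.length === 1
```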

Evidence

  • The specs "FIX: after the prolonged drop the hook raises a recoverable boot error" and "FIX: a successful reconnect clears the recoverable error" now pass; the dead-end CONNECTING combo test still passes (below the threshold, boot.error stays null).

Verification

  • apps/desktop: tsc -b reports 0 errors; vitest use-gateway-boot.test.tsx passes 4/4. Full vitest run over src: 480 passed / 5 failed, down from 7 failures; the 5 remaining are other pre-existing failures in untouched files (model-settings, toolset-config, pane-shell, streaming, use-prompt-actions sleep/wake), tracked separately.

Risks / gaps

  • The threshold is fixed at 6 attempts (~45s) and is not configurable; accepted because it matches the window the test documents.
  • The other 4 pre-existing failures are in unrelated files and are addressed separately under hermes-desktop-preexisting-test-failures-20260610.
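The PR equates 6 attempts with roughly 45 s but does not show the backoff schedule here. Purely as an illustration of that arithmetic, a hypothetical capped exponential backoff (1 s base, doubling, 15 s cap, no jitter) sums to exactly 45 s over six attempts; the real schedule in use-gateway-boot.ts may differ.

```typescript
// Hypothetical backoff parameters chosen for illustration; not taken from the hook.
const BASE_DELAY_MS = 1_000;
const MAX_DELAY_MS = 15_000;

// Delay before reconnect attempt n (0-based): doubling, capped at MAX_DELAY_MS.
function backoffDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}

// Total waiting time across the first `attempts` reconnects.
function cumulativeDelayMs(attempts: number): number {
  let total = 0;
  for (let n = 0; n < attempts; n++) total += backoffDelayMs(n);
  return total;
}

// 1s + 2s + 4s + 8s + 15s + 15s = 45s
const escalationWindowMs = cumulativeDelayMs(6); // 45_000
```

With a cap in place, the window grows linearly once the cap is reached, so a fixed attempt threshold maps to a predictable wall-clock delay.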

Collaborators

  • @OmarB97 (operator)
  • Claude Fable 5 (Claude Code)

…escape hatch)

When the (remote) gateway dropped post-boot and every reconnect kept failing,
useGatewayBoot looped the backoff forever with boot.error=null — so the
CONNECTING screen covered the app with no recovery surface (the dead-end
CONNECTING combo, especially after a remote VPS/tunnel goes away). Two
"FIX:"-prefixed specs in use-gateway-boot.test.tsx asserted the intended
behavior and were red.

Now: once the reconnect backoff passes the 6th attempt (~45s of sustained
failure, post-boot only), raise a RECOVERABLE failDesktopBoot error so
BootFailureOverlay (Use local gateway / Sign in / Retry) becomes reachable.
The backoff keeps running underneath; a later successful reconnect calls
completeDesktopBoot to clear the error and hide the overlay. Below the
threshold the drop is still treated as transient (no error), preserving the
quiet sleep/wake reconnect.

- use-gateway-boot.ts: RECONNECT_ESCALATION_ATTEMPTS=6; raise in
  scheduleReconnect past the threshold; clear in onState 'open'.
- i18n: boot.errors.gatewayUnreachable (en/zh/ja/zh-hant + types).

Fixes the 2 use-gateway-boot escape-hatch tests (now 4/4).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions Bot commented Jun 10, 2026


🔎 Lint report: fix/gateway-boot-test-failures vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 10688 on HEAD, 10688 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 5598 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

…ale since original commit)

The 'uses widthOverride from the store' test set an override on a non-resizable
pane, but trackForPane only applies overrides to resizable panes (that's where
an override originates — drag-resize). Test predates that gating (untouched
since 51c68d4). Mark the pane resizable so it exercises the real path.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
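The test fix relies on trackForPane applying a store override only to resizable panes. A minimal sketch of that gating, assuming hypothetical `Pane` and `WidthOverrides` shapes and a hypothetical `resolvePaneWidth` helper (the real pane-shell types are not shown in this PR):

```typescript
// Hypothetical pane description; only the fields relevant to the gating.
interface Pane {
  id: string;
  resizable: boolean;
  defaultWidth: number;
}

// Hypothetical store of widths set by drag-resize, keyed by pane id.
type WidthOverrides = Map<string, number>;

// Overrides only apply to resizable panes, since drag-resize is the only
// place an override can originate.
function resolvePaneWidth(pane: Pane, overrides: WidthOverrides): number {
  const override = overrides.get(pane.id);
  if (pane.resizable && override !== undefined) {
    return override;
  }
  return pane.defaultWidth;
}

// Usage: the old test setup vs. the fixed one.
const overrides: WidthOverrides = new Map([["sidebar", 320]]);
const fixedPane: Pane = { id: "sidebar", resizable: false, defaultWidth: 240 };
const resizablePane: Pane = { ...fixedPane, resizable: true };
const oldSetupWidth = resolvePaneWidth(fixedPane, overrides); // 240: override ignored
const fixedSetupWidth = resolvePaneWidth(resizablePane, overrides); // 320: override applied
```

This is why the stale test failed: it stored an override for a pane the gating deliberately skips, so marking the pane resizable is what makes it exercise the real path.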
OmarB97 merged commit 7491b98 into main on Jun 10, 2026
25 checks passed