Skip to content

fix(desktop): rebind sessions after websocket reconnect (salvage of #41740)#43004

Merged
OutThisLife merged 3 commits into
mainfrom
bb/salvage-41740-ws-session-rebind
Jun 9, 2026
Merged

fix(desktop): rebind sessions after websocket reconnect (salvage of #41740)#43004
OutThisLife merged 3 commits into
mainfrom
bb/salvage-41740-ws-session-rebind

Conversation

@OutThisLife

Copy link
Copy Markdown
Collaborator

Salvages #41740 by @joshuadow and adds a comment-only follow-up. Supersedes #41740.

Problem

When Windows Desktop talks to a WSL-hosted gateway (with or without a reverse proxy), the WebSocket transport can drop while the backend process and its stored sessions stay alive. session.activate returned the parked session without rebinding its transport, so it stayed pointed at _detached_ws_transport and the grace-windowed orphan reaper tore it down. Desktop, meanwhile, trusted its cached runtime id after reconnect, so it kept using a stale id and eventually surfaced session not found until restart.

Fix

  • Backend: session.activate now passes transport=current_transport() or _stdio_transport to _live_session_payload, which rebinds session["transport"]. This is the same reattach session.resume already does, so an activate after reconnect cancels the orphan reap instead of racing it.
  • Desktop: useRouteResume re-runs the resume when the gateway transitions back to open, even if the selected stored session still has a cached runtime id. A seenGatewayStateRef guard keeps the initial mount from being mistaken for a reconnect.

Soundness check

  • Verified _ws_session_is_orphaned keys solely off session["transport"] is _detached_ws_transport, so rebinding a live transport is exactly what clears the orphan state (_schedule_ws_orphan_reap re-checks under the resume lock).
  • _live_session_payload already accepts transport= and rebinds under history_lock; activate now uses it the same way resume does.
  • The route-resume change preserves the existing initial-resume path (pathname-driven) and only adds the reconnect re-resume; the new seenGatewayStateRef prevents a double-resume on mount.
  • The PR's tsconfig.json ES2023 bump is already on main, so the cherry-pick dropped that one redundant file; the substantive changes all landed and the merge cleanly composed with main's stuckOnRoutedSession self-heal.

Commits

  1. fix(desktop): rebind sessions after websocket reconnect@joshuadow's fix, cherry-picked (authorship preserved).
  2. docs(desktop): explain the reconnect-resume guard in use-route-resume — comment-only follow-up documenting why seenGatewayStateRef and the gatewayBecameOpen || arm exist, so they don't get "simplified" back into the bug.

Test plan

  • scripts/run_tests.sh tests/tui_gateway/test_protocol.py — 59 passed
  • apps/desktop npm run type-check — clean
  • apps/desktop npm run test:ui -- --run src/app/session/hooks/use-route-resume.test.tsx — 4 passed
  • npx eslint src/app/session/hooks/use-route-resume.ts — clean
  • Manual (Windows Desktop ↔ WSL gateway): drop the WS transport, confirm the session reattaches instead of going "session not found"

joshuadow and others added 2 commits June 9, 2026 13:18
The reconnect fix turns on two subtle conditions with no inline rationale:
`seenGatewayStateRef` suppresses a spurious "became open" on the first effect
run (so a session mounting with the gateway already open doesn't double-resume),
and the `gatewayBecameOpen ||` arm forces a re-resume even when the route looks
`alreadyActive` because the cached runtime id can be stale after the gateway
rebinds/reaps the session. Comment both so the next reader doesn't "simplify"
them back into the original bug. No behavior change.
@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

🔎 Lint report: bb/salvage-41740-ws-session-rebind vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 10614 on HEAD, 10614 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 5561 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@alt-glitch alt-glitch added type/bug Something isn't working P3 Low — cosmetic, nice to have comp/tui Terminal UI (ui-tui/ + tui_gateway/) labels Jun 9, 2026
@liuhao1024

Copy link
Copy Markdown
Contributor

Verification comment — reviewed the diff and found no issues.

  • seenGatewayStateRef correctly prevents the initial mount from being misinterpreted as a "became open" transition — the ref stays false until the first effect run, so a session mounting with gateway already open does not double-resume.
  • On reconnect (gatewayBecameOpen), the hook forces resumeSession even when alreadyActive is true. This is correct: the cached runtime id can be stale after the gateway rebinds/reaps sessions on its side, and trusting it would strand Desktop on a dead session id.
  • Server-side session.activate now passes transport=current_transport() to _live_session_payload, reattaching orphaned WS sessions to the new transport. The protocol test verifies the transport is rebound and the session is no longer orphaned.
  • The salvage description (from fix(desktop): rebind sessions after websocket reconnect #41740) suggests this was a previously attempted fix — the seenGatewayStateRef pattern is a cleaner solution than the original approach.

The check-attribution CI job requires contributor emails from PR commits
to appear in scripts/release.py AUTHOR_MAP.
@OutThisLife OutThisLife merged commit 8d71c38 into main Jun 9, 2026
22 checks passed
@OutThisLife OutThisLife deleted the bb/salvage-41740-ws-session-rebind branch June 9, 2026 19:01
wachoo pushed a commit to wachoo/hermes-agent that referenced this pull request Jun 10, 2026
…ousResearch#41740) (NousResearch#43004)

* fix(desktop): rebind sessions after websocket reconnect

* docs(desktop): explain the reconnect-resume guard in use-route-resume

The reconnect fix turns on two subtle conditions with no inline rationale:
`seenGatewayStateRef` suppresses a spurious "became open" on the first effect
run (so a session mounting with the gateway already open doesn't double-resume),
and the `gatewayBecameOpen ||` arm forces a re-resume even when the route looks
`alreadyActive` because the cached runtime id can be stale after the gateway
rebinds/reaps the session. Comment both so the next reader doesn't "simplify"
them back into the original bug. No behavior change.

---------

Co-authored-by: Josh Dow <josh.dow@prepad.io>
changman pushed a commit to changman/hermes-agent that referenced this pull request Jun 10, 2026
…ousResearch#41740) (NousResearch#43004)

* fix(desktop): rebind sessions after websocket reconnect

* docs(desktop): explain the reconnect-resume guard in use-route-resume

The reconnect fix turns on two subtle conditions with no inline rationale:
`seenGatewayStateRef` suppresses a spurious "became open" on the first effect
run (so a session mounting with the gateway already open doesn't double-resume),
and the `gatewayBecameOpen ||` arm forces a re-resume even when the route looks
`alreadyActive` because the cached runtime id can be stale after the gateway
rebinds/reaps the session. Comment both so the next reader doesn't "simplify"
them back into the original bug. No behavior change.

---------

Co-authored-by: Josh Dow <josh.dow@prepad.io>
alt-glitch pushed a commit that referenced this pull request Jun 14, 2026
…41740) (#43004)

* fix(desktop): rebind sessions after websocket reconnect

* docs(desktop): explain the reconnect-resume guard in use-route-resume

The reconnect fix turns on two subtle conditions with no inline rationale:
`seenGatewayStateRef` suppresses a spurious "became open" on the first effect
run (so a session mounting with the gateway already open doesn't double-resume),
and the `gatewayBecameOpen ||` arm forces a re-resume even when the route looks
`alreadyActive` because the cached runtime id can be stale after the gateway
rebinds/reaps the session. Comment both so the next reader doesn't "simplify"
them back into the original bug. No behavior change.

---------

Co-authored-by: Josh Dow <josh.dow@prepad.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/tui Terminal UI (ui-tui/ + tui_gateway/) P3 Low — cosmetic, nice to have type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants