
fix(tui-gateway): grant the WS orphan-reap grace in awake time so sleep/wake doesn't kill the Desktop session #44190

Open
AIalliAI wants to merge 1 commit into NousResearch:main from AIalliAI:fix/44183-ws-orphan-reap-sleep-wake


@AIalliAI (Contributor)

Problem

#44183: when a Mac sleeps with Hermes Desktop open, the WS connection drops and the gateway parks the session behind the 20s orphan-reap grace (HERMES_TUI_WS_ORPHAN_REAP_GRACE_S). After any sleep longer than the grace, the session is reaped at the instant of wake — before the Desktop can reconnect or session.resume — so actions on the open chat 404 with "session not found".

Root cause

threading.Timer's wait elapses in wall-clock time on macOS: CPython's lock wait uses pthread_cond_timedwait with a CLOCK_REALTIME deadline because macOS has no pthread_condattr_setclock. The wall clock keeps running while the host sleeps, so the timer's deadline expires during the sleep and _reap fires immediately at wake — the reap always wins the race against the Desktop's reconnect.

time.monotonic() (mach_absolute_time on macOS, CLOCK_MONOTONIC on Linux) does not advance during sleep, which gives a clean way to distinguish "20s of awake time elapsed" from "the host slept through the wait".
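
The clock distinction can be illustrated with a small sketch. It is not code from this PR, and `take_sample` and `estimate_sleep_seconds` are hypothetical helpers: comparing how far time.time() and time.monotonic() each advanced between two samples gives a rough estimate of how long the host was suspended, assuming no wall-clock step (NTP, manual change) happened in between. The fix itself only needs the monotonic side.

```python
import time
from typing import Tuple


def take_sample() -> Tuple[float, float]:
    """Read the wall clock and the awake-time clock together."""
    return time.time(), time.monotonic()


def estimate_sleep_seconds(start: Tuple[float, float], end: Tuple[float, float]) -> float:
    """Rough seconds the host spent suspended between two samples.

    time.time() keeps advancing while the host sleeps; time.monotonic() does
    not, so the gap between their elapsed values approximates the sleep.
    Any wall-clock step also lands in this gap, so this is a heuristic only.
    A negative gap (a backward wall-clock step) is clamped to 0.
    """
    wall_elapsed = end[0] - start[0]
    awake_elapsed = end[1] - start[1]
    return max(0.0, wall_elapsed - awake_elapsed)
```

With made-up samples where the wall clock moved 302 s but only 2 s of awake time passed, `estimate_sleep_seconds((1000.0, 50.0), (1302.0, 52.0))` returns `300.0`. In that situation a 20 s wall-clock timer deadline has already expired, while 18 s of the awake-time grace remain.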

Fix

_schedule_ws_orphan_reap records a time.monotonic() deadline at schedule time. When _reap fires, it first checks the monotonic clock: if more than a small slack (0.5s) of the grace is still unelapsed in awake time, the host slept through the wait — re-arm a timer for the remainder instead of reaping. The Desktop therefore gets the full grace of awake time after wake to re-bind a transport (which cancels the reap exactly as before).

  • The default grace stays 20s, so genuinely orphaned sessions (a browser refresh, or the #38591 leak this reaper exists for, from "feat(dashboard): always enable embedded chat; remove dashboard --tui flag") are reaped on the same schedule as today.
  • Repeated sleep cycles just keep re-arming until 20 cumulative seconds of awake time pass without a reconnect.
  • On platforms where the timer wait already pauses during suspend, the monotonic check is a no-op (fires with remaining <= 0), so behaviour is unchanged.
  • The 0.5s slack absorbs timer jitter and forward NTP steps without re-arming spuriously; normal (no-sleep) firings have remaining <= 0.
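
A minimal sketch of the re-arm logic described above. It assumes a standalone helper rather than reproducing the PR's actual `_schedule_ws_orphan_reap`, whose signature isn't shown here: `schedule_orphan_reap`, `clock`, and `timer_factory` are hypothetical names, and the last two are injectable so a test can substitute a fake monotonic clock. The 20 s grace and 0.5 s slack mirror the values quoted in this PR.

```python
import threading
import time
from typing import Callable

GRACE_S = 20.0       # default quoted for HERMES_TUI_WS_ORPHAN_REAP_GRACE_S
SLEEP_SLACK_S = 0.5  # value quoted for _WS_ORPHAN_REAP_SLEEP_SLACK_S


def schedule_orphan_reap(
    reap: Callable[[], None],
    grace: float = GRACE_S,
    slack: float = SLEEP_SLACK_S,
    clock: Callable[[], float] = time.monotonic,
    timer_factory: Callable[..., threading.Timer] = threading.Timer,
) -> threading.Timer:
    """Arm a reap whose grace period is counted in awake (monotonic) time.

    `reap` must itself re-check that the session is still orphaned (no live
    transport) before discarding it: cancelling the returned handle stops
    only the first timer, not one re-armed after a sleep.
    """
    deadline = clock() + grace  # fixed at schedule time; re-arms count down to it

    def arm(delay: float) -> threading.Timer:
        timer = timer_factory(delay, fire)
        timer.daemon = True  # never keep the process alive just to reap
        timer.start()
        return timer

    def fire() -> None:
        remaining = deadline - clock()
        if remaining > slack:
            # The wall-clock wait ran out with most of the awake-time grace
            # still left: the host slept through it. Wait out the remainder.
            arm(remaining)
            return
        reap()  # normal firing: remaining <= slack (usually <= 0)

    return arm(grace)
```

With real clocks and no sleep, `schedule_orphan_reap(lambda: print("reaped"), grace=0.2)` prints once after about 0.2 s, because the monotonic remainder is already within the slack when the timer fires. Keeping the deadline fixed at schedule time, rather than restarting the full grace on each re-arm, is what makes repeated sleep cycles add up to 20 cumulative seconds of awake time.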

Related: #44102 re-arms this same timer while a detached session is mid-turn — independent condition, trivially composable if both land.

Tests

  • test_ws_orphan_reap_rearms_after_system_sleep — timer fires with only 2s of 20s elapsed on a fake monotonic clock → spared and re-armed for 18s; the re-armed timer then reaps after a full awake remainder.
  • test_ws_orphan_reap_rearm_spares_post_wake_reconnect — Desktop re-binds a live transport within the post-wake remainder → reap is a no-op and the chain stops.
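
Both tests drive the timer on a fake monotonic clock. One way such test doubles could look, as a self-contained sketch: `FakeMonotonic`, `ManualTimer`, `Session`, and `arm_reap` are hypothetical stand-ins, not the fixtures in tests/test_tui_gateway_server.py, and `arm_reap` is a deliberately tiny version of the re-arm logic so the post-wake-reconnect scenario can run on its own.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, List, Optional


class FakeMonotonic:
    """Stand-in for time.monotonic(): advances only when the test says so."""

    def __init__(self, start: float = 1000.0) -> None:
        self.now = start

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


class ManualTimer:
    """threading.Timer look-alike that starts no thread; tests call fire()."""

    armed: List["ManualTimer"] = []

    def __init__(self, interval: float, function: Callable[[], None]) -> None:
        self.interval = interval
        self.function = function
        self.cancelled = False

    def start(self) -> None:
        ManualTimer.armed.append(self)

    def cancel(self) -> None:
        self.cancelled = True

    def fire(self) -> None:
        # Models the wall-clock wait expiring, e.g. at the instant of wake.
        if not self.cancelled:
            self.function()


@dataclass
class Session:
    transport: Optional[object] = None  # None means orphaned
    reaped: bool = False
    lock: threading.Lock = field(default_factory=threading.Lock)


def arm_reap(session: Session, clock: Callable[[], float],
             grace: float = 20.0, slack: float = 0.5) -> None:
    deadline = clock() + grace

    def fire() -> None:
        remaining = deadline - clock()
        if remaining > slack:  # host slept through the wait: re-arm
            ManualTimer(remaining, fire).start()
            return
        with session.lock:
            if session.transport is None:  # reap only if still orphaned
                session.reaped = True

    ManualTimer(grace, fire).start()


def test_post_wake_reconnect_is_spared() -> None:
    ManualTimer.armed.clear()
    clock = FakeMonotonic()
    session = Session()
    arm_reap(session, clock)

    clock.advance(2.0)            # 2 s awake, then a long sleep
    ManualTimer.armed[0].fire()   # wall-clock wait ran out during the sleep
    assert not session.reaped
    assert ManualTimer.armed[1].interval == 18.0

    session.transport = object()  # Desktop re-binds after wake
    clock.advance(18.0)
    ManualTimer.armed[1].fire()
    assert not session.reaped     # live transport: the reap is a no-op
    assert len(ManualTimer.armed) == 2  # the chain stopped; nothing re-armed
```

Because `ManualTimer` never starts a thread, the test is deterministic and runs instantly; pytest collects `test_post_wake_reconnect_is_spared` like any other test, and it can also be called directly.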

pytest tests/test_tui_gateway_server.py: 258 passed, 1 failed — test_browser_manage_connect_default_local_reports_launch_hint, which also fails on clean main (85503dc) and is unrelated.

Fixes #44183

🤖 Generated with Claude Code

… in awake time (NousResearch#44183)

threading.Timer's wait elapses in wall-clock time on macOS (no
pthread_condattr_setclock), so a system sleep longer than the 20s
WS-orphan-reap grace made the timer fire at the instant of wake —
before Hermes Desktop's reconnect or session.resume could re-bind a
transport. Every >20s lid-close therefore 404'd the open chat.

Record a time.monotonic() deadline at schedule time (monotonic does not
advance while the host sleeps, so it measures awake time) and, when the
timer fires early relative to it, re-arm for the remainder instead of
reaping. The Desktop now gets the full grace of awake time after wake
to reconnect; genuinely orphaned sessions (browser refresh, NousResearch#38591) are
still reaped after 20s as before.

Fixes NousResearch#44183

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@alt-glitch added type/bug Something isn't working P3 Low — cosmetic, nice to have comp/tui Terminal UI (ui-tui/ + tui_gateway/) labels Jun 11, 2026
@liuhao1024 (Contributor)

✅ Verified — WS orphan-reap uses monotonic (awake) time for grace window

Reviewed the diff for correctness of the re-arm mechanism and edge cases.

  • Core fix: threading.Timer on macOS uses wall-clock time (pthread_cond_timedwait with CLOCK_REALTIME), but time.monotonic() uses mach_absolute_time which does not advance during system sleep. The deadline = time.monotonic() + grace captures the awake-time boundary, and the _reap closure checks remaining = deadline - time.monotonic() before proceeding.
  • Re-arm threshold: _WS_ORPHAN_REAP_SLEEP_SLACK_S = 0.5 is large enough to absorb timer jitter and NTP nudges, small relative to any real sleep duration. The remaining > slack guard correctly re-arms only when the host actually slept through the wait.
  • No infinite re-arm loop: Each re-armed timer runs for exactly remaining seconds of awake time. If the host sleeps again, the next timer fires and re-arms for a shorter remainder. The chain terminates because remaining decreases monotonically.
  • Transport check preserved: The re-armed timer still runs _reap() which acquires _session_resume_lock and checks for a live transport — a session that reconnects during the extended grace is spared.
  • Test coverage: Two tests — one for the sleep-through-and-reap sequence, one for reconnect-during-grace spares the session. Both use monotonic clock simulation.
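
The termination argument can be checked with a pure simulation. This is a sketch, not code from the PR, and `simulate_rearm_chain` and its model are assumptions: each entry of `awake_runs` is the awake time that passes while one timer in the chain is armed before it fires, whether cut short by a sleep or run to completion. Because the deadline is fixed, the remainder never grows and shrinks by every second of awake time, so the chain ends once about 20 s of awake time have accumulated.

```python
from typing import List, Sequence


def simulate_rearm_chain(awake_runs: Sequence[float], grace: float = 20.0,
                         slack: float = 0.5) -> List[float]:
    """Return the delay each timer in the chain was armed with, up to the reap.

    A run shorter than the armed delay models the host sleeping (long enough
    for the wall-clock wait to expire) and the timer firing at wake; runs are
    clamped to the armed delay, which models an uninterrupted wait.
    """
    awake = 0.0
    delay = grace
    delays = [delay]
    for run in awake_runs:
        awake += min(run, delay)
        remaining = grace - awake  # the fixed deadline minus awake time so far
        if remaining <= slack:
            return delays  # reaped
        delay = remaining
        delays.append(delay)  # re-armed for the awake-time remainder
    raise ValueError("chain did not finish; supply more awake runs")
```

`simulate_rearm_chain([2.0, 5.0, 13.0])` returns `[20.0, 18.0, 13.0]`: two sleeps re-arm the chain, and the third timer runs its full 13 s and reaps.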

Clean fix for a real macOS sleep/wake regression. No issues found.

@tonydwb left a comment

Code Review: PR #44190

Verdict: Approved — well-grounded fix for threading.Timer sleep behavior on macOS.

Summary

Files changed: tui_gateway/server.py (+26, -0), tests/test_tui_gateway_server.py (+104)

Fixes #44183: MacBook sleep causes orphaned WS sessions to be reaped at wake because threading.Timer fires in wall-clock time on macOS. Uses time.monotonic() deadline to re-arm if host slept through the wait.

Assessment

Correctness: The re-arm logic uses a 0.5s slack threshold to distinguish timer jitter from genuine sleep. The _live_transport check in the re-armed timer prevents reaping sessions that reconnect after wake, within the remaining grace.

Code quality: Good comments explaining the macOS CLOCK_REALTIME issue. Two comprehensive tests cover the sleep-wake-reap cycle and the reconnect-within-grace scenario.

No issues found.


Reviewed by Hermes Agent

@AIalliAI (Contributor, Author)

Requesting maintainer review — this is ready to land from my side. Standalone fork CI is pending first-run approval here; the rollup branch in #44061 carrying this session's batch is fully green on upstream CI (all test shards, typecheck, e2e).



Development

Successfully merging this pull request may close these issues.

fix: Desktop session lost after sleep/wake — WS orphan reap grace (20s) too short

4 participants