fix(tui-gateway): grant the WS orphan-reap grace in awake time so sleep/wake doesn't kill the Desktop session #44190
… in awake time (NousResearch#44183)

threading.Timer's wait elapses in wall-clock time on macOS (no pthread_condattr_setclock), so a system sleep longer than the 20s WS-orphan-reap grace made the timer fire at the instant of wake, before Hermes Desktop's reconnect or session.resume could re-bind a transport. Every >20s lid-close therefore 404'd the open chat.

Record a time.monotonic() deadline at schedule time (monotonic does not advance while the host sleeps, so it measures awake time) and, when the timer fires early relative to it, re-arm for the remainder instead of reaping. The Desktop now gets the full grace of awake time after wake to reconnect; genuinely orphaned sessions (browser refresh, NousResearch#38591) are still reaped after 20s as before.

Fixes NousResearch#44183

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
✅ Verified: WS orphan-reap uses monotonic (awake) time for grace window
Reviewed the diff for correctness of the re-arm mechanism and edge cases.
Clean fix for a real macOS sleep/wake regression. No issues found.
tonydwb
left a comment
Code Review: PR #44190
Verdict: Approved — well-grounded fix for threading.Timer sleep behavior on macOS.
Summary
Files changed: tui_gateway/server.py (+26, -0), tests/test_tui_gateway_server.py (+104)
Fixes #44183: MacBook sleep causes orphaned WS sessions to be reaped at wake because threading.Timer fires in wall-clock time on macOS. Uses time.monotonic() deadline to re-arm if host slept through the wait.
Assessment
Correctness: The re-arm logic uses a 0.5s slack threshold to distinguish timer jitter from genuine sleep. The _live_transport check in the rearmed timer prevents reaping sessions that reconnected during sleep.
Code quality: Good comments explaining the macOS CLOCK_REALTIME issue. Two comprehensive tests cover the sleep-wake-reap cycle and the reconnect-within-grace scenario.
No issues found.
Reviewed by Hermes Agent
Requesting maintainer review: this is ready to land from my side. Standalone fork CI is pending first-run approval here; the rollup branch in #44061 carrying this session's batch is fully green on upstream CI (all test shards, typecheck, e2e).
Problem
#44183: when a Mac sleeps with Hermes Desktop open, the WS connection drops and the gateway parks the session behind the 20s orphan-reap grace (HERMES_TUI_WS_ORPHAN_REAP_GRACE_S). After any sleep longer than the grace, the session is reaped at the instant of wake, before the Desktop can reconnect or session.resume, so actions on the open chat 404 with "session not found".
Root cause
threading.Timer's wait elapses in wall-clock time on macOS: CPython's lock wait uses pthread_cond_timedwait with a CLOCK_REALTIME deadline because macOS has no pthread_condattr_setclock. The wall clock keeps running while the host sleeps, so the timer's deadline expires during the sleep and _reap fires immediately at wake, which means the reap always wins the race against the Desktop's reconnect. time.monotonic() (mach_absolute_time on macOS, CLOCK_MONOTONIC on Linux) does not advance during sleep, which gives a clean way to distinguish "20s of awake time elapsed" from "the host slept through the wait".
Fix
_schedule_ws_orphan_reap records a time.monotonic() deadline at schedule time. When _reap fires, it first checks the monotonic clock: if more than a small slack (0.5s) of the grace is still unelapsed in awake time, the host slept through the wait, so it re-arms a timer for the remainder instead of reaping. The Desktop therefore gets the full grace of awake time after wake to re-bind a transport (which cancels the reap exactly as before).
When the host did not sleep through the wait, the timer fires with no awake time left (remaining <= 0), so behaviour is unchanged: genuinely orphaned sessions are still reaped after 20s.
Related: #44102 re-arms this same timer while a detached session is mid-turn. That is an independent condition and trivially composable if both land.
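The re-arm check described in this section can be sketched roughly as follows. This is a minimal illustration, not the code in tui_gateway/server.py: the OrphanReaper class, its constructor parameters, and the has_live_transport callback are hypothetical names introduced here, and only the 20s grace, the 0.5s slack, and the "re-arm for the remainder" rule come from the description above.

```python
import threading
import time

GRACE_S = 20.0  # the 20s grace described above
SLACK_S = 0.5   # tolerance for ordinary timer jitter


class OrphanReaper:
    """Hypothetical stand-in for the gateway's reap scheduling."""

    def __init__(self, reap, has_live_transport, grace_s=GRACE_S,
                 monotonic=time.monotonic, timer_factory=threading.Timer):
        self._reap = reap
        self._has_live_transport = has_live_transport
        self._grace_s = grace_s
        self._monotonic = monotonic          # injectable for tests
        self._timer_factory = timer_factory  # injectable for tests
        self._timer = None

    def schedule(self):
        # Record the deadline in awake time when the grace starts.
        deadline = self._monotonic() + self._grace_s
        self._arm(self._grace_s, deadline)

    def cancel(self):
        # Called when a transport re-binds before the timer fires.
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None

    def _arm(self, delay, deadline):
        timer = self._timer_factory(delay, self._fire, args=(deadline,))
        timer.daemon = True
        self._timer = timer
        timer.start()

    def _fire(self, deadline):
        # A session that reconnected in the meantime is never reaped.
        if self._has_live_transport():
            return
        remaining = deadline - self._monotonic()
        if remaining > SLACK_S:
            # The timer fired early in awake time (the host slept):
            # wait out the remainder instead of reaping.
            self._arm(remaining, deadline)
            return
        self._reap()
```

Keeping the original deadline across re-arms, rather than computing a new one, means the total awake-time grace stays at 20s no matter how many times the host sleeps during the wait.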
Tests
test_ws_orphan_reap_rearms_after_system_sleep: timer fires with only 2s of 20s elapsed on a fake monotonic clock → spared and re-armed for 18s; the re-armed timer then reaps after a full awake remainder.
test_ws_orphan_reap_rearm_spares_post_wake_reconnect: Desktop re-binds a live transport within the post-wake remainder → reap is a no-op and the chain stops.
pytest tests/test_tui_gateway_server.py: 258 passed, 1 failed. The failure is test_browser_manage_connect_default_local_reports_launch_hint, which also fails on clean main (85503dc) and is unrelated.
Fixes #44183
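For readers unfamiliar with the fake-clock approach the tests rely on, the idea can be illustrated with a small self-contained sketch. FakeMonotonic and reap_decision are hypothetical helpers written for this illustration and are not taken from tests/test_tui_gateway_server.py; only the 2s-of-20s scenario, the 18s remainder, and the 0.5s slack come from the text above.

```python
class FakeMonotonic:
    """Hand-driven substitute for time.monotonic()."""

    def __init__(self, start=1000.0):
        self.now = start

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


def reap_decision(deadline, clock, slack=0.5):
    """Return ("rearm", remaining) if awake time is left, else ("reap", 0.0)."""
    remaining = deadline - clock()
    if remaining > slack:
        return ("rearm", remaining)
    return ("reap", 0.0)


def test_rearms_after_system_sleep():
    clock = FakeMonotonic()
    deadline = clock() + 20.0

    # The wall-clock timer fires at wake, but only 2s of awake time passed.
    clock.advance(2.0)
    action, delay = reap_decision(deadline, clock)
    assert action == "rearm"
    assert abs(delay - 18.0) < 1e-9

    # After the remaining awake time elapses, the session is reaped.
    clock.advance(delay)
    assert reap_decision(deadline, clock) == ("reap", 0.0)


test_rearms_after_system_sleep()
```

Driving the clock by hand keeps the test deterministic and avoids any real sleeping, which is presumably why the PR's tests take the same approach.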
🤖 Generated with Claude Code