fix(gateway): bound Yuanbao WS close handshake to avoid 5s shutdown stall #40421
Closed
maxmilian wants to merge 1 commit into
ba73467 to
86b2542
Compare
Contributor
Author
Note on the reconnect paths:
…tall

The Yuanbao adapter's ConnectionManager._cleanup_ws() awaited ws.close() unbounded. The websockets connection is opened with close_timeout=5, so the close handshake blocks up to 5s waiting for the server's close-frame echo. On an idle gateway shutdown the server never replies, making teardown consistently take ~5s (NousResearch#40383).

Bound the close await with asyncio.wait_for(WS_CLOSE_TIMEOUT_S=1.0): a responsive server completes the handshake in well under a second, so this only caps the pathological hang. The reconnect/connect-failure paths that reuse _cleanup_ws() benefit too (each previously could stall up to 5s, now <=1s). A timed-out close is logged at debug to aid future shutdown-hang diagnosis.

Adds tests/test_yuanbao_shutdown.py: hung-server bound, fast-path, and close()-raises cases. The hung-server repro went from 5.0s to ~1.0s.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
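As a rough illustration of the bounded close described in this commit message, the sketch below wraps a close() call in asyncio.wait_for. It is not the adapter's actual code: HungWebSocket, ConnectionManagerSketch, timed_cleanup, the log messages, and the separate except branches are hypothetical names and choices made for the example; only WS_CLOSE_TIMEOUT_S = 1.0 and the method name _cleanup_ws come from the text above.

```python
import asyncio
import logging
import time

logger = logging.getLogger(__name__)

WS_CLOSE_TIMEOUT_S = 1.0  # value stated in the commit message


class HungWebSocket:
    """Hypothetical stand-in for a peer that never echoes the close frame."""

    def __init__(self, stall_s: float) -> None:
        self.stall_s = stall_s

    async def close(self) -> None:
        # Simulates the library waiting out its own close timeout.
        await asyncio.sleep(self.stall_s)


class ConnectionManagerSketch:
    """Simplified illustration; the real ConnectionManager holds more state."""

    def __init__(self, ws, close_timeout_s: float = WS_CLOSE_TIMEOUT_S) -> None:
        self._ws = ws
        self._close_timeout_s = close_timeout_s

    async def _cleanup_ws(self) -> None:
        # Drop the reference first so it is cleared whatever close() does.
        ws, self._ws = self._ws, None
        if ws is None:
            return
        try:
            await asyncio.wait_for(ws.close(), timeout=self._close_timeout_s)
        except asyncio.TimeoutError:
            # Assumed wording; the source only says the timeout is logged at debug.
            logger.debug("ws close timed out after %.2fs", self._close_timeout_s)
        except Exception:
            logger.debug("ws close raised", exc_info=True)


async def timed_cleanup(stall_s: float, bound_s: float) -> float:
    """Return how long _cleanup_ws() takes against a peer that stalls for stall_s."""
    manager = ConnectionManagerSketch(HungWebSocket(stall_s), bound_s)
    start = time.monotonic()
    await manager._cleanup_ws()
    return time.monotonic() - start
```

Calling asyncio.run(timed_cleanup(5.0, WS_CLOSE_TIMEOUT_S)) exercises the hung-server case with the values from the commit message; smaller values keep a test run short while showing the same capping behavior.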
86b2542 to
ca0e06a
Compare
Contributor
changman pushed a commit to changman/hermes-agent that referenced this pull request on Jun 10, 2026
… ~5s (NousResearch#40607)

Salvaged from NousResearch#40421; re-verified on main, tightened, tested.

Co-authored-by: maxmilian <maxmilian@users.noreply.github.com>
Summary
Fixes #40383. The Yuanbao adapter consistently added a fixed ~5s delay to gateway shutdown even when idle. This bounds the WS close handshake so a non-responsive server can no longer stall teardown.
Root cause
ConnectionManager._cleanup_ws() (gateway/platforms/yuanbao.py) awaited ws.close() unbounded. The connection is opened with close_timeout=5, so the websockets close handshake blocks up to 5s waiting for the server's close-frame echo. On an idle shutdown the server never replies, so teardown always waited the full ~5s, matching the reporter's ✓ yuanbao disconnected (5.01s) log.
Call path:
gateway.run times adapter.disconnect() → ConnectionManager.close() (cancels heartbeat/recv tasks, which is fast) → await self._cleanup_ws() → await ws.close() ← the entire ~5s lives here.
Fix
Bound the close await with asyncio.wait_for(ws.close(), timeout=WS_CLOSE_TIMEOUT_S) (1.0s, new module constant near the other WS timeouts). A responsive server completes the handshake in well under a second, so this only caps the pathological hang; the graceful close is still attempted on the fast path. The reconnect paths that reuse _cleanup_ws() benefit too. asyncio.TimeoutError is swallowed alongside the pre-existing except Exception (it is a subclass), dropping a dead connection rather than stalling.
Tests
New tests/test_yuanbao_shutdown.py (3 cases): hung-server bound, fast path, and close() raises (ws reference still cleared).
scripts/run_tests.sh over all 6 Yuanbao test files: 220 passed. The new file alone went 5.13s → 1.2s once bounded. ruff + scripts/check-windows-footguns.py clean on both changed files.
Scope
Deliberately unchanged: the close_timeout=5 on the two websockets.connect() sites (it governs the library's own handshake cap and is now superseded by the shorter outer bound during teardown); the heartbeat/recv-task cancellation in ConnectionManager.close() (already fast). No behavior change for a server that closes promptly.
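To make the "superseded by the shorter outer bound" point concrete, here is a small model of two nested timeouts, assuming a peer that never answers. close_with_inner_cap and teardown_seconds are hypothetical helpers written for this example; they imitate a library-level cap and an outer asyncio.wait_for, and do not call the websockets library.

```python
import asyncio
import time


async def close_with_inner_cap(inner_cap_s: float) -> None:
    """Hypothetical model of a close handshake that gives up after inner_cap_s."""
    try:
        # The event is never set, which models a server that never replies.
        await asyncio.wait_for(asyncio.Event().wait(), timeout=inner_cap_s)
    except asyncio.TimeoutError:
        pass  # the inner cap expired, so the close returns anyway


async def teardown_seconds(inner_cap_s: float, outer_bound_s: float) -> float:
    """Measure teardown time when an outer bound wraps the inner cap."""
    start = time.monotonic()
    try:
        await asyncio.wait_for(
            close_with_inner_cap(inner_cap_s), timeout=outer_bound_s
        )
    except asyncio.TimeoutError:
        pass  # the outer bound fired first; the connection is simply dropped
    return time.monotonic() - start
```

Whichever limit is shorter ends the wait, so with an inner cap of 5 and an outer bound of 1.0 the outer bound decides the teardown time, and leaving close_timeout=5 in place does not reintroduce the stall in this model.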