fix(qqbot): add gateway URL cache, retry, and rate-limit handling #18172
Open
cxgreat2014 wants to merge 1 commit into
What does this PR do?
Hardens the QQ Bot WebSocket reconnect path by adding gateway URL caching, internal retry, rate-limit detection, and stale HTTP client recovery — preventing the adapter from entering a death loop after transient WebSocket disconnects.
Problem
QQ WebSocket disconnects (especially code 4009, "Session timed out") trigger reconnect storms. Each reconnect calls `api.sgroup.qq.com/gateway` via `_get_gateway_url()`, which is rate-limited to ~2 calls per time window (confirmed via live API testing). After the first 2 reconnects, all subsequent attempts hit HTTP 400 "接口调用超过频率限制" (frequency limit exceeded).
This creates a death loop:
- `_reconnect()` → `_get_gateway_url()` → HTTP 400 rate-limit
- `backoff_idx` increments → retry with backoff → same rate-limited endpoint → same 400
- Once the quick-disconnect limit (`MAX_QUICK_DISCONNECT_COUNT`) is reached, `_set_fatal_error()` kills the adapter
Related issues: #17703, #14539, #15490
Changes
gateway/platforms/qqbot/constants.py
- Adds `GATEWAY_URL_RETRY_DELAYS = [0.5, 1.5, 3.0]` for bounded internal retry
- Raises `MAX_QUICK_DISCONNECT_COUNT` from 3 to 6 (reduces false positives during reconnect storms)
gateway/platforms/qqbot/adapter.py
Five targeted fixes, all within existing methods:
1. Gateway URL cache: `_last_gateway_url` caches the last successfully resolved URL. On subsequent reconnects, the cached URL is returned immediately, with zero API calls to `/gateway`.
2. Internal retry: `_get_gateway_url()` retries up to 3 times (`GATEWAY_URL_RETRY_DELAYS`) before giving up. Transient network errors no longer cause immediate failure.
3. Rate-limit detection: If `/gateway` returns HTTP 400 with "频率限制" ("frequency limit") in the body, the adapter enters a cooldown (`RATE_LIMIT_DELAY`) and falls back to the cached URL. If no cache is available, a clear error is raised.
4. Fresh HTTP client: `_ensure_fresh_client()` rebuilds `self._http_client` on each reconnect, preventing stale connection-pool exceptions that produce an empty `str()` and unhelpful log messages.
5. Safe error messages: the `_safe_str()` helper ensures exception messages in logs are never empty, falling back to `repr()` when `str()` produces an empty string.
How to Test
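The empty-message case that `_safe_str()` guards against is easy to reproduce in isolation. The sketch below is a hypothetical stand-alone helper written for illustration (named `safe_str` here); it is not the PR's implementation, only the fallback rule the description states:

```python
def safe_str(exc: BaseException) -> str:
    """Return str(exc), falling back to repr(exc) when str(exc) is empty.

    Hypothetical stand-in for the adapter's _safe_str() helper.
    """
    text = str(exc)
    return text if text else repr(exc)


# An exception raised without a message stringifies to "", which yields
# a log line such as "reconnect failed: " with nothing after the colon.
print(repr(str(TimeoutError())))      # ''
print(safe_str(TimeoutError()))       # TimeoutError()
print(safe_str(ValueError("bad")))    # bad
```

With the fallback in place, the log line still names the exception type even when the exception carries no message.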
Additionally verified with a standalone mock test suite simulating 6 scenarios:
Type of Change
Notes
Builds on and extends PR #17256 by adding:
The gateway URL cache TTL is intentionally omitted: QQ gateway URLs are long-lived, and a simple `_last_gateway_url` field is sufficient.
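The cache, bounded retry, and rate-limit fallback described under Changes could be combined roughly as follows. This is a minimal synchronous sketch, not the adapter's code: the `GatewayResolver` class, the `fetch` callable returning `(status, body, url)`, the `force_refresh` flag, and the `GatewayRateLimited` error are all assumptions made for illustration. It also assumes one initial attempt followed by one retry per entry in `GATEWAY_URL_RETRY_DELAYS`, and it leaves out the `RATE_LIMIT_DELAY` cooldown because its value is not stated here.

```python
import time
from typing import Callable, Optional, Tuple

GATEWAY_URL_RETRY_DELAYS = [0.5, 1.5, 3.0]  # value given in the PR description
RATE_LIMIT_MARKER = "频率限制"  # "frequency limit"; substring matched in the 400 body

# Hypothetical fetch result: (HTTP status, response body, gateway URL or None)
FetchResult = Tuple[int, str, Optional[str]]


class GatewayRateLimited(Exception):
    """Hypothetical error: /gateway is rate-limited and nothing is cached."""


class GatewayResolver:
    """Illustrative stand-in for the adapter's gateway URL handling."""

    def __init__(self, fetch: Callable[[], FetchResult],
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._fetch = fetch
        self._sleep = sleep
        self._last_gateway_url: Optional[str] = None  # no TTL, as in the PR notes

    def get_gateway_url(self, force_refresh: bool = False) -> str:
        # Cache hit: skip the rate-limited /gateway endpoint entirely.
        if self._last_gateway_url and not force_refresh:
            return self._last_gateway_url

        last_error: Optional[Exception] = None
        # Assumption: first attempt is immediate, then one retry per delay.
        for delay in [None, *GATEWAY_URL_RETRY_DELAYS]:
            if delay is not None:
                self._sleep(delay)
            try:
                status, body, url = self._fetch()
            except OSError as exc:  # transient network failure: retry
                last_error = exc
                continue
            if status == 400 and RATE_LIMIT_MARKER in body:
                # Retrying a rate-limited endpoint only makes things worse.
                if self._last_gateway_url:
                    return self._last_gateway_url
                raise GatewayRateLimited(
                    "/gateway is rate-limited and no cached URL is available")
            if status == 200 and url:
                self._last_gateway_url = url
                return url
            last_error = RuntimeError(f"unexpected /gateway response: HTTP {status}")

        raise RuntimeError(f"could not resolve gateway URL: {last_error!r}")
```

Passing `sleep` as a parameter keeps the retry delays testable without real waiting; an async adapter would presumably use `asyncio.sleep` and an awaited fetch instead.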