
fix(qqbot): trigger fatal error + reconnect when max reconnect attempts reached#17814

Open
fengtianyu88 wants to merge 1 commit into NousResearch:main from fengtianyu88:fix/adapter-listen-loop-disconnect

Conversation

fengtianyu88 (Contributor) commented Apr 30, 2026

Summary

When the QQ adapter's _listen_loop() exhausts all reconnect attempts (MAX_RECONNECT_ATTEMPTS=100), it previously returned silently without notifying the gateway. This left the bot in a zombie state: the gateway believed the platform was connected (is_connected returned True) while the adapter was dead.

Fix

Use the same _set_fatal_error() + _notify_fatal_error() pattern that telegram.py uses. This triggers:

  1. _handle_adapter_fatal_error() in the gateway run loop
  2. Adapter removed from self.adapters
  3. Platform added to _failed_platforms for background reconnection
  4. _platform_reconnect_watcher() picks it up and retries with exponential backoff
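
The gateway-side steps above can be sketched roughly as follows. This is a minimal illustration, not the actual hermes-gateway code: the `Gateway` class, its constructor arguments, the delay values, and the `demo()` helper are all assumptions, and only the names `_handle_adapter_fatal_error`, `_failed_platforms`, `_platform_reconnect_watcher`, and `self.adapters` come from the description.

```python
import asyncio


class Gateway:
    """Hypothetical stand-in for the gateway runner (not the real class)."""

    def __init__(self, connect, base_delay=0.01, max_delay=0.08):
        self.adapters = {}           # platform name -> live adapter
        self._failed_platforms = {}  # platform name -> failed retry count
        self._connect = connect      # assumed: async callable(platform) -> adapter
        self._base_delay = base_delay
        self._max_delay = max_delay
        self.delays = []             # backoff delays used, kept for inspection

    def _handle_adapter_fatal_error(self, platform):
        # Steps 2 and 3: drop the dead adapter and queue the platform for retry.
        self.adapters.pop(platform, None)
        self._failed_platforms[platform] = 0

    async def _platform_reconnect_watcher(self):
        # Step 4: retry each failed platform, doubling the delay per failure.
        while self._failed_platforms:
            for platform in list(self._failed_platforms):
                attempt = self._failed_platforms[platform]
                delay = min(self._base_delay * 2 ** attempt, self._max_delay)
                self.delays.append(delay)
                await asyncio.sleep(delay)
                try:
                    self.adapters[platform] = await self._connect(platform)
                except ConnectionError:
                    self._failed_platforms[platform] = attempt + 1
                else:
                    del self._failed_platforms[platform]


def demo():
    calls = {"n": 0}

    async def flaky_connect(platform):
        # Fails twice, then succeeds, to exercise the backoff path.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("still down")
        return f"{platform}-adapter"

    gw = Gateway(flaky_connect)
    gw.adapters["qqbot"] = "dead-adapter"
    gw._handle_adapter_fatal_error("qqbot")
    asyncio.run(gw._platform_reconnect_watcher())
    return gw
```

Calling `demo()` leaves `qqbot` back in `adapters` with an empty `_failed_platforms`, which is the recovery the fix relies on.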

Three paths are covered:

  • Rate limit (code 4008) exhausts retries
  • QQCloseError reconnect failures exceed limit
  • General WebSocket exception reconnect failures exceed limit
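
On the adapter side, the change amounts to signalling before returning once the attempt budget is spent. The sketch below is an assumption-heavy illustration: the signatures of `_set_fatal_error()` and `_notify_fatal_error()`, the `"reconnect_exhausted"` code, and the `SketchAdapter` class are hypothetical, and the three paths listed above are collapsed into a single `ConnectionError` for brevity.

```python
import asyncio

MAX_RECONNECT_ATTEMPTS = 100  # value quoted in the PR summary


class SketchAdapter:
    """Hypothetical adapter; method names follow the PR text, signatures are assumed."""

    def __init__(self, run_session, on_fatal, max_attempts=MAX_RECONNECT_ATTEMPTS):
        # run_session: async callable that raises ConnectionError when the
        # connection drops or cannot be established, and returns on clean shutdown.
        self._run_session = run_session
        self._on_fatal = on_fatal  # assumed gateway callback
        self._max_attempts = max_attempts
        self.fatal_error = None
        self.is_connected = True

    def _set_fatal_error(self, code, message, retryable):
        self.fatal_error = {"code": code, "message": message, "retryable": retryable}
        self.is_connected = False  # no longer reports a live connection

    async def _notify_fatal_error(self):
        await self._on_fatal(self)

    async def _listen_loop(self):
        attempts = 0
        while True:
            try:
                await self._run_session()
                return  # clean shutdown
            except ConnectionError as exc:
                attempts += 1
                if attempts >= self._max_attempts:
                    # Previously the loop just returned here, leaving a zombie.
                    self._set_fatal_error("reconnect_exhausted", str(exc), retryable=True)
                    await self._notify_fatal_error()
                    return
                await asyncio.sleep(0)  # real code would back off here


def demo(max_attempts=3):
    notified = []

    async def always_fails():
        raise ConnectionError("handshake failed")

    async def on_fatal(adapter):
        notified.append(adapter.fatal_error["code"])

    adapter = SketchAdapter(always_fails, on_fatal, max_attempts=max_attempts)
    asyncio.run(adapter._listen_loop())
    return adapter, notified
```

With `demo()`, the gateway callback fires exactly once and `is_connected` ends up `False`, rather than staying `True` on a dead adapter.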

Root Cause

Analyzed 11 Reconnect failed events in errors.log:

  • All 11 occurred after hermes-gateway restarts
  • Pattern: gateway crashes -> adapter is stopped -> REST API calls fail -> gateway restarted -> adapter reconnects
  • The fix ensures the adapter proactively signals a fatal error, so the gateway reconnects automatically without a manual restart

alt-glitch added labels Apr 30, 2026: type/bug (Something isn't working); P2 (Medium: degraded but workaround exists); comp/gateway (Gateway runner, session dispatch, delivery); platform/qqbot (QQ Bot adapter)
alt-glitch (Collaborator) commented

Likely duplicate of #14565 — same fix for #14539 (QQBot adapter not notifying gateway on reconnect exhaustion). PR #15051 also competes.

alt-glitch (Collaborator) commented

Competing fix with #14565 for #14539 — same root cause: _listen_loop() returns without notifying gateway after MAX_RECONNECT_ATTEMPTS exhausted.

alt-glitch added the duplicate label (This issue or pull request already exists) Apr 30, 2026
…ts reached in _listen_loop

When the QQ adapter's _listen_loop() exhausts all reconnect attempts
(MAX_RECONNECT_ATTEMPTS=100), it previously returned silently without
notifying the gateway. This left the bot in a zombie state: the gateway
believed the platform was connected (is_connected returned True) while
the adapter was dead.

Fix: use the same _set_fatal_error() + _notify_fatal_error() pattern
that telegram.py uses (lines 365-366). This triggers:
1. _handle_adapter_fatal_error() in the gateway run loop
2. Adapter removed from self.adapters
3. Platform added to _failed_platforms for background reconnection
4. _platform_reconnect_watcher() picks it up and retries with backoff

Three paths are covered:
- Rate limit (code 4008) exhausts retries
- QQCloseError reconnect failures exceed limit
- General WebSocket exception reconnect failures exceed limit
fengtianyu88 force-pushed the fix/adapter-listen-loop-disconnect branch from b525a9a to cce3827 on April 30, 2026 08:30
fengtianyu88 changed the title from "fix(qqbot): call disconnect() when max reconnect attempts reached" to "fix(qqbot): trigger fatal error + reconnect when max reconnect attempts reached" Apr 30, 2026
fengtianyu88 (Contributor, Author) commented

Agreed this is a duplicate in terms of the root cause fix. The key difference is the retryable parameter:

The retryable=True approach matches how telegram.py handles network failures (telegram_network_error at line 365): the bot goes into a reconnect queue, not a process restart. This avoids unnecessary gateway downtime for recoverable network issues.

If the maintainers prefer retryable=False, the fix is functionally equivalent: either approach resolves #14539. I am just flagging this distinction for review.
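
To make the distinction concrete, here is a small sketch of how a gateway might branch on the flag. It assumes that retryable=False ends with the gateway stopping so the process gets restarted, which is one reading of the comment above and not confirmed gateway behavior; `FatalError`, `GatewayState`, and `route_fatal_error` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class FatalError:
    platform: str
    retryable: bool


@dataclass
class GatewayState:
    failed_platforms: set = field(default_factory=set)  # reconnect queue
    shutdown_requested: bool = False                     # assumed restart path


def route_fatal_error(state: GatewayState, err: FatalError) -> GatewayState:
    if err.retryable:
        # Recoverable: queue only this platform; the gateway keeps running.
        state.failed_platforms.add(err.platform)
    else:
        # Assumed behavior: stop the gateway so it can be restarted.
        state.shutdown_requested = True
    return state
```

Under this reading, retryable=True affects only the QQ platform, while retryable=False takes the whole gateway down.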
