
fix(qqbot): notify gateway via _set_fatal_error when reconnect loop exhausts #14565

Closed
nftpoetrist wants to merge 1 commit into NousResearch:main from nftpoetrist:fix/issue-14539-qqbot-reconnect-fatal-error

Conversation

@nftpoetrist

@nftpoetrist nftpoetrist commented Apr 23, 2026

Contributor

What does this PR do?

`_listen_loop()` in `gateway/platforms/qqbot/adapter.py` has three paths where `MAX_RECONNECT_ATTEMPTS` is reached and the loop exits via `return`, but none of them calls `_set_fatal_error()` or `_notify_fatal_error()`. The gateway runner therefore keeps the platform state at the value last set by `_mark_connected()` (connected) while the adapter is silently dead. Because the gateway process itself does not exit, `systemd Restart=on-failure` never fires.

Three exhaustion paths are affected:

  • 4008 rate-limit branch — bare `return` with no `logger.error`, no `_set_fatal_error()`, no `_notify_fatal_error()`
  • QQCloseError general branch — has `logger.error` but no `_set_fatal_error()` or `_notify_fatal_error()`
  • Exception branch — has `logger.error` but no `_set_fatal_error()` or `_notify_fatal_error()`

Fix: add `_set_fatal_error("qq_reconnect_exhausted", ..., retryable=False)` followed by `await self._notify_fatal_error()` to all three paths. `_set_fatal_error()` alone only writes to the status file. `_notify_fatal_error()` is required to invoke the GatewayRunner's `_handle_adapter_fatal_error` handler, which disconnects the adapter and stops the gateway so that `systemd Restart=on-failure` can fire. This matches the pattern already used in the Telegram adapter (`gateway/platforms/telegram.py` lines 366, 460).
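For readers without the repository at hand, here is a minimal sketch of the pattern described above. It is not the adapter's actual code: `SketchAdapter`, `_connect_once`, the handler callback, the attribute names, and the value of `MAX_RECONNECT_ATTEMPTS` are all hypothetical stand-ins, and the second positional argument of `_set_fatal_error` is assumed to be a human-readable message. Only the general-exception path is shown; the 4008 and `QQCloseError` paths would follow the same shape.

```python
import asyncio
from typing import Awaitable, Callable, Optional

MAX_RECONNECT_ATTEMPTS = 3  # hypothetical value, chosen for the sketch


class SketchAdapter:
    """Stand-in for the QQ Bot adapter (hypothetical, not the real class)."""

    def __init__(self, on_fatal: Callable[["SketchAdapter"], Awaitable[None]]):
        self._on_fatal = on_fatal  # plays the role of the runner's handler
        self.fatal_error_code: Optional[str] = None
        self.fatal_error_message: Optional[str] = None
        self.fatal_error_retryable = True

    @property
    def has_fatal_error(self) -> bool:
        return self.fatal_error_code is not None

    def _set_fatal_error(self, code: str, message: str, *, retryable: bool) -> None:
        # Records the error only; nobody is told about it yet.
        self.fatal_error_code = code
        self.fatal_error_message = message
        self.fatal_error_retryable = retryable

    async def _notify_fatal_error(self) -> None:
        # Hands control to the runner-side handler.
        await self._on_fatal(self)

    async def _connect_once(self) -> None:
        # Always fails, so the sketch reaches the exhaustion path.
        raise ConnectionError("simulated connect failure")

    async def _listen_loop(self) -> None:
        attempts = 0
        while True:
            try:
                await self._connect_once()
                attempts = 0
            except Exception as exc:
                attempts += 1
                if attempts >= MAX_RECONNECT_ATTEMPTS:
                    # The two calls the PR adds before the bare `return`.
                    self._set_fatal_error(
                        "qq_reconnect_exhausted",
                        f"reconnect attempts exhausted: {exc}",
                        retryable=False,
                    )
                    await self._notify_fatal_error()
                    return
                await asyncio.sleep(0)  # real code would back off here


async def demo():
    handled = []

    async def handler(adapter: SketchAdapter) -> None:
        handled.append(adapter.fatal_error_code)

    adapter = SketchAdapter(handler)
    await adapter._listen_loop()
    return adapter, handled
```

Without the two added calls the loop would still return, but `has_fatal_error` would stay false and the handler would never run, which is the silent failure reported in the issue.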

Related Issue

Fixes #14539

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor
  • 🎯 New skill

Changes Made

  • `gateway/platforms/qqbot/adapter.py`: add `_set_fatal_error()` + `await _notify_fatal_error()` before each of the three `return` points in `_listen_loop()` (+18 lines)
  • `tests/gateway/test_qqbot.py`: add `TestListenLoopReconnectExhaustion` with one test per exhaustion path, each asserting both `has_fatal_error` and `_notify_fatal_error` was awaited (+100 lines)
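The shape of such a test can be sketched as follows. This is an illustration under stated assumptions, not the code in `tests/gateway/test_qqbot.py`: `FakeAdapter` and `_open_connection` are hypothetical, and `asyncio.run` is used so the sketch does not depend on `pytest-asyncio`. The assertion pair mirrors what the bullet above describes: the fatal flag is set and `_notify_fatal_error` was awaited.

```python
import asyncio
from unittest.mock import AsyncMock


class FakeAdapter:
    """Hypothetical stand-in with just enough surface for the test."""

    MAX_RECONNECT_ATTEMPTS = 2  # assumed small value to keep the test fast

    def __init__(self):
        self.fatal_error_code = None

    @property
    def has_fatal_error(self):
        return self.fatal_error_code is not None

    def _set_fatal_error(self, code, message, *, retryable):
        self.fatal_error_code = code

    async def _notify_fatal_error(self):
        raise AssertionError("replaced by AsyncMock in the test")

    async def _open_connection(self):
        raise RuntimeError("simulated failure")

    async def _listen_loop(self):
        attempts = 0
        while True:
            try:
                await self._open_connection()
                attempts = 0
            except Exception:
                attempts += 1
                if attempts >= self.MAX_RECONNECT_ATTEMPTS:
                    self._set_fatal_error(
                        "qq_reconnect_exhausted", "exhausted", retryable=False
                    )
                    await self._notify_fatal_error()
                    return


def test_exception_path_reports_fatal_error():
    adapter = FakeAdapter()
    # Swap in an AsyncMock so the await can be asserted on.
    adapter._notify_fatal_error = AsyncMock()

    asyncio.run(adapter._listen_loop())

    assert adapter.has_fatal_error
    adapter._notify_fatal_error.assert_awaited_once()
```

Asserting on both conditions matters because, per the description, setting the error without notifying leaves the runner unaware of the failure.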

How to Test

End-to-end reproduction requires a live QQ Bot setup. The unit tests cover all three paths:

```bash
pytest tests/gateway/test_qqbot.py::TestListenLoopReconnectExhaustion -v
```

All 3 tests pass.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix
  • I've run `pytest tests/gateway/test_qqbot.py -v` and pre-existing failures are unrelated to this change (they require `pytest-asyncio` which is in dev deps)
  • I've added tests for my changes
  • I've tested on my platform: macOS

Documentation & Housekeeping

  • I've updated relevant documentation — or N/A
  • I've updated `cli-config.yaml.example` — or N/A
  • I've updated `CONTRIBUTING.md` or `AGENTS.md` — or N/A
  • I've considered cross-platform impact — or N/A
  • I've updated tool descriptions/schemas — or N/A

@alt-glitch added the labels type/bug (Something isn't working), P2 (Medium, degraded but workaround exists), platform/qqbot (QQ Bot adapter), and comp/gateway (Gateway runner, session dispatch, delivery) on Apr 23, 2026
…l_error + _notify_fatal_error

_listen_loop() had three return points where MAX_RECONNECT_ATTEMPTS was
reached but neither _set_fatal_error() nor _notify_fatal_error() was called.
The gateway runner kept the platform marked as connected while the adapter
was dead, and systemd Restart=on-failure never triggered because the
process did not exit.

Adds _set_fatal_error("qq_reconnect_exhausted", ..., retryable=False) and
await self._notify_fatal_error() to all three exhaustion paths — 4008
rate-limit, QQCloseError, and Exception. _set_fatal_error() alone only
writes to the status file; _notify_fatal_error() is required to invoke the
GatewayRunner's _handle_adapter_fatal_error handler, which disconnects the
adapter and stops the gateway. Matches the pattern in the Telegram adapter
(gateway/platforms/telegram.py:366, 460).

Fixes NousResearch#14539
@nftpoetrist

Contributor Author

Closing in favour of #19414 which addresses the reconnect loop exhaustion along with four other reconnect bugs. That PR is a more comprehensive fix covering the same root cause.


Development

Successfully merging this pull request may close these issues.

QQ Bot adapter silently stops reconnecting without notifying gateway

2 participants