qqbot adapter silently dies on network outage during reconnect; gateway has no task watchdog #15490

@XIXIJCrG

Summary

When the host's network briefly goes down, the QQ bot platform adapter silently dies during a reconnect attempt. The Gateway parent task neither detects the failure nor restarts the adapter, so QQ stays offline until the container is manually restarted. Telegram, running in the same container and subjected to the same network event, recovers automatically.

Environment

  • Hermes Agent v0.11.0 (nousresearch/hermes-agent:latest, image sha256 550ae16a17b3)
  • Docker on a NAS (China region), behind a clash HTTP/HTTPS proxy at http://<local-clash-proxy> set via HTTP_PROXY / HTTPS_PROXY env vars
  • Platforms enabled: telegram + qqbot (both reach their endpoints through the same proxy)

What happened (production observation)

  1. Host network started degrading; clash proxy began dropping idle WebSocket connections.
  2. QQ bot adapter lost its WS to wss://api.sgroup.qq.com/websocket every ~60 s. On each cycle the adapter logged WebSocket error: WebSocket closed, then reconnected, sent Resume, and resumed the session successfully.
  3. After ~5 such cycles, the host network dropped fully for a short window.
  4. The 6th reconnect attempt triggered an exception at the httpx/httpcore layer (TCP/TLS handshake through proxy), which appears not to be caught by the qqbot adapter's reconnect coroutine.
  5. The qqbot task quietly exited — no traceback in agent.log, no ERROR entry, no further qqbot log lines for over an hour.
  6. Meanwhile Telegram experienced the same network event but its retry loop survived and reconnected automatically once network was back.
  7. hermes gateway status continued to report Gateway is running (PID alive) and Telegram kept serving. QQ remained permanently offline until docker restart.
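
One plausible mechanism for steps 4 and 5, sketched with plain asyncio. This is an assumption about how the gateway holds its platform tasks, not something confirmed from the Hermes source: if the adapter runs as an asyncio.Task that is kept referenced (for example in a dict) and never awaited, an uncaught exception ends the task without logging anything. asyncio only emits "Task exception was never retrieved" when the Task object is garbage collected. The names adapter_main and demo are hypothetical.

```python
import asyncio


async def adapter_main():
    # Hypothetical stand-in for an adapter loop that hits an uncaught
    # transport error during a reconnect attempt.
    raise ConnectionError("proxy CONNECT failed")


async def demo():
    # The parent keeps a reference to the task but never awaits it.
    tasks = {"qqbot": asyncio.create_task(adapter_main())}
    await asyncio.sleep(0.1)  # give the task time to run and fail

    task = tasks["qqbot"]
    # The task is finished and carries the exception, but nothing has been
    # logged so far. Reading task.exception() is the only way to see it.
    return task.done(), task.exception()


if __name__ == "__main__":
    done, exc = asyncio.run(demo())
    print(done, repr(exc))
```

If this is what happens, the process stays alive and the other platform tasks keep running, which would match the "Gateway is running" status above.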

Excerpted log (timestamps UTC)

2026-04-24 23:50:19 WARNING [QQBot:xxx] WebSocket error: WebSocket closed
2026-04-24 23:50:21 INFO    [QQBot:xxx] Reconnected
2026-04-24 23:50:21 INFO    [QQBot:xxx] Session resumed
2026-04-24 23:51:21 WARNING [QQBot:xxx] WebSocket error: WebSocket closed   # exact 60s cycle
2026-04-24 23:51:24 INFO    [QQBot:xxx] Reconnected
... [3 more cycles, all succeeding] ...
2026-04-24 23:54:31 INFO    [QQBot:xxx] Session resumed (seq=232)           # last qqbot log
                                                                             # ~1h 6m of zero qqbot activity at all
2026-04-25 01:00:30 WARNING [Telegram] network error, scheduling reconnect: httpx.ConnectError
... [Telegram retry loop runs to completion and recovers] ...
2026-04-25 01:57:29 INFO    [Telegram] Connected to Telegram (polling mode)
                                                                             # qqbot still silent — no reconnect attempted

Expected behavior

One or both of the following should happen:

  • The Gateway should wrap each platform adapter's main loop in a supervisor that restarts the adapter on an unhandled exception (or at minimum logs the traceback at ERROR level so silent death is visible).
  • The qqbot adapter's reconnect coroutine should catch transport-layer exceptions (httpx.ConnectError, httpcore.ConnectError, OSError, TLS handshake failures, proxy CONNECT failures) the same way it currently handles WebSocket closed.
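
A minimal sketch of the second option, using only the standard library. The function name reconnect_forever, its parameters, and the TRANSIENT_ERRORS tuple are hypothetical and not taken from the Hermes code. Note that httpx and httpcore exceptions do not subclass OSError, so a real adapter would need to add them to the tuple explicitly.

```python
import asyncio
import logging
import ssl

logger = logging.getLogger("qqbot.reconnect")

# Assumption: these cover socket-level and TLS failures. ssl.SSLError is
# already an OSError subclass and is listed only for readability. Add
# httpx.ConnectError / httpcore.ConnectError here in a real adapter.
TRANSIENT_ERRORS = (OSError, ssl.SSLError, asyncio.TimeoutError)


async def reconnect_forever(connect, *, base_delay=1.0, max_delay=60.0,
                            sleep=asyncio.sleep):
    """Call connect() until it succeeds, backing off on transient errors."""
    delay = base_delay
    attempt = 0
    while True:
        attempt += 1
        try:
            return await connect()
        except TRANSIENT_ERRORS as exc:
            # Log and retry instead of letting the exception end the task.
            logger.warning("reconnect attempt %d failed: %r; retrying in %.1fs",
                           attempt, exc, delay)
            await sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
```

Cancellation still works as expected, because asyncio.CancelledError derives from BaseException and is not caught by the tuple above.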

Suspected fix locations

  • gateway/platforms/qqbot/adapter.py — broaden the except around the reconnect / Resume path to include httpx.ConnectError, httpcore.ConnectError, OSError, ssl.SSLError, etc.
  • gateway/run.py — add a per-platform task supervisor that restarts a dead adapter task, or at least emits a high-severity log + alert when a platform task exits unexpectedly.
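
For the gateway/run.py side, a per-platform supervisor could look roughly like this. It is a sketch under assumptions: supervise, adapter_factory, restart_delay, and max_restarts are hypothetical names, and the real gateway may structure its tasks differently.

```python
import asyncio
import logging

logger = logging.getLogger("gateway.supervisor")


async def supervise(name, adapter_factory, *, restart_delay=5.0,
                    max_restarts=None):
    """Run adapter_factory() in a loop, logging and restarting on failure.

    Returns the number of restarts performed once max_restarts is exhausted;
    with max_restarts=None it restarts indefinitely.
    """
    restarts = 0
    while True:
        try:
            await adapter_factory()
            # A long-running adapter returning at all is unexpected.
            logger.error("[%s] adapter exited without an exception", name)
        except asyncio.CancelledError:
            raise  # let normal shutdown propagate
        except Exception:
            # logger.exception records the full traceback at ERROR level.
            logger.exception("[%s] adapter crashed", name)

        if max_restarts is not None and restarts >= max_restarts:
            logger.critical("[%s] giving up after %d restarts", name, restarts)
            return restarts
        restarts += 1
        await asyncio.sleep(restart_delay)
```

The gateway would then start something like asyncio.create_task(supervise("qqbot", start_qqbot)) instead of the bare adapter coroutine, where start_qqbot is a placeholder for whatever coroutine function runs the adapter today. Even without the restart, the logger.exception call alone would have made this failure visible in agent.log.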

The 60-second WebSocket cycle is likely a clash idle-connection timeout (client-side proxy issue, not your bug), but the silent death after that is the actual bug — a healthy adapter should not be killable by a transient network event.

Happy to provide more logs / the full silent-death window if useful.

Labels

  • P2: Medium, degraded but workaround exists
  • comp/gateway: Gateway runner, session dispatch, delivery
  • platform/qqbot: QQ Bot adapter
  • type/bug: Something isn't working