Summary
When the host's network briefly goes down, the QQ bot platform adapter silently dies during a reconnect attempt. The Gateway parent task does not detect the failure or restart the adapter, so QQ stays offline indefinitely until the container is manually restarted. Telegram, in the same container subjected to the same network event, recovers automatically.
Environment
- Hermes Agent v0.11.0 (nousresearch/hermes-agent:latest, image sha256 550ae16a17b3)
- Docker on a NAS (China region), behind a clash HTTP/HTTPS proxy at http://<local-clash-proxy> set via the HTTP_PROXY / HTTPS_PROXY env vars
- Platforms enabled: telegram + qqbot (both reach their endpoints through the same proxy)
What happened (production observation)
- Host network started degrading; clash proxy began dropping idle WebSocket connections.
- The QQ bot adapter lost its WebSocket connection to wss://api.sgroup.qq.com/websocket every ~60 s. On each cycle the adapter logged "WebSocket error: WebSocket closed", reconnected, sent Resume, and succeeded.
- After ~5 such cycles, the host network dropped fully for a short window.
- The 6th reconnect attempt triggered an exception at the httpx/httpcore layer (TCP/TLS handshake through the proxy), which the qqbot adapter's reconnect coroutine does not appear to catch.
- The qqbot task exited quietly: no traceback in agent.log, no ERROR entry, and no further qqbot log lines for over an hour.
- Meanwhile, Telegram went through the same network event, but its retry loop survived and reconnected automatically once the network was back.
hermes gateway status continued to report "Gateway is running" (PID alive), and Telegram kept serving. QQ stayed offline until the container was restarted with docker restart.
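One plausible explanation for the missing traceback is still a hypothesis, because I have not checked the Hermes source. Suppose the gateway starts each adapter with asyncio.create_task(), keeps a reference to the task, and never awaits it. In that case an exception that escapes the adapter coroutine is simply stored on the task. asyncio only logs "Task exception was never retrieved" when the task object is garbage-collected. The sketch below reproduces this pattern with hypothetical stand-in adapters (fake_qqbot_adapter, fake_telegram_adapter are illustrative, not Hermes code). It also shows that a per-platform check based on task.done() / task.exception() would catch the dead adapter, which a process-level check misses.

```python
import asyncio


async def fake_qqbot_adapter() -> None:
    """Hypothetical qqbot main loop that dies on an uncaught transport error."""
    await asyncio.sleep(0.01)
    raise ConnectionError("proxy CONNECT failed")


async def fake_telegram_adapter() -> None:
    """Hypothetical healthy adapter that keeps running."""
    while True:
        await asyncio.sleep(0.01)


def platform_status(tasks: dict) -> dict:
    """Report each platform from its task state instead of process liveness."""
    status = {}
    for name, task in tasks.items():
        if not task.done():
            status[name] = "running"
        elif task.cancelled():
            status[name] = "cancelled"
        else:
            # Calling exception() marks it as retrieved, so asyncio won't warn later.
            status[name] = f"dead: {task.exception()!r}"
    return status


async def main() -> dict:
    # References are kept but never awaited: the qqbot exception is stored on
    # its task and nothing is logged, while the event loop (and PID) stay alive.
    tasks = {
        "qqbot": asyncio.create_task(fake_qqbot_adapter()),
        "telegram": asyncio.create_task(fake_telegram_adapter()),
    }
    await asyncio.sleep(0.05)
    return platform_status(tasks)  # asyncio.run() cancels the telegram task on exit


if __name__ == "__main__":
    print(asyncio.run(main()))
```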
Excerpted log (timestamps UTC)
2026-04-24 23:50:19 WARNING [QQBot:xxx] WebSocket error: WebSocket closed
2026-04-24 23:50:21 INFO [QQBot:xxx] Reconnected
2026-04-24 23:50:21 INFO [QQBot:xxx] Session resumed
2026-04-24 23:51:21 WARNING [QQBot:xxx] WebSocket error: WebSocket closed # exact 60s cycle
2026-04-24 23:51:24 INFO [QQBot:xxx] Reconnected
... [3 more cycles, all succeeding] ...
2026-04-24 23:54:31 INFO [QQBot:xxx] Session resumed (seq=232) # last qqbot log
# ~1h 6m of zero qqbot activity at all
2026-04-25 01:00:30 WARNING [Telegram] network error, scheduling reconnect: httpx.ConnectError
... [Telegram retry loop runs to completion and recovers] ...
2026-04-25 01:57:29 INFO [Telegram] Connected to Telegram (polling mode)
# qqbot still silent — no reconnect attempted
Expected behavior
One or both of the following should hold:
- The Gateway wraps each platform adapter's main loop in a supervisor that restarts the adapter on an unhandled exception (or, at minimum, logs the traceback at ERROR level so a silent death is visible).
- The qqbot adapter's reconnect coroutine catches transport-layer exceptions (httpx.ConnectError, httpcore.ConnectError, OSError, TLS handshake failures, proxy CONNECT failures) the same way it currently handles "WebSocket closed".
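Here is a minimal sketch of the second option. It assumes the reconnect path is an asyncio loop around a hypothetical connect_and_resume() coroutine function, which runs until the WebSocket closes. That name is illustrative, not the adapter's real API. Transport failures are handled like a closed WebSocket: they are logged with a traceback at WARNING and retried with jittered, capped exponential backoff instead of escaping the coroutine. ssl.SSLError, ConnectionError and TimeoutError are already OSError subclasses. httpx.TransportError (the base of httpx.ConnectError and httpx.ProxyError) and httpcore.ConnectError are added only if those packages are importable. The exact exception list is an assumption to adjust.

```python
from __future__ import annotations

import asyncio
import logging
import random

log = logging.getLogger("qqbot.reconnect")

# Assumed set of transient network failures. OSError covers ConnectionError,
# TimeoutError and ssl.SSLError; asyncio.TimeoutError is listed separately
# because it is not an OSError before Python 3.11.
TRANSIENT_ERRORS: tuple[type[BaseException], ...] = (OSError, asyncio.TimeoutError)

try:  # httpx wraps connect / proxy / TLS failures in TransportError subclasses
    import httpx

    TRANSIENT_ERRORS += (httpx.TransportError,)
except ImportError:
    pass

try:  # raw httpcore errors, in case one escapes without being wrapped by httpx
    import httpcore

    TRANSIENT_ERRORS += (httpcore.ConnectError,)
except ImportError:
    pass


async def reconnect_forever(connect_and_resume, *, base_delay=1.0, max_delay=60.0):
    """Keep (re)running connect_and_resume(); transient errors never escape.

    Non-transient exceptions still propagate (for a supervisor to log), and
    CancelledError is a BaseException, so gateway shutdown is not swallowed.
    """
    delay = base_delay
    while True:
        try:
            await connect_and_resume()
        except TRANSIENT_ERRORS:
            log.warning("qqbot reconnect failed; retrying in ~%.1fs", delay, exc_info=True)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))  # jittered backoff
            delay = min(delay * 2, max_delay)
        else:
            # Clean WebSocket close: reset the backoff and reconnect after a short pause.
            delay = base_delay
            await asyncio.sleep(base_delay)
```

Any error that is not transient (a bug in the adapter) still propagates. That keeps real bugs visible rather than hiding them behind endless retries.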
Suspected fix locations
gateway/platforms/qqbot/adapter.py — broaden the except around the reconnect / Resume path to include httpx.ConnectError, httpcore.ConnectError, OSError, ssl.SSLError, etc.
gateway/run.py — add a per-platform task supervisor that restarts a dead adapter task, or at least emits a high-severity log + alert when a platform task exits unexpectedly.
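For gateway/run.py, a minimal supervisor sketch follows. It assumes each platform adapter runs as an asyncio coroutine; supervise(), adapter_factory, healthy_after and max_restarts are hypothetical names, not existing Hermes APIs. Every exit of the adapter is logged at ERROR, with the full traceback when it crashed, so a dead adapter shows up in agent.log. The adapter is then restarted after a capped exponential backoff, and the backoff resets once a run has stayed healthy for a while. Cancellation from gateway shutdown passes straight through.

```python
from __future__ import annotations

import asyncio
import logging
from typing import Awaitable, Callable

log = logging.getLogger("gateway.supervisor")


async def supervise(
    name: str,
    adapter_factory: Callable[[], Awaitable[None]],
    *,
    base_delay: float = 1.0,
    max_delay: float = 300.0,
    healthy_after: float = 60.0,
    max_restarts: int | None = None,
) -> None:
    """Run adapter_factory() and restart it whenever it returns or raises.

    adapter_factory must return a fresh coroutine on every call, e.g. a
    hypothetical `lambda: QQBotAdapter(config).run()`. CancelledError is a
    BaseException, so `except Exception` does not catch it and shutdown works.
    """
    loop = asyncio.get_running_loop()
    restarts = 0
    delay = base_delay
    while True:
        started = loop.time()
        error: BaseException | None = None
        try:
            await adapter_factory()
            outcome = "exited without raising"
        except Exception as exc:  # log every crash instead of dying silently
            error = exc
            outcome = "crashed"
        if loop.time() - started >= healthy_after:
            delay = base_delay  # it ran fine for a while, so reset the backoff
        restarts += 1
        if max_restarts is not None and restarts > max_restarts:
            log.critical("platform %s %s; restart limit (%d) reached, giving up",
                         name, outcome, max_restarts, exc_info=error)
            return
        # ERROR level, with the traceback when there is one.
        log.error("platform %s %s; restart #%d in %.1fs",
                  name, outcome, restarts, delay, exc_info=error)
        await asyncio.sleep(delay)
        delay = min(delay * 2, max_delay)
```

The exception object is passed as exc_info instead of True because the log calls run after the except block has finished, and at that point sys.exc_info() is already cleared. In the gateway, this would wrap whatever currently starts the adapter task, e.g. asyncio.create_task(supervise("qqbot", lambda: qq.run())), where qq.run is a placeholder.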
The 60-second WebSocket cycle is likely a clash idle-connection timeout (client-side proxy issue, not your bug), but the silent death after that is the actual bug — a healthy adapter should not be killable by a transient network event.
Happy to provide more logs / the full silent-death window if useful.