Telegram cascade on v2026.5.7: polling stall + outbound HTTP agent never rebuilt; multi-account start-account causes event-loop starvation #80695

@vdruts

Summary

On v2026.5.7 (Windows 11, 8 Telegram bot accounts) the gateway entered a Telegram delivery cascade with two distinct failure modes back-to-back:

  1. Pre-restart: a 148s getUpdates polling stall, followed by ~50 consecutive sendChatAction "Network request failed" errors firing every ~3s with no backoff. The network was healthy throughout: DNS, TCP, and TLS to api.telegram.org all succeeded from the same host at the same time. The polling cycle's internal transport rebuild fired ("closing stale transport before rebuild", then "rebuilding transport for next polling cycle") but did not repair the sendChatAction HTTP agent; outbound failures continued unabated until a full gateway restart.
  2. Post-restart: During channels.telegram.start-account (8 bots starting in parallel) the gateway hit event-loop starvation (eventLoopDelayMaxMs=8384.4ms, timer delayed 5604ms, likely event-loop starvation), causing multiple getMe calls to time out at 10–15s.

This appears to be a combination of, or interaction between, several already-tracked issues:

The new signal here is that both halves reproduce cleanly on v2026.5.7: the closed .4.24/.4.25 issues were not, in fact, fully resolved on .5.7, and the inner polling rebuild explicitly does not heal the outbound HTTP agent.

Environment

  • OpenClaw: webchat v2026.5.7 (per [ws] handshake banner)
  • OS: Windows 11 Pro N (10.0.26200)
  • Node: bundled
  • Plugins active: browser, device-pair, file-transfer, google, memory-core, microsoft, phone-control, talk-voice, telegram
  • Telegram accounts: 8 (alex, tony, alex-demo, cllawway, alan, donald-clawmp, clawd, daniel-clawstley)
  • IPv4-only forced via --require force-ipv4.js (NODE_OPTIONS)

Pre-restart cascade (excerpt)

11:30:22.669 [telegram] Polling stall detected (no completed getUpdates for 148.89s); forcing restart.
              [diag inFlight=0 outcome=ok durationMs=7074 offset=0 apiElapsedMs=2487]
11:30:22.692 [telegram] sendChatAction failed: Network request for 'sendChatAction' failed!
11:30:25.710 [telegram] sendChatAction failed: Network request for 'sendChatAction' failed!
11:30:28.693 [telegram] sendChatAction failed: Network request for 'sendChatAction' failed!
...  (50+ consecutive failures at ~3s cadence)
11:30:37.676 [telegram] Polling runner stop timed out after 15s; forcing restart cycle.
11:30:54.756 [telegram] [diag] closing stale transport before rebuild
11:30:54.759 [telegram] [diag] rebuilding transport for next polling cycle
11:30:55.715 [telegram] sendChatAction failed: Network request for 'sendChatAction' failed!  ← rebuild did NOT fix outbound
... (failures continue past rebuild, only fixed by full gateway restart)

Network was healthy during the cascade

Run from the same host while the failures were still firing:

Test-NetConnection api.telegram.org:443 → True
Resolve-DnsName    api.telegram.org      → 149.154.166.110 (TTL 249)
curl https://api.telegram.org/           → http=302 dns=0.008s connect=0.115s total=0.398s

So the failures were not network-level. This strongly suggests a stale persistent (keep-alive) HTTP socket inside the sendChatAction HTTP agent that the polling-rebuild path doesn't touch.
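To make the suspected gap concrete, here is a minimal sketch of a shared transport holder, where polling and outbound calls both read the current agent from one place, so a single rebuild replaces it for both directions. Every name here (TransportHolder, Closable, FakeAgent) is hypothetical and not taken from the OpenClaw or grammy code; FakeAgent only stands in for a real keep-alive HTTP agent.

```typescript
// Hypothetical sketch: none of these names come from OpenClaw or grammy.
interface Closable {
  close(): void;
}

class TransportHolder<T extends Closable> {
  private current: T;
  private readonly create: () => T;

  constructor(create: () => T) {
    this.create = create;
    this.current = create();
  }

  /** Callers ask for the transport on every request instead of caching it. */
  get(): T {
    return this.current;
  }

  /** Swap in a fresh transport for all callers, then close the stale one. */
  rebuild(): void {
    const stale = this.current;
    this.current = this.create();
    stale.close();
  }
}

// Stand-in for an HTTP agent holding keep-alive sockets (assumption for the demo).
class FakeAgent implements Closable {
  closed = false;
  close(): void {
    this.closed = true;
  }
}

const holder = new TransportHolder(() => new FakeAgent());
const pollingAgent = (): FakeAgent => holder.get(); // would serve getUpdates
const outboundAgent = (): FakeAgent => holder.get(); // would serve sendChatAction / sendMessage
```

With this shape, a polling-stall handler that calls holder.rebuild() would also hand sendChatAction a fresh agent on its next call. Whether the real code caches the outbound agent separately is the open question in this report.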

Post-restart event-loop starvation

11:33:47.768 [gateway] resolving authentication…
11:35:06.071 [gateway] http server listening (9 plugins; 78.2s)
11:35:16.715 [gateway] startup model warmup timed out after 5000ms; continuing without waiting
11:35:19.942 [telegram] [alex] starting provider (@alexclawdmozibot)
11:35:20.280 [telegram] [tony] starting provider (@TonyClawbinsBot)
11:35:20.283 [telegram] [alex-demo] starting provider (@alexclawdmozidemobot)
11:35:20.285 [telegram] [cllawway] starting provider (@cllawawaybot)
11:35:20.288 [telegram] [alan] starting provider (@alanclawttsbot)
11:35:20.291 [telegram] [donald-clawmp] starting provider (@donaldclawmpbot)
11:35:20.294 [telegram] [clawd] starting provider (@clawdminsterbot)
11:35:20.297 [telegram] [daniel-clawstley] starting provider (@danielclawdslibot)

11:38:10.220 [diagnostic] liveness warning: reasons=event_loop_delay
              eventLoopDelayP99Ms=23.8 eventLoopDelayMaxMs=8384.4
              eventLoopUtilization=0.672 cpuCoreRatio=0.639
              phase=channels.telegram.start-account
11:38:31.430 [fetch-timeout] fetch timeout after 10000ms (elapsed 10682ms) operation=fetchWithTimeout
              url=https://api.telegram.org/bot823795…/getMe
11:38:47.038 [fetch-timeout] fetch timeout after 10000ms (elapsed 15604ms) timer delayed 5604ms,
              likely event-loop starvation operation=fetchWithTimeout
              url=https://api.telegram.org/bot818818…/getMe
11:39:09.345 [fetch-timeout] fetch timeout after 10000ms (elapsed 10133ms) operation=fetchWithTimeout
              url=https://api.telegram.org/bot858735…/getMe

Eight bots starting in parallel, first-call agent-model resolution, session-locks recovery, and model warmup were all bunched into the same tick window. The 5.6s timer delay (15604ms elapsed against a 10000ms timeout) indicates the loop was actually parked, not just slow.
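One way to keep the starts from landing in the same tick window is to run them in small batches and yield to the event loop between batches. The sketch below is only an illustration: toBatches, startInBatches, and the startAccount callback are hypothetical names, and the batch size is an assumption rather than a measured value.

```typescript
// Hypothetical sketch: startAccount and the batch size are assumptions, not OpenClaw APIs.

/** Split a list into consecutive batches of at most `size` items. */
function toBatches<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

/** Start accounts batch by batch so at most `size` starts are in flight. */
async function startInBatches<T>(
  accounts: readonly T[],
  size: number,
  startAccount: (account: T) => Promise<void>,
): Promise<void> {
  for (const batch of toBatches(accounts, size)) {
    // The next batch does not begin until every start in this one has settled.
    await Promise.all(batch.map((account) => startAccount(account)));
    // Yield so pending timers and I/O callbacks can run between batches.
    await new Promise<void>((resolve) => setTimeout(resolve, 0));
  }
}
```

With the eight accounts listed under Environment and a batch size of 3, this yields batches of 3, 3, and 2. Batching only spreads the work out; it does not help if a single start blocks the loop for seconds on its own.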

Reproduction

Run a gateway with ≥8 Telegram accounts on Windows. Leave it idle overnight, then send an inbound message. Most days nothing happens; on a stall day the pre-restart cascade above appears. Restart the gateway and you'll see the post-restart starvation pattern reliably during start-account fanout.

Proposed fixes (for triage)

  1. Outbound HTTP agent rebuild. When polling-rebuild fires, also dispose the outbound (sendChatAction/sendMessage) agent, or share a single Undici/grammy transport instance so that one rebuild fixes both directions. The current code path leaves outbound permanently broken until a full restart.
  2. Backoff on sendChatAction failures. Per #56096 (Bug: Telegram sendChatAction infinite retry loop with no backoff), the 3s/3s/3s repeat with no jitter or escalation is a hot loop.
  3. Bound multi-account start-account fanout. Throttle to N=2–3 concurrent start-account calls instead of all 8 at once, or move the first getMe off the hot path so model warmup + agent prep don't compound.
  4. Recognize the bare grammy error string. Per #80362 (Telegram retry regex too strict: bare grammy Network request for 'X' failed! (no "after") never classified as recoverable for context: send, drops outbound messages), Network request for 'sendChatAction' failed! should be classified as recoverable so that a retry with a fresh socket can fire instead of dropping the send.
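Fixes 2 and 4 can be sketched together: a looser match that accepts the bare error string seen in the logs above, plus an exponential backoff with jitter in place of the fixed ~3s cadence. The function names, the regex, the 3s base, and the 60s cap are all assumptions made for illustration; this is not the gateway's actual classifier or retry policy.

```typescript
// Hypothetical sketch: names, regex, and delay constants are assumptions.

// Accepts the bare message ("... failed!") as well as variants with trailing text.
const NETWORK_FAILURE = /^Network request for '([^']+)' failed\b/;

/** True when the message looks like a transport-level failure worth retrying. */
function isRecoverableNetworkError(message: string): boolean {
  return NETWORK_FAILURE.test(message);
}

/**
 * Exponential backoff with "equal jitter": half of the capped exponential
 * delay is fixed, the other half is randomized. `attempt` starts at 0.
 */
function backoffDelayMs(
  attempt: number,
  baseMs: number = 3000,
  capMs: number = 60000,
  random: () => number = Math.random,
): number {
  const exponential = Math.min(capMs, baseMs * 2 ** Math.max(0, attempt));
  const half = exponential / 2;
  return Math.floor(half + random() * half);
}
```

With the defaults, attempt 0 waits between 1.5s and 3s, attempt 2 between 6s and 12s, and later attempts level off between 30s and 60s, so a run of consecutive failures no longer hits the API at a fixed 3s interval.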

Related
