
Telegram polling stalls indefinitely when proxy TCP connection drops silently #41704

@sunyifei83

Description

Summary

When channels.telegram.proxy is configured and the upstream proxy's TCP connection drops silently (no RST/FIN — e.g. NAT timeout, idle connection eviction), the getUpdates long-poll fetch hangs indefinitely. The stall detector fires after ~90 s and forces a polling restart, but this cycle repeats every 20–40 minutes, causing the bot to be unresponsive for ~90–110 s each time.

Environment

  • OpenClaw version: 2026.3.8
  • Node.js version: 22.x
  • Platform: macOS (arm64)
  • Telegram channel: polling mode (no webhook)
  • Proxy: HTTP proxy via channels.telegram.proxy

Observed behavior

Gateway error log shows a recurring pattern:

[telegram] Polling stall detected (no getUpdates for 91.02s); forcing restart.
[telegram] polling runner stopped (polling stall detected); restarting in 2.48s.

Interval between stalls: approximately every 20–40 minutes.
During each stall window the bot receives no Telegram updates and does not respond to messages.

Root cause (code-level)

From the source:

```ts
// polling-session.ts
const POLL_STALL_THRESHOLD_MS = 9e4;    // 90 000 ms
const POLL_WATCHDOG_INTERVAL_MS = 3e4;  // 30 000 ms
```
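For readers unfamiliar with the watchdog, here is a minimal sketch of how a stall detector built on these two constants could work. It illustrates the mechanism under assumptions and is not the actual contents of polling-session.ts. The class `StallWatchdog`, its methods `markPoll` and `check`, and the injectable clock are all hypothetical.

```typescript
// Hypothetical reconstruction for illustration; NOT the real polling-session.ts.
const POLL_STALL_THRESHOLD_MS = 9e4; // 90 000 ms
const POLL_WATCHDOG_INTERVAL_MS = 3e4; // 30 000 ms

type Clock = () => number;

class StallWatchdog {
  private lastPollAt: number;
  private readonly onStall: (idleMs: number) => void;
  private readonly now: Clock;

  constructor(onStall: (idleMs: number) => void, now: Clock = Date.now) {
    this.onStall = onStall;
    this.now = now;
    this.lastPollAt = now();
  }

  // Assumed to be called whenever a getUpdates round trip completes.
  markPoll(): void {
    this.lastPollAt = this.now();
  }

  // Runs on every watchdog tick; returns true if a stall was reported.
  check(): boolean {
    const idleMs = this.now() - this.lastPollAt;
    if (idleMs > POLL_STALL_THRESHOLD_MS) {
      this.onStall(idleMs);
      return true;
    }
    return false;
  }

  // Starts periodic checks and returns a function that stops them.
  start(): () => void {
    const timer = setInterval(() => this.check(), POLL_WATCHDOG_INTERVAL_MS);
    return () => clearInterval(timer);
  }
}
```

Because this sketch only checks every 30 s, it reports a stall somewhere between 90 and 120 s after the last recorded poll, depending on where the tick falls. The 91.02 s in the log above fits that window, although the real implementation may differ.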

The grammY runner sends getUpdates with timeout: 30 (Telegram-side long-poll window).
When the proxy silently drops the TCP connection, the underlying undici fetch neither errors nor resolves — it hangs.

channels.telegram.timeoutSeconds maps directly to grammY's ApiClientOptions.timeoutSeconds. If this value is greater than 90 s (the stall threshold), the HTTP-level timeout fires after the stall detector, making it practically unreachable in the proxy-drop scenario.

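With the default (timeoutSeconds unset), grammY falls back to its own request timeout, which defaults to 500 s and is therefore far beyond the 90 s stall threshold. In practice the hung fetch is never cut short in time, and the stall detector is the only safety net.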
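The race described above reduces to a single comparison. This is a minimal sketch that assumes both clocks start when the hung getUpdates is issued and that ignores the 30 s watchdog granularity. `firstRecoveryPath` is a hypothetical helper, not OpenClaw code.

```typescript
// Sketch: which safety net ends a silently hung getUpdates first?
// Assumes the HTTP timeout and the stall threshold share the same start time.
const POLL_STALL_THRESHOLD_MS = 90_000;

type RecoveryPath = "http-timeout" | "stall-detector";

function firstRecoveryPath(httpTimeoutSeconds: number): RecoveryPath {
  const httpTimeoutMs = httpTimeoutSeconds * 1000;
  // Only a strictly earlier HTTP timeout beats the stall detector.
  return httpTimeoutMs < POLL_STALL_THRESHOLD_MS ? "http-timeout" : "stall-detector";
}

for (const s of [60, 120, 500]) {
  console.log(`${s} s -> ${firstRecoveryPath(s)}`);
}
```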

Suggested fix

Option A (config-level workaround, minimal change):
Document that users with a proxy should set timeoutSeconds to a value less than 90 s (e.g. 60) so the HTTP client timeout fires before the stall detector, allowing grammY's built-in retry logic to recover cleanly without triggering the stall path.

Option B (code-level, more robust):
Derive an internal fetch AbortSignal timeout from Math.min(timeoutSeconds * 1000, POLL_STALL_THRESHOLD_MS - buffer) and attach it to the getUpdates undici request, ensuring the hanging fetch is always cancelled well before the stall detector would fire — regardless of what the user sets for timeoutSeconds.
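A minimal sketch of Option B follows, assuming Node 20.3 or later for `AbortSignal.any` (the reported environment uses Node 22). `STALL_BUFFER_MS`, `deriveFetchTimeoutMs`, `withDeadline`, and `hungRequest` are hypothetical names, and the 10 s buffer is an arbitrary example value. How the resulting signal would be threaded into the runner's getUpdates call is left open.

```typescript
// Sketch of Option B: a fetch deadline that always expires before the stall detector.
const POLL_STALL_THRESHOLD_MS = 90_000;
const STALL_BUFFER_MS = 10_000; // hypothetical safety margin

function deriveFetchTimeoutMs(timeoutSeconds?: number): number {
  const cap = POLL_STALL_THRESHOLD_MS - STALL_BUFFER_MS;
  // Unset or invalid values fall back to the cap instead of "no timeout".
  if (timeoutSeconds === undefined || !Number.isFinite(timeoutSeconds) || timeoutSeconds <= 0) {
    return cap;
  }
  return Math.min(timeoutSeconds * 1000, cap);
}

// Combines an optional outer signal (e.g. a runner stop signal) with the deadline.
function withDeadline(timeoutMs: number, outer?: AbortSignal): AbortSignal {
  const deadline = AbortSignal.timeout(timeoutMs);
  return outer ? AbortSignal.any([outer, deadline]) : deadline;
}

// Simulates a fetch on a silently dropped connection: it settles only on abort.
function hungRequest(signal: AbortSignal): Promise<never> {
  return new Promise<never>((_, reject) => {
    signal.addEventListener("abort", () => reject(signal.reason), { once: true });
  });
}
```

For example, `hungRequest(withDeadline(deriveFetchTimeoutMs(60)))` rejects after 60 s with a `TimeoutError`, rather than hanging until the watchdog intervenes. Capping at the threshold minus a buffer means a large user-supplied `timeoutSeconds` can no longer push the HTTP timeout past the stall detector.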

Option C (proxy-aware):
When channels.telegram.proxy is set, automatically apply a conservative default for timeoutSeconds (e.g. 60 s) to account for proxy connection instability.
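Option C could look like the following sketch. The config interface only mirrors the two keys named in this issue (treating `proxy` as a URL string is an assumption), and `PROXY_DEFAULT_TIMEOUT_SECONDS` and `resolveTimeoutSeconds` are hypothetical.

```typescript
// Sketch of Option C: a proxy-aware default for timeoutSeconds.
interface TelegramChannelConfig {
  proxy?: string; // assumed to be a proxy URL
  timeoutSeconds?: number;
}

const PROXY_DEFAULT_TIMEOUT_SECONDS = 60; // below the 90 s stall threshold

function resolveTimeoutSeconds(cfg: TelegramChannelConfig): number | undefined {
  // An explicit user value always wins.
  if (cfg.timeoutSeconds !== undefined) return cfg.timeoutSeconds;
  // With a proxy configured, apply the conservative default.
  if (cfg.proxy) return PROXY_DEFAULT_TIMEOUT_SECONDS;
  // Without a proxy, keep the current behavior and leave it unset.
  return undefined;
}

console.log(resolveTimeoutSeconds({ proxy: "http://127.0.0.1:7890" })); // 60
```

Because an explicit value wins, a user who sets `timeoutSeconds: 120` behind a proxy would still hit the stall path. Option B closes that gap; Option C only improves the default.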

Workaround (confirmed working)

Set in openclaw.json:

```json
{
  "channels": {
    "telegram": {
      "timeoutSeconds": 60
    }
  }
}
```

With timeoutSeconds: 60 < POLL_STALL_THRESHOLD_MS / 1000 (90), grammY's HTTP client raises a timeout error at 60 s, the runner's exponential-backoff retry kicks in (initial 2 s), and a new getUpdates is issued at ~62 s — well within the 90 s stall window. The stall detector never fires and the bot remains responsive.
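The timeline above can be checked with a few lines of arithmetic. This sketch takes its numbers from this issue (90 s threshold, 2 s initial backoff). It conservatively ignores the watchdog's 30 s tick, which only delays detection further. `nextPollStartsAtMs` and `avoidsStall` are hypothetical helpers.

```typescript
// Worked timeline for the workaround: does the next getUpdates start in time?
const POLL_STALL_THRESHOLD_MS = 90_000;

function nextPollStartsAtMs(timeoutSeconds: number, initialBackoffMs = 2_000): number {
  // The HTTP timeout aborts the hung request, then the runner waits for
  // its first backoff delay before issuing a new getUpdates.
  return timeoutSeconds * 1000 + initialBackoffMs;
}

function avoidsStall(timeoutSeconds: number): boolean {
  return nextPollStartsAtMs(timeoutSeconds) < POLL_STALL_THRESHOLD_MS;
}

console.log(nextPollStartsAtMs(60)); // 62000
console.log(avoidsStall(60)); // true
console.log(avoidsStall(90)); // false
```

With `timeoutSeconds: 60` this leaves a 28 s margin. Any value of 88 s or more, combined with the 2 s backoff, reaches the threshold and no longer avoids the stall path.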

Additional notes

  • channels.telegram.network only accepts autoSelectFamily and dnsResultOrder — there is no way to set a fetch-level timeout there currently.
  • The issue is likely to surface with any stateful proxy or VPN that evicts idle TCP connections. That is a common deployment pattern for bots that need to reach Telegram from restricted networks.
