[Bug]: undici HTTP/2 hang on Windows extends from Telegram polling into the LLM model dispatcher (related to #66885 / #10795 / #4847)
Summary
On Windows running OpenClaw 2026.4.23 and 2026.4.26 with Node 24.13.0, all outbound fetch-based HTTP calls intermittently hang for 90–200 seconds before failing. This affects:
Telegram getUpdates long-polling (already noted in #66885)
Telegram sendMessage outbound (logged as Network request for 'sendMessage' failed!)
Model dispatcher LLM calls (e.g. openai/claude-opus-4-7) — LLM request timed out after the configured 97s
The third one is new — #66885 only mentions telegram polling and subagent announce, but the same undici socket pool hang is now blocking actual model invocations on the main agent. After we layered every reasonable client-side mitigation, telegram bot-internal commands like /status work (no LLM), but any real agent run on a long prompt times out.
Affected versions
2026.4.26 (be8c246) — first observed today (2026-04-28). Telegram polling stalls every 10–15 min, sendMessage failures, model timeouts.
2026.4.23 (a979721) — same behavior after rolling back. Bug is not version-specific within this range.
Environment
OS: Windows 10.0.26200 (x64)
Node: 24.13.0
OpenClaw user config: agents.defaults.model.primary = openai/claude-opus-4-7, channels.telegram.streaming.mode=partial
IPv6: ms_tcpip6 binding on the Ethernet adapter is disabled (Get-NetAdapterBinding -Name Ethernet -ComponentID ms_tcpip6 shows Enabled: False)
Comparable Mac on identical 2026.4.26: zero stalls. Issue is Windows-specific.
Mitigations already applied (none fully resolve)
✅ channels.telegram.streaming.mode=partial, autoSelectFamily=true, dnsResultOrder=ipv4first (set by OpenClaw runtime — see [telegram/network] dnsResultOrder=ipv4first (default-node22) log line)
✅ Add-MpPreference -ExclusionPath for the openclaw npm node_modules path
✅ Add-MpPreference -ExclusionProcess "node.exe"
✅ Inserted set "NODE_OPTIONS=--dns-result-order=ipv4first" into gateway.cmd before the node.exe launch line (process-level, not just runtime hint)
✅ Disable-NetAdapterBinding -ComponentID ms_tcpip6 on the active Ethernet adapter (was already disabled)
✅ Hard reboot of the Windows host to flush stuck undici sockets
✅ Full gateway restart (multiple times)
After all of the above, /status and other bot-internal commands respond instantly. Long prompts to the main agent still time out at 97s on the Anthropic call.
Logs
Telegram polling stall pattern (recurring all afternoon, ~every 10–15 min)
[telegram] Polling stall detected (active getUpdates stuck for 178.44s); forcing restart.
[diag inFlight=1 outcome=started startedAt=1777406174509 finishedAt=1777406174509 durationMs=30356 offset=0]
[telegram][diag] polling cycle finished reason=polling stall detected
error=Network request for 'getUpdates' failed!
Telegram polling runner stopped (polling stall detected); restarting in 3.78s.
[telegram][diag] rebuilding transport for next polling cycle
Telegram sendMessage failure pattern (slash command replies dropped)
telegram sendMessage failed: Network request for 'sendMessage' failed!
telegram slash block reply failed: HttpError: Network request for 'sendMessage' failed!
telegram sendMessage failed: Network request for 'sendMessage' failed!
telegram slash final reply failed: HttpError: Network request for 'sendMessage' failed!
(and intermittently, the same path succeeds: telegram sendMessage ok chat=… message=17754 2 seconds later.)
NEW: model dispatcher timeout (this is the part #66885 doesn't cover)
lane task error: lane=session:agent:main:main durationMs=96963 error="FailoverError: LLM request timed out."
lane task error: lane=main durationMs=7477 error="FailoverError: openrouter (openai/gpt-5.5) returned a billing error..."
Embedded agent failed before reply: All models failed (2):
openai/claude-opus-4-7: LLM request timed out. (timeout)
openrouter/openai/gpt-5.5: 402 This request requires more credits…
The 96963 ms duration matches the [default] starting provider … LLM request timed out envelope perfectly — same undici hang shape as the telegram stalls, but on the model call.
Direct API tests bypassing OpenClaw (PowerShell, same machine, same network)
Invoke-RestMethod https://api.telegram.org/bot$bot/getMe → 404 ms ✅
Invoke-RestMethod https://api.telegram.org/bot$bot/getUpdates?… → 399 ms ✅ (returned 2 pending updates)
So api.telegram.org, DNS, TLS, the bot token, and Windows TCP all work. The hang is inside undici's connection pool when the same calls go through Node's built-in fetch.
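To pin down that the hang sits in Node's fetch path rather than the network, a small timing harness can reproduce the same shape from inside Node. This is a diagnostic sketch, not part of OpenClaw: timeRequest is a helper name invented here, and the URL is whichever endpoint you tested above.

```javascript
// Diagnostic: time a request through Node's built-in fetch with a hard cap,
// so a pool hang surfaces as a TimeoutError at the cap instead of sitting
// in-flight until a watchdog kills it.
async function timeRequest(url, capMs, fetchImpl = fetch) {
  const t0 = Date.now();
  try {
    await fetchImpl(url, { signal: AbortSignal.timeout(capMs) });
    return { ok: true, ms: Date.now() - t0 };
  } catch (err) {
    return { ok: false, ms: Date.now() - t0, reason: err.name };
  }
}

// Usage sketch (same call PowerShell completes in ~400 ms):
//   timeRequest(`https://api.telegram.org/bot${token}/getMe`, 10_000)
//     .then(r => console.log(r));
// A healthy pool returns ok:true quickly; a stuck undici socket shows
// { ok: false, reason: 'TimeoutError' } at exactly the cap.
```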
Suggested root cause (from forum / prior issue references)
Per #66885 and #10795: Node 22+ undici implements Happy Eyeballs but ignores net.setDefaultAutoSelectFamily. When allowH2: true (default) and the host advertises HTTP/2 + IPv6, undici can keep an HTTP/2 stream half-open against an IPv6 path that Windows can't actually route. The dispatcher sits in inFlight until the watchdog kills it.
#66885 fixed this for web_fetch in 4.7 by setting allowH2: false on that dispatcher. The same fix appears not to be applied to:
the telegram transport (getUpdates and sendMessage)
the model dispatcher in agents/harness for provider calls (openai/claude-opus-4-7 etc.)
Suggested fix
Apply the same allowH2: false (and an explicit autoSelectFamily: false in the underlying Agent) to both paths, ideally via a shared undici Agent — or accept a user-supplied dispatcher via env (UNDICI_HTTP1_ONLY=1 or similar) so Windows users without IPv6 routability can opt in without code changes.
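A sketch of what the opt-in could look like. Everything here is hedged: UNDICI_HTTP1_ONLY is the hypothetical env var proposed above, not an existing flag, and the commented wiring assumes the npm undici package, since Node's built-in fetch bundles its own copy that OpenClaw would have to configure inside its transports.

```javascript
// Build undici Agent options for the proposed Windows opt-in.
// UNDICI_HTTP1_ONLY is hypothetical (suggested above), not an existing flag.
function dispatcherOptions(env = process.env) {
  if (env.UNDICI_HTTP1_ONLY === '1') {
    return {
      allowH2: false,                       // don't negotiate HTTP/2 via ALPN
      connect: { autoSelectFamily: false }, // no Happy Eyeballs at the socket layer
    };
  }
  return { allowH2: true }; // roughly today's default behavior
}

// Wiring sketch (needs `npm i undici`; OpenClaw would apply this inside its
// telegram transport and model dispatcher rather than globally):
//   const { Agent, setGlobalDispatcher } = require('undici');
//   setGlobalDispatcher(new Agent(dispatcherOptions()));

console.log(JSON.stringify(dispatcherOptions({ UNDICI_HTTP1_ONLY: '1' })));
```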
Why this matters
The current state on Windows is that:
Bot-internal commands work (no LLM call)
Cron jobs and any prompt to a main agent intermittently time out at the model call layer
The watchdog masks the issue for telegram (eventually retries) but not for model calls (one shot, 97s, fail)
Mac users on identical OpenClaw versions are entirely unaffected because their IPv6 stack is healthy enough to negotiate HTTP/2.
Related issues
#66885 — [Bug]: Telegram polling stall + subagent announce timeout on Windows (4.12) — undici HTTP/2 root cause
#10795 — Node 22+ undici ignores net.setDefaultAutoSelectFamily
#4847

cc @steipete — given you tracked #71325 to landing, this may be in your area.