[Bug]: undici HTTP/2 hang on Windows extends from Telegram polling into the LLM model dispatcher (related to #66885) #73831

@joeywrightphoto

Description

[Bug]: undici HTTP/2 hang on Windows extends from Telegram polling into the LLM model dispatcher (related to #66885 / #10795 / #4847)

Summary

On Windows, running OpenClaw 2026.4.23 and 2026.4.26 with Node 24.13.0, all outbound fetch-based HTTP calls intermittently hang for 90–200 seconds before failing. This affects:

  1. Telegram getUpdates long-polling (already noted in [Bug]: Telegram polling stall + subagent announce timeout on Windows (4.12) — undici HTTP/2 root cause #66885)
  2. Telegram sendMessage outbound (logged as Network request for 'sendMessage' failed!)
  3. Model dispatcher LLM calls (e.g. openai/claude-opus-4-7) — LLM request timed out after the configured 97s

The third item is new — #66885 only covers Telegram polling and subagent announce, but the same undici socket-pool hang is now blocking actual model invocations on the main agent. After layering every reasonable client-side mitigation, Telegram bot-internal commands like /status work (no LLM call involved), but any real agent run on a long prompt times out.

Affected versions

  • 2026.4.26 (be8c246) — first observed today (2026-04-28). Telegram polling stalls every 10–15 min, sendMessage failures, model timeouts.
  • 2026.4.23 (a979721) — same behavior after rolling back. Bug is not version-specific within this range.

Environment

  • OS: Windows 10.0.26200 (x64)
  • Node: 24.13.0
  • OpenClaw user config: agents.defaults.model.primary = openai/claude-opus-4-7, channels.telegram.streaming.mode=partial
  • Network: Tailscale + LAN, behind Comcast NAT, IPv6 already disabled at adapter binding (Disable-NetAdapterBinding -Name Ethernet -ComponentID ms_tcpip6 shows Enabled: False)
  • Comparable Mac on identical 2026.4.26: zero stalls. Issue is Windows-specific.

Mitigations already applied (none fully resolve)

  1. channels.telegram.streaming.mode=partial, autoSelectFamily=true, dnsResultOrder=ipv4first (set by OpenClaw runtime — see [telegram/network] dnsResultOrder=ipv4first (default-node22) log line)
  2. Add-MpPreference -ExclusionPath for the openclaw npm node_modules path
  3. Add-MpPreference -ExclusionProcess "node.exe"
  4. ✅ Inserted set "NODE_OPTIONS=--dns-result-order=ipv4first" into gateway.cmd before the node.exe launch line (process-level, not just runtime hint)
  5. Disable-NetAdapterBinding -ComponentID ms_tcpip6 on the active Ethernet adapter (was already disabled)
  6. ✅ Hard reboot of the Windows host to flush stuck undici sockets
  7. ✅ Full gateway restart (multiple times)
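Mitigation 4 is a one-line process-level change. For reference, a sketch of the patched gateway.cmd (the node.exe launch line here is illustrative — the script name and arguments in the real gateway.cmd may differ):

```bat
rem gateway.cmd (sketch) — pin DNS result ordering for the whole node process,
rem not just as a runtime hint inside OpenClaw
set "NODE_OPTIONS=--dns-result-order=ipv4first"
node.exe gateway.js %*
```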

After all of the above, /status and other bot-internal commands respond instantly. Long prompts to the main agent still time out at 97s on the Anthropic call.

Logs

Telegram polling stall pattern (recurring all afternoon, ~every 10–15 min)

[telegram] Polling stall detected (active getUpdates stuck for 178.44s); forcing restart.
   [diag inFlight=1 outcome=started startedAt=1777406174509 finishedAt=1777406174509 durationMs=30356 offset=0]
[telegram][diag] polling cycle finished reason=polling stall detected
   error=Network request for 'getUpdates' failed!
Telegram polling runner stopped (polling stall detected); restarting in 3.78s.
[telegram][diag] rebuilding transport for next polling cycle

Telegram sendMessage failure pattern (slash command replies dropped)

telegram sendMessage failed: Network request for 'sendMessage' failed!
telegram slash block reply failed: HttpError: Network request for 'sendMessage' failed!
telegram sendMessage failed: Network request for 'sendMessage' failed!
telegram slash final reply failed: HttpError: Network request for 'sendMessage' failed!

(and intermittently, the same path succeeds: telegram sendMessage ok chat=… message=17754 2 seconds later.)

NEW: model dispatcher timeout (this is the part #66885 doesn't cover)

lane task error: lane=session:agent:main:main durationMs=96963 error="FailoverError: LLM request timed out."
lane task error: lane=main durationMs=7477 error="FailoverError: openrouter (openai/gpt-5.5) returned a billing error..."
Embedded agent failed before reply: All models failed (2):
   openai/claude-opus-4-7: LLM request timed out. (timeout)
   openrouter/openai/gpt-5.5: 402 This request requires more credits…

The 96963 ms duration fills the envelope between the [default] starting provider log line and the LLM request timed out error almost exactly (configured timeout: 97 s) — the same undici hang shape as the Telegram stalls, but on the model call.

Direct API tests bypassing OpenClaw (PowerShell, same machine, same network)

Invoke-RestMethod  https://api.telegram.org/bot$bot/getMe          → 404 ms ✅
Invoke-RestMethod  https://api.telegram.org/bot$bot/getUpdates?…  → 399 ms ✅ (returned 2 pending updates)

So api.telegram.org, DNS, TLS, the bot token, and Windows TCP all work. The hang is inside undici's connection pool when the same calls go through Node's built-in fetch.

Suggested root cause (from forum / prior issue references)

Per #66885 and #10795: Node 22+ undici implements Happy Eyeballs but ignores net.setDefaultAutoSelectFamily. When allowH2: true (default) and the host advertises HTTP/2 + IPv6, undici can keep an HTTP/2 stream half-open against an IPv6 path that Windows can't actually route. The dispatcher sits in inFlight until the watchdog kills it.

#66885 fixed this for web_fetch in 4.7 by setting allowH2: false on that dispatcher. The same fix appears not to have been applied to the two clients listed below.

Suggested fix

Apply the same allowH2: false (and explicit autoSelectFamily: false in the underlying Agent) to:

  1. The Telegram channel's outbound HTTP client (covering both getUpdates and sendMessage)
  2. The model dispatcher used by agents/harness for provider calls

Both should use a shared undici Agent configured to:

new Agent({
  allowH2: false,
  connect: { autoSelectFamily: false, family: 4 },
})

…or accept a user-supplied dispatcher via env (UNDICI_HTTP1_ONLY=1 or similar) so Windows users without IPv6 routability can opt in without code changes.

Why this matters

The current state on Windows is that:

  • Bot-internal commands work (no LLM call)
  • Cron jobs and any prompt to a main agent intermittently time out at the model call layer
  • The watchdog masks the issue for telegram (eventually retries) but not for model calls (one shot, 97s, fail)

Mac users on identical OpenClaw versions are entirely unaffected because their IPv6 stack is healthy enough to negotiate HTTP/2.

Related issues

cc @steipete — given you tracked #71325 to landing, this may be in your area.
