Bug type
Regression (worked before, now fails)
Beta release blocker
No
Summary
Gateway long-running Node process exhibits multi-subsystem network/timer degradation (model-pricing fetch 60s timeouts, Telegram polling stalls 127–266s, RPC slowdowns 8–83s) reproducible across 2026.4.23, 2026.4.25, and 2026.4.26 on Windows 11 build 26100.8115 + Node 24.14.1. From a standalone Node process on the same machine, fetch() to the same endpoints completes in 100–800ms.
Steps to reproduce
1. `npm i -g openclaw@2026.4.26 --omit=optional`
2. `openclaw doctor --fix` (gateway auto-restarts; bundled deps installed cleanly)
3. Configure Telegram channel: `channels.telegram.enabled=true`, valid `botToken`, `dmPolicy: allowlist`, `plugins.entries.telegram.enabled=true`
4. `openclaw gateway start` → /health returns 200 within ~30s; log shows "ready (2 plugins: memory-core, telegram)"
5. Wait 2–5 minutes.
6. First `Polling stall detected` and `pricing fetch failed (timeout 60s)` log lines appear.
7. Cycle recurs every 2–3 minutes thereafter; `getUpdates` and `sendMessage` calls fail with bare `Network request for '...' failed!`
Expected behavior
Gateway RPC and outbound HTTP fetches complete in <1s consistently, matching the timing observed when the same Node 24.14.1 binary issues fetch() to the same endpoints from a standalone process on the same host:
- `fetch('https://api.telegram.org/bot<token>/getMe')` → 106ms (IPv6-first), 116ms (IPv4-first)
- PowerShell curl to api.telegram.org → 0.1s
- PowerShell curl to openrouter.ai/api/v1/models → 0.8s
Telegram polling and sendMessage should run continuously without 100s+ stalls.
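The standalone comparison probe can be sketched as follows (the `timed`/`probe` helper names are illustrative; the actual probes were `node -e "fetch(...)"` one-liners against the same endpoints):

```javascript
// Standalone latency probe (sketch). Helper names are illustrative;
// the real probes were node -e "fetch(...)" one-liners.
async function timed(fn) {
  const start = process.hrtime.bigint();
  const value = await fn();
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  return { value, ms };
}

async function probe(url) {
  // Same guard style as the gateway's pricing fetch, shorter budget.
  const { value: res, ms } = await timed(() =>
    fetch(url, { signal: AbortSignal.timeout(10_000) })
  );
  console.log(`${url} -> HTTP ${res.status} in ${ms.toFixed(0)}ms`);
  return ms;
}

// Example (requires network):
// probe('https://api.telegram.org/bot<token>/getMe');
```

From a fresh Node process on this host, `probe` consistently reports well under 1s; only the long-running gateway process shows the 60s+ stalls.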
Actual behavior
Multi-subsystem network/timer degradation observed simultaneously inside the long-running gateway process:
```
gateway/model-pricing     | OpenRouter pricing fetch failed (timeout 60s): TimeoutError
gateway/model-pricing     | LiteLLM pricing fetch failed (timeout 60s): TimeoutError
gateway/channels/telegram | [telegram] Polling stall detected (active getUpdates stuck for 127.45s); forcing restart.
gateway/channels/telegram | polling cycle finished reason=polling stall detected ... durationMs=127457 error=Network request for 'getUpdates' failed!
gateway/channels/telegram | telegram sendMessage failed: Network request for 'sendMessage' failed!
gateway/channels/telegram | telegram message processing failed: HttpError: Network request for 'sendMessage' failed!
gateway/ws                | res ✓ models.list 55798ms   (normally <500ms)
gateway/ws                | res ✓ models.list 83581ms
gateway/ws                | res ✓ doctor.memory.status 35988ms
diagnostic                | stuck session: state=processing age=282s queueDepth=1
```
In a single 1-hour observation window: 6 polling stalls, 4 sendMessage failures, 14 pricing-fetch 60s timeouts, plus multiple models.list / doctor.memory.status / node.list RPCs clocking 8–83s where they normally finish in <500ms.
Direct probes from PowerShell `curl https://api.telegram.org/bot<token>/getMe` and from a separate `node -e "fetch(...)"` to the SAME endpoints succeed in 0.1–0.8s consistently throughout these gateway-internal stalls.
OpenClaw version
2026.4.26 (be8c246) — also reproduced on 2026.4.25 (aa36ee6) and 2026.4.23 (a979721)
Operating system
Windows 11 build 26100.8115
Install method
npm global (--omit=optional); Node v24.14.1; PowerShell 5.1
Model
xiaomi/mimo-v2.5-pro (primary); reproduces regardless of model — pricing fetch + Telegram getUpdates stall independent of LLM choice
Provider / routing chain
openclaw -> Telegram polling (bundled grammyjs runner) -> api.telegram.org; openclaw -> xiaomi (mimo via api.xiaomimimo.com); openclaw -> openrouter.ai/api/v1/models + LiteLLM public pricing JSON (gateway-internal hardcoded fetches)
Additional provider/model setup details
No proxy configured (no HTTP_PROXY / HTTPS_PROXY / ALL_PROXY env vars). UK home broadband, no VPN, no corporate firewall. Fallback chain: zai/glm-5.1, xiaomi/mimo-v2.5, minimax/MiniMax-M2.7. All providers reachable when probed from a standalone Node process; degradation is gateway-internal only.
Logs, screenshots, and evidence
**What does NOT explain it (each tested):**
| Hypothesis | Evidence against |
|---|---|
| Bot token / Telegram API issue | `curl https://api.telegram.org/bot<token>/getMe` returns ok=true in 0.1s, consistently |
| Public network slow | Standalone `node -e "fetch(...)"` hits api.telegram.org and openrouter.ai/api/v1/models in 100–800ms |
| IPv6 vs IPv4 | Both `--dns-result-order=ipv4first` and default IPv6-first succeed via standalone Node fetch in <120ms; DNS resolves both A and AAAA cleanly |
| Bundled plugin runtime deps missing | `openclaw doctor --fix` reports all deps installed |
| `fetchWithSsrFGuard` connection pool | Verified in `dist/fetch-guard-C10MVwBt.js` that the SSRF guard creates a per-call dispatcher and disposes it on completion. The pricing code (`dist/usage-format-ZhKID6__.js`) uses raw fetch + `AbortSignal.timeout(60000)`, not the SSRF wrapper, and still times out |
| OS-level network state corruption | Full Windows reboot (cold boot to gateway start) reproduces the chronic degradation within ~30 minutes |
| 4.25 / 4.26 regression | Identical signatures on 2026.4.23 (a979721) before any 4.25/4.26 install |
| Node 24 specific | Same Node 24 binary fetches fine from a standalone process — only the long-running gateway process degrades |
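The pricing path in the table reduces to plain `fetch` plus `AbortSignal.timeout`. A minimal sketch of that pattern (`fetchPricing` and `timeoutSkew` are illustrative names, not the bundled dist source), with a helper for checking whether timeout signals abort on schedule, since a 60s TimeoutError surfacing at 127s or later would implicate the timer/abort layer itself:

```javascript
// Shape of the pricing fetch described above (sketch, not the bundled
// source): global fetch guarded by AbortSignal.timeout, no SSRF
// wrapper, no custom dispatcher.
async function fetchPricing(url, timeoutMs = 60_000) {
  const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
  if (!res.ok) throw new Error(`pricing fetch: HTTP ${res.status}`);
  return res.json();
}

// Diagnostic helper: resolves with how late (ms) the timeout abort
// actually fired. Near zero in a healthy process; large positive skew
// inside the degraded gateway would point at the timer/abort layer.
function timeoutSkew(ms) {
  const due = Date.now() + ms;
  return new Promise((resolve) => {
    const signal = AbortSignal.timeout(ms);
    signal.addEventListener('abort', () => resolve(Date.now() - due));
  });
}
```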
**Process resource snapshot at degradation point (PID 16776):**
- working set 616 MB / private 811 MB
- 45 threads
- **3337 handles** (notably high)
- 25 min uptime
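A per-type breakdown behind a raw handle count like the 3337 above can be pulled from Node's built-in diagnostic report API (sketch; it would need to run inside the degraded gateway process, e.g. via `--inspect`):

```javascript
// Count libuv handles by type from the built-in diagnostic report.
// A steadily growing 'tcp' or 'timer' bucket would support the
// socket-leak / timer-layer hypotheses in this report.
function handleBreakdown() {
  const report = process.report.getReport();
  const counts = {};
  for (const h of report.libuv ?? []) {
    counts[h.type] = (counts[h.type] ?? 0) + 1;
  }
  return counts;
}
console.log(handleBreakdown());
```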
**Workaround attempts that did NOT help:**
- `openclaw doctor --fix` (3 cycles)
- `openclaw gateway restart` (10+ cycles)
- Hard kill (`Stop-Process -Force` on PID owning :18789 + tray) → clean restart
- Full Windows 11 reboot
- Downgrade 4.25 → 4.23 → back to 4.25 → 4.26
- `channels.telegram.pollTimeoutMs: 5000` (vs default 30000)
- Force IPv4 via `NODE_OPTIONS=--dns-result-order=ipv4first`
- Removed unused providers (arcee/openrouter)
- `openclaw sessions cleanup --enforce --fix-missing`
Logs are sanitized of bot tokens / API keys; happy to share unredacted logs privately.
Impact and severity
Affected: All gateway-internal outbound HTTP — Telegram polling/sendMessage, model-pricing fetch, in-process gateway RPC (models.list, doctor.memory.status, node.list).
Severity: High — Telegram bot replies blocked or delayed 5+ minutes; gateway RPC slow enough that openclaw-sweep tools fail or return partial results. The user has to fall back to the Tray UI / webchat for any reliable use.
Frequency: Always — the chronic degradation recurs every 2–3 minutes once the gateway has been up >5 min, on this Windows 11 + Node 24.14.1 host across 2026.4.23, 2026.4.25, and 2026.4.26.
Consequence: Telegram channel effectively unusable; missed/delayed messages; gateway needs constant restart; /health flickers between 200 and timeout.
Additional information
Hypotheses (ranked) for maintainers:
- Shared global undici dispatcher / Agent state degrades over time. Multiple subsystems (model-pricing, Telegram grammyjs runner, doctor.memory.status) all use shared global undici and all start failing together. Hand-off / keep-alive socket reaping appears to break — getUpdates requests sit 127–266s past their AbortSignal timeout, suggesting the abort/timer layer is no longer firing as expected.
- Telegram grammyjs polling runner long-poll keep-alive sockets go stale; runner's stall detector only catches it after 127–197s. Plausibly correlates with pricing-fetch / RPC slowdown if all three share the same global dispatcher.
- Event-loop starvation during the channels-and-sidecars phase — `models.list` 55–83s, `node.list` 8.9s, and `doctor.memory.status` 35s suggest a long-running synchronous task is blocking the loop, which would also explain pricing-fetch timers not firing.
Note on Codex / parallel diagnostic: An independent agent (Codex) ran a parallel diagnostic on the same machine and concurs that the runtime degradation is process-internal, not network-side. `openclaw-sweep` runId 8924e8d6-d776-4ed5-94be-a87fd194372b available on request.
Last known good version: unknown — bug present in oldest version we could test (2026.4.23). Not a recent regression.
Happy to provide: full gateway log (C:\tmp\openclaw\openclaw-2026-04-2*.log), --inspect profile, OPENCLAW_DEBUG_INGRESS_TIMING=1 / OPENCLAW_DEBUG_HEALTH=1 traces, doctor --deep output (216s runtime; no actionable network-layer findings), or any other diagnostic that helps narrow which layer (undici Agent, grammyjs runner, gateway event loop, Windows-specific socket behavior) is degrading.