
fix(runtime): avoid no-proxy undici fetch side effects #78143

Merged
shakkernerd merged 7 commits into main from fix/plugin-fetch-dispatcher
May 6, 2026

Conversation

@shakkernerd

Member

Summary

  • Problem: external channel plugin startup could import OpenClaw's packaged Undici proxy helpers on a clean no-proxy path, changing process-level fetch behavior before the plugin called globalThis.fetch.
  • Why it matters: openclaw channels login --channel openclaw-weixin failed for @tencent-weixin/openclaw-weixin@2.4.1 with TypeError: fetch failed even though the same request worked in plain Node fetch.
  • What changed: lazy-load Undici dispatcher/proxy dependencies only when OpenClaw actually creates a proxy fetch or proxy dispatcher, while keeping the guarded-fetch timeout bridge and proxy reset behavior intact.
  • What did NOT change (scope boundary): no plugin code is changed, no proxy env contract is removed, and no configured proxy path is intentionally bypassed.
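
The lazy-loading idea in "What changed" can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `loadProxyRuntimeDeps`, `makeLazyProxyFetch`, and the `ProxyRuntimeDeps` shape are hypothetical names, and a plain object stands in for the dynamic `import("undici")` a real module would perform, so the sketch runs without any dependency installed.

```typescript
// Hypothetical shape of the dependencies that should only load on demand.
interface ProxyRuntimeDeps {
  createProxyAgent(proxyUrl: string): { proxyUrl: string };
}

// Counts how often the (stand-in) heavy module is actually loaded.
let loadCount = 0;
let cachedDeps: Promise<ProxyRuntimeDeps> | undefined;

// Memoized loader. In a real module the body would be a dynamic
// `import("undici")`; a plain object stands in for it here (assumption).
function loadProxyRuntimeDeps(): Promise<ProxyRuntimeDeps> {
  if (cachedDeps === undefined) {
    loadCount += 1;
    const deps: ProxyRuntimeDeps = {
      createProxyAgent: (proxyUrl: string) => ({ proxyUrl }),
    };
    cachedDeps = Promise.resolve(deps);
  }
  return cachedDeps;
}

// Creating the wrapper is cheap; the dependency loads on the first request.
function makeLazyProxyFetch(proxyUrl: string): (input: string) => Promise<string> {
  return async (input: string): Promise<string> => {
    const deps = await loadProxyRuntimeDeps();
    const agent = deps.createProxyAgent(proxyUrl);
    return `${input} via ${agent.proxyUrl}`;
  };
}
```

Evaluating this module loads nothing, which is the property the fix relies on. The sketch defers the load to the first request; the PR may instead load at proxy-fetch creation time.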

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Real behavior proof (required for external PRs)

  • Behavior or issue addressed: Tencent Weixin channel login failed before QR startup because the plugin's plain fetch request was rejected after OpenClaw loaded packaged Undici helpers.
  • Real environment tested: source checkout on macOS with isolated HOME, isolated OPENCLAW_STATE_DIR, and @tencent-weixin/openclaw-weixin@2.4.1 installed through openclaw plugins install.
  • Exact steps or command run after this patch:
    1. pnpm openclaw plugins install @tencent-weixin/openclaw-weixin@2.4.1 --pin
    2. pnpm openclaw channels login --channel openclaw-weixin --verbose
  • Evidence after fix:
[plugins] loading openclaw-weixin from /private/tmp/openclaw-78007.Ez0VWR/.openclaw/npm/node_modules/@tencent-weixin/openclaw-weixin/dist/index.js
[plugins] loaded 1 plugin(s) (1 attempted) in 1516.5ms
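用手机微信扫描以下二维码,以继续连接: [Scan the following QR code with WeChat on your phone to continue connecting:]
若二维码未能显示或无法使用,你可以访问以下链接以继续: [If the QR code does not display or cannot be used, you can visit the following link to continue:]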
https://liteapp.weixin.qq.com/q/7GiQu1?qrcode=3e04d58f044ba6b007fc02cca258bf48&bot_type=3
  • Observed result after fix: the login flow reaches QR startup instead of failing at TypeError: fetch failed.
  • What was not tested: scanning the QR code and completing the Weixin account-linking flow.
  • Before evidence:
Failed to start login: TypeError: fetch failed
Channel login failed: Error: Failed to start login: TypeError: fetch failed

Trace before fix showed the underlying cause:

UND_ERR_INVALID_ARG: invalid content-length header

Root Cause (if applicable)

  • Root cause: src/infra/net/proxy-fetch.ts imported Undici at module load time. That module is exported through plugin runtime helper surfaces, so loading external plugin code could initialize Undici's package dispatcher state even when no proxy was configured. With the current Undici dependency, that changed Node global fetch behavior and caused plugin requests with explicit Content-Length to fail.
  • Missing detection / guardrail: there was no regression test proving clean no-proxy startup does not initialize Undici global dispatcher state.
  • Contributing context (if known): the affected plugin sends an explicit Content-Length header; plain Node fetch accepts the request, but OpenClaw's import side effect made the same request fail.
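
To make the contributing context concrete, the sketch below builds the kind of request described: a POST whose headers carry an explicit Content-Length computed from the body's byte length. The helper name and payload are hypothetical; the plugin's actual request code is not shown in this PR.

```typescript
// Build a fetch init object with an explicit Content-Length header.
// Hypothetical helper; the plugin's real request code is not part of this PR.
function buildJsonPostInit(payload: unknown): {
  method: string;
  headers: Record<string, string>;
  body: string;
} {
  const body = JSON.stringify(payload);
  // Content-Length counts bytes, not UTF-16 code units, so encode first.
  const byteLength = new TextEncoder().encode(body).length;
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Content-Length": String(byteLength),
    },
    body,
  };
}

// Non-ASCII text makes the byte length differ from the string length.
const exampleInit = buildJsonPostInit({ greeting: "你好" });
console.log(exampleInit.headers["Content-Length"], exampleInit.body.length);
```

Per the root-cause notes above, a request shaped like this succeeded with plain Node fetch and failed only after the import side effect.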

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file:
    • src/infra/net/undici-global-dispatcher.test.ts
    • src/infra/net/proxy-fetch.test.ts
    • src/infra/net/fetch-guard.ssrf.test.ts
  • Scenario the test should lock in: no-proxy timeout setup and proxy helper import must not initialize Undici global dispatcher state; proxy fetch and env proxy paths must still lazy-load and use Undici when needed.
  • Why this is the smallest reliable guardrail: the regression was an import-time side effect, so a subprocess assertion against the global dispatcher symbol catches the exact failure class without requiring live third-party auth.
  • Existing test that already covers this (if any): none before this PR.
  • If no new test is added, why not: N/A.
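
The guardrail described in this plan could be sketched as a small probe. It assumes Undici publishes its global dispatcher under `Symbol.for("undici.globalDispatcher.1")`, which matches Undici's source as far as I know but is an internal detail that may change; the function name and the injectable `scope` parameter are hypothetical. A regression test in the spirit of the one described would run such a probe in a fresh subprocess after importing the plugin-facing helper.

```typescript
// Assumption: Undici stores its global dispatcher under this registered symbol.
const UNDICI_GLOBAL_DISPATCHER = Symbol.for("undici.globalDispatcher.1");

// Report whether a dispatcher has been installed on the given scope.
// `scope` defaults to globalThis; tests can pass a plain object instead.
function hasUndiciGlobalDispatcher(scope: object = globalThis): boolean {
  return Reflect.get(scope, UNDICI_GLOBAL_DISPATCHER) !== undefined;
}
```

Because the symbol is looked up on an injectable scope, the probe itself can be unit-tested without touching real process state, while the subprocess run checks the actual `globalThis`.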

User-visible / Behavior Changes

External channel plugins that use plain fetch() no longer inherit OpenClaw's packaged Undici dispatcher behavior on clean no-proxy startup. This fixes the Weixin QR login regression in #78007.

Diagram (if applicable)

Before:
plugin load -> SDK/runtime helper import -> proxy-fetch imports Undici -> plugin fetch fails

After:
plugin load -> SDK/runtime helper import -> no Undici import unless proxy fetch is created -> plugin fetch reaches QR login

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: macOS local checkout; Blacksmith Testbox for remote validation
  • Runtime/container: source checkout, Node 24, pnpm 10
  • Model/provider: N/A
  • Integration/channel: @tencent-weixin/openclaw-weixin@2.4.1
  • Relevant config (redacted): isolated state dir with only the installed plugin enabled

Steps

  1. Install @tencent-weixin/openclaw-weixin@2.4.1 into an isolated state dir.
  2. Run pnpm openclaw channels login --channel openclaw-weixin --verbose.
  3. Confirm the command reaches QR login output instead of failing with TypeError: fetch failed.

Expected

  • The Weixin login request reaches QR login startup.

Actual

  • After this patch, the command reaches QR login startup.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What I personally verified:

  • pnpm test src/infra/net/proxy-fetch.test.ts src/infra/net/undici-global-dispatcher.test.ts src/infra/net/fetch-guard.ssrf.test.ts
  • Same targeted test command passed in Blacksmith Testbox.
  • pnpm exec oxfmt --check --threads=1 CHANGELOG.md src/infra/net/proxy-fetch.ts src/infra/net/proxy-fetch.test.ts src/infra/net/undici-runtime.ts src/infra/net/undici-global-dispatcher.ts src/infra/net/undici-global-dispatcher.test.ts src/infra/net/fetch-guard.ssrf.test.ts
  • git diff --check
  • Real isolated openclaw channels login --channel openclaw-weixin --verbose reached QR login output.

Edge cases checked:

  • No-proxy timeout setup does not initialize Undici global dispatcher state.
  • Env proxy setup still installs EnvHttpProxyAgent.
  • Proxy fetch still creates ProxyAgent lazily.
  • FormData normalization still strips stale content-length/content-type in proxy fetch paths.
  • Clearing a proxy dispatcher installed by OpenClaw still restores a direct dispatcher.
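
The FormData edge case above can be illustrated with a short sketch. `normalizeFormDataHeaders` is a hypothetical helper, not the implementation in this PR; it only shows why stale `content-length` and `content-type` values are dropped when the body is a `FormData` instance, since the runtime must generate its own multipart boundary and length.

```typescript
// Drop headers that a FormData body invalidates: the runtime generates its
// own multipart Content-Type (with boundary) and Content-Length.
// Hypothetical helper, not the implementation in this PR.
function normalizeFormDataHeaders(
  headersInit: Record<string, string>,
  body: unknown,
): Headers {
  const headers = new Headers(headersInit);
  if (body instanceof FormData) {
    // Header names are case-insensitive, so lowercase names match any casing.
    headers.delete("content-length");
    headers.delete("content-type");
  }
  return headers;
}
```

Other headers, and all headers on non-FormData bodies, pass through unchanged.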

What I did not verify:

  • Full Weixin account-linking after QR scan.
  • Full repo test suite.

Note: pnpm check:changed was run in Blacksmith Testbox after rebase and failed in tsgo:core:test on src/agents/model-fallback.test.ts, which this PR does not touch:

src/agents/model-fallback.test.ts(1091,22): error TS2339: Property 'expectedReason' does not exist on type ...
src/agents/model-fallback.test.ts(1093,60): error TS2339: Property 'expectedReason' does not exist on type ...

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: with lazy loading, proxy paths could fail to load Undici when they actually need it.
    • Mitigation: proxy fetch and env proxy dispatcher tests assert lazy creation still loads and uses the expected Undici agents.
  • Risk: direct guarded fetches could lose timeout propagation.
    • Mitigation: guarded fetch tests still assert the timeout bridge is inherited by direct guarded dispatchers.
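
The first risk and its mitigation can be pictured as a gate with an injected loader: the loader must run when a proxy is configured and must not run on a clean no-proxy path. All names below are hypothetical, and the list of environment variables is an assumption based on the variables Undici's `EnvHttpProxyAgent` is documented to read, not a statement of OpenClaw's proxy env contract.

```typescript
type Env = Record<string, string | undefined>;

// Assumption: these variables signal a configured env proxy.
const PROXY_ENV_KEYS = ["http_proxy", "HTTP_PROXY", "https_proxy", "HTTPS_PROXY"];

// True when at least one proxy variable holds a non-blank value.
function hasEnvProxy(env: Env): boolean {
  return PROXY_ENV_KEYS.some((key) => (env[key] ?? "").trim() !== "");
}

// Invoke the (possibly side-effecting) loader only when a proxy is set.
// Returns whether the loader ran.
function installEnvProxyDispatcher(env: Env, loadDeps: () => void): boolean {
  if (!hasEnvProxy(env)) {
    return false; // clean no-proxy path: nothing is loaded
  }
  loadDeps();
  return true;
}
```

Injecting `loadDeps` lets a test count loader calls on both paths without importing the real dependency.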

openclaw-barnacle Bot added the size: M and maintainer labels on May 6, 2026
@clawsweeper

clawsweeper Bot commented May 6, 2026

Contributor

Codex review: needs maintainer review before merge.

Summary
The branch lazy-loads Undici and debug proxy runtime dependencies behind proxy/dispatcher paths, adjusts embedded timeout wiring and tests, adds no-proxy import regression coverage, and records the Weixin plugin fetch fix in the changelog.

Reproducibility: yes. Current-main source inspection shows plugin-facing fetch runtime imports the proxy-fetch and dispatcher modules that statically import Undici, and the PR body provides before/after Weixin login output showing the failure and QR startup after the patch.

Real behavior proof
Sufficient (live_output): The PR body includes copied live before/after terminal output from an isolated Weixin plugin login showing QR startup after the fix.

Next step before merge
No repair lane: this is an active maintainer-labeled PR with sufficient proof, green checks, and no actionable patch defect from this review; normal maintainer merge handling is the next step.

Security
Cleared: The diff defers loading existing runtime dependencies and adds tests/changelog without new third-party sources, secrets handling, permissions, lifecycle hooks, or new network destinations.

Review details

Best possible solution:

Land the lazy-loading fix through normal maintainer review and merge gates so #78007 is closed by the merged implementation while preserving configured proxy and timeout behavior.

Do we have a high-confidence way to reproduce the issue?

Yes. Current-main source inspection shows plugin-facing fetch runtime imports the proxy-fetch and dispatcher modules that statically import Undici, and the PR body provides before/after Weixin login output showing the failure and QR startup after the patch.

Is this the best way to solve the issue?

Yes. Lazy-loading the existing Undici dependencies only when proxy fetch or dispatcher work is requested is the narrow maintainable fix because it avoids external plugin fetch side effects without changing plugin code or the proxy env contract.

What I checked:

  • Current main import side effect: Current main statically imports Undici from proxy-fetch and undici-global-dispatcher; proxy-fetch is re-exported through the plugin SDK fetch runtime, matching the startup side-effect path described by the regression. (src/infra/net/proxy-fetch.ts:1, f531eff6292e)
  • PR lazy-load implementation: At PR head, makeProxyFetch and env-proxy fetch resolve Undici through loadUndiciRuntimeDeps only inside proxy creation paths, and global dispatcher dependencies are loaded only when proxy or explicit dispatcher timeout work needs them. (src/infra/net/proxy-fetch.ts:65, 6ead753157f2)
  • Regression coverage: The PR adds subprocess coverage proving plugin SDK fetch-runtime import does not initialize the Undici global dispatcher symbol on a clean no-proxy startup. (src/plugin-sdk/fetch-runtime.test.ts:7, 6ead753157f2)
  • Real behavior proof: The PR body includes before output with TypeError: fetch failed and after live output from an isolated macOS Weixin plugin login reaching QR startup; the PR also has the proof: sufficient label. (6ead753157f2)
  • GitHub checks: The GitHub check-runs API for head 6ead753 reports successful core check, prod/test type checks, lint, build artifacts, real behavior proof, and network SSRF security boundary checks. (6ead753157f2)
  • Related reports: The PR closes the open stable-regression report "Regression: external channel plugin fetch() fails with TypeError: fetch failed in 2026.5.4" (#78007) and supersedes the earlier unmerged proxy-scheme attempt "Ignore unsupported proxy schemes for env dispatcher" (#78040) by addressing the import-time Undici dispatcher side effect directly.

Likely related people:

  • steipete: Recent commits touched proxy-fetch, undici global dispatcher behavior, guarded fetch, and plugin SDK fetch runtime seams, making this the strongest routing candidate for the current network runtime surface. (role: recent maintainer; confidence: high; commits: b9c23547ee31, dc859584a352, ecec68d06d19; files: src/infra/net/proxy-fetch.ts, src/infra/net/undici-global-dispatcher.ts, src/infra/net/fetch-guard.ts)
  • Takhoffman: Prior merged work added or refined env proxy bootstrap and proxy capture/runtime behavior in the same network/proxy area touched by this PR. (role: adjacent owner; confidence: medium; commits: 87876a3e36db, 958c34e82cdb; files: src/infra/net/proxy-fetch.ts, src/infra/net/undici-global-dispatcher.ts, src/infra/net/fetch-guard.ts)
  • DranboFieldston: The timeout bridge and guarded dispatcher timeout propagation that this PR preserves trace to the merged timeout propagation work for guarded fetches. (role: introduced related behavior; confidence: medium; commits: 977a4b24afd0; files: src/infra/net/undici-global-dispatcher.ts, src/infra/net/fetch-guard.ts)
  • mcaxtr: Earlier history extracted the shared proxy-fetch utility and added proxy env fallback behavior that this PR now lazy-loads. (role: introduced shared utility; confidence: medium; commits: ba3fa44c5b34, 58cde874365d; files: src/infra/net/proxy-fetch.ts)

Remaining risk / open question:

  • I did not rerun the live Weixin QR login locally; this review relies on the PR's copied live output plus source inspection and GitHub checks.
  • The PR proof stops at QR startup and does not verify full Weixin account-linking after scanning the QR code.

Codex review notes: model gpt-5.5, reasoning high; reviewed against f531eff6292e.

clawsweeper Bot added the proof: sufficient label on May 6, 2026
openclaw-barnacle Bot added the agents label on May 6, 2026

Labels

  • agents (Agent runtime and tooling)
  • maintainer (Maintainer-authored PR)
  • proof: sufficient (ClawSweeper judged the real behavior proof convincing.)
  • size: M

Development

Successfully merging this pull request may close these issues.

Regression: external channel plugin fetch() fails with TypeError: fetch failed in 2026.5.4

2 participants