fix(sandbox): rewrite #2109 proxy fix as http.request wrapper (signed)#2344
Conversation
PR #2110's axios-only Module._load preload never fired at runtime: 1. nemoclaw-blueprint/scripts/ is excluded from the optimized sandbox build context (src/lib/sandbox-build-context.ts), so axios-proxy-fix.js was not baked into the sandbox image. 2. Adding scripts/ to the build context cache-busts the `COPY nemoclaw-blueprint/` Dockerfile layer and hangs npm ci in the k3s Docker-in-Docker build, so the delivery gap cannot be closed by expanding the context. 3. Even if the file had reached the image, intercepting require('axios') via Module._load cannot patch follow-redirects + proxy-from-env bundled as ESM in OpenClaw's dist/http-Bh-HtMAg.js — there are no require() calls to intercept. The Bot Connector reply path uses the bundled code. Replace with an http.request() wrapper — the lowest common denominator every HTTP library bottoms out at. Detect FORWARD-mode requests (hostname = proxy IP, path = full https:// URL) and rewrite them to https.request() against the real target, letting NODE_USE_ENV_PROXY handle the CONNECT tunnel correctly. Works for any HTTP client, including bundled ESM that makes no require() calls. Delivery: - nemoclaw-blueprint/scripts/http-proxy-fix.js — canonical source for review and tests. - scripts/nemoclaw-start.sh embeds the same JS inline via a heredoc, writes it to /tmp/nemoclaw-http-proxy-fix.js through emit_sandbox_sourced_file (root:root 444, symlink-safe), and loads it via NODE_OPTIONS=--require. No changes to sandbox-build-context. - test/http-proxy-fix-sync.test.ts enforces byte-for-byte equality between the heredoc and the canonical file, so future edits cannot silently diverge. - validate_tmp_permissions is invoked with the new path on both the root and non-root boot paths (the fix JS is a trust-boundary file — tampering would inject arbitrary code into every Node process via NODE_OPTIONS). Because the content ships inside nemoclaw-start.sh rather than as a separately-deployed file, the fix fires on the very first sandbox boot with no post-onboard deploy + restart dance required. Verified end-to-end on 2026-04-23: EC2 t3.large (ca-central-1), NemoClaw v0.0.22 + OpenShell v0.0.29, Node 22.22.1. Direct axios.get('https://clawhub.ai') returns 200 inside the sandbox; full Teams -> ALB -> OpenClaw -> LiteLLM/Bedrock -> Bot Connector -> Teams round-trip succeeds. No `FORWARD rejected` entries in OpenShell network logs. Comparison table and reproduction steps posted in the PR description. Scope: - Fixes the #2109 regression class (axios / follow-redirects / proxy-from-env FORWARD-mode rewrites on NODE_USE_ENV_PROXY=1). - Does NOT fix #1570 (Discord WebSocket via the ws library). That bug sits at a different layer — EnvHttpProxyAgent's FORWARD-vs- CONNECT decision for Upgrade: websocket requests — and needs the agent-swap treatment that #2296 applies. The http.request wrapper in this PR cannot safely handle that case (it would re-enter the same faulty agent logic). - Does NOT modify sandbox-build-context.ts. Removes the superseded nemoclaw-blueprint/scripts/axios-proxy-fix.js and updates the existing regression tests in service-env.test.ts to the new variable name (_PROXY_FIX_SCRIPT). Closes #2109.
Wrap the `new URL(options.path)` call in a try/catch so that a
malformed path value (which passes the `startsWith('https://')` check
but still fails URL parsing) falls back to the original http.request
instead of crashing the Node process.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Address CodeRabbit review feedback on PR.
The FORWARD-mode branch previously constructed a fresh options object
with only {method, hostname, host, port, path, protocol, headers,
timeout}, silently dropping caller-supplied fields that can matter for
correctness:
- signal — AbortController, used by modern axios/fetch for
cancellation. Dropping it meant user-initiated aborts would not
propagate to the rewritten https.request, leaving the request
running after the caller thought it was cancelled.
- TLS: ca, cert, key, passphrase, rejectUnauthorized — custom trust
anchors or mTLS settings. Uncommon in the FORWARD path but not
impossible.
- auth — Basic-auth credentials for the target origin.
- lookup, family, localAddress, maxHeaderSize, insecureHTTPParser —
per-request network/parser tuning.
Switch to Object.assign({}, options, { ...proxy-routing-fields })
which clones the caller's options and overwrites only the fields we
explicitly need to change (method default, hostname/host/port/path/
protocol). Everything else is carried over verbatim.
Mirror the edit in the inline heredoc in scripts/nemoclaw-start.sh so
the canonical file and the embedded copy remain byte-identical; the
http-proxy-fix-sync test enforces this.
Address CodeRabbit review feedback (minor, PR #2323). Both `catch (e)` bindings in http-proxy-fix.js are unused — one in the proxy URL parse (falls through silently) and one in the FORWARD-path new URL guard (returns the original request). Rename to `catch (_e)` to satisfy the project's "unused variables must be prefixed with _" convention. Mirror the edit in the inline heredoc in scripts/nemoclaw-start.sh so the canonical file and the embedded copy remain byte-identical; the http-proxy-fix-sync test enforces this.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (5)
💤 Files with no reviewable changes (1)
📝 WalkthroughWalkthroughReplaces an axios-specific proxy preload script with a more general HTTP proxy fix that intercepts http.request() calls in FORWARD-mode proxy scenarios (http request to proxy host with Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsTimed out fetching pipeline failures after 30000ms Comment |
Follow-up to #2344 (NemoClaw 0.0.24). The http.request → https.request rewrite that resolves the NODE_USE_ENV_PROXY=1 / library-side proxy double-processing was shallow-copying the caller's options into the rewritten https.request. That dragged three classes of forward-proxy- hop fields onto a request that now goes direct to the upstream: * options.agent — a forward-proxy http.Agent. http.Agent cannot speak TLS, so https.request either ignores it and falls back to a default agent or fails the TLS handshake outright depending on caller. This is the most likely root cause of the deepinfra-style report on Discord ("LLM request failed: network connection error" against a custom OpenAI-compatible upstream while NVIDIA-routed providers keep working). NVIDIA Endpoints' DNS-rewritten path through OpenShell doesn't end up in this branch, so the bug stayed invisible. * options.auth — basic-auth meant for the proxy hop. Leaving it on https.request would Basic-auth the *target* server with proxy credentials. * Host / Proxy-Authorization / Proxy-Connection / Proxy-Authenticate headers — Host points at the proxy host and the Proxy-* headers are hop-by-hop. Stripping them lets Node regenerate Host from the target's hostname/port and prevents proxy creds from leaking upstream. Adds test/http-proxy-fix-rewrite.test.ts with seven cases pinning each strip and confirming signal/timeout/TLS fields and target headers (Authorization, Content-Type, …) survive. The existing http-proxy-fix-sync.test.ts byte-for-byte enforcer still passes — the canonical wrapper at nemoclaw-blueprint/scripts/http-proxy-fix.js and the heredoc in scripts/nemoclaw-start.sh were updated together. Reverts the openclaw `request.allowPrivateNetwork` Dockerfile change that this PR previously carried — empirical investigation showed openclaw 2026.4.9 rejects that key in strict zod and `openclaw doctor --fix` (which the Dockerfile runs at build time) silently strips it back to `"request": {}` before the image ships. The upstream plumbing that reads `request.allowPrivateNetwork` only landed in 2026.4.10 (commit 0808dd111c, openclaw PR #63671). The change here was a no-op, diagnosing the wrong layer; this commit replaces it with the actual fix at the wrapper layer. The label-rename + arithmetic-via-`--json` test improvements added earlier on this branch (Phase 4c, TC-SBX-02 rewrite, skill-verify hardening, sandbox_exec stderr-merge fix) are kept — they catch regressions independent of the proxy bug. Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
…est (#2490) ## Summary Follow-up to #2344 (NemoClaw 0.0.24). The `http.request` → `https.request` rewrite in `nemoclaw-blueprint/scripts/http-proxy-fix.js` was shallow-copying the caller's options into the rewritten `https.request` call. That dragged forward-proxy-hop fields onto a request that now goes direct to the upstream. This is consistent with the Discord report from `mflova` after the 0.0.23 → 0.0.24 bump: > Finally, when I run the bot, I always get an error after sending any message to it: `error=LLM request failed: network connection error. rawError=Connection error` > > Some context: I am using 0.0.23 as the 0.0.24 had a bug with the network. mflova was on a custom OpenAI-compatible endpoint (deepinfra). NVIDIA Endpoints' DNS-rewritten path through OpenShell doesn't end up in the wrapper's FORWARD-mode branch, which is why upstream regressions there stayed invisible to nightly-e2e. mflova's reported error is the openclaw client's wrapped surface; on Node 22 the underlying mechanism is a synchronous `TypeError: Protocol "https:" not supported. Expected "http:"` thrown by `https.request` when the forward-proxy `http.Agent` reaches it. On Node 18/20 the same root cause manifests as a TLS handshake failure rather than a synchronous throw — same bug class, different surface. ## Bisect proof (committed in this PR) `test/http-proxy-fix-rewrite.test.ts` carries both halves of the proof in-repo: - The "control" `describe` block reproduces the *broken* pre-fix rewrite shape inline (`Object.assign` with no field strips, hop-by-hop headers preserved) and asserts `https.request` rejects it with `TypeError: Protocol "https:" not supported`. That test passes regardless of the wrapper, locking in the bug class. - The rewrite `describe` block asserts the wrapper produces options that *don't* match that shape — `agent`, `auth`, `Host`, `Proxy-*`, the rest of RFC 7230 §6.1 hop-by-hop, `servername`, `checkServerIdentity`, `socketPath`, `localAddress`, `lookup`, `family`, `hints` are all stripped on rewrite. Combined: bug class is real and detectable; wrapper's job is to never produce that shape. If a future maintainer reverts a strip, the rewrite test breaks; if Node changes the failure surface, the control test breaks. Two-sided coverage without storing a copy of the broken wrapper. ## What was wrong The rewrite was: ```js var rewritten = Object.assign({}, options, { hostname: target.hostname, host: target.hostname, port: target.port || 443, path: target.pathname + target.search, protocol: 'https:', }); return https.request(rewritten, callback); ``` Three classes of forward-proxy-hop fields rode along: - **`options.agent`** — a forward-proxy `http.Agent`. `http.Agent.protocol === 'http:'`, so on Node 22 `https.request` throws `TypeError: Protocol "https:" not supported. Expected "http:"` synchronously; on Node 18/20 it falls through and the TLS handshake fails. Manifests upstream as "Connection error". - **`options.auth`** — basic-auth meant for the proxy hop. Leaving it on `https.request` Basic-auths the *target* server with proxy credentials. - **Hop-by-hop headers (RFC 7230 §6.1)** — `Connection`, `Keep-Alive`, `Proxy-Authorization`, `TE`, `Trailer`, `Transfer-Encoding`, `Upgrade`, plus tokens named in `Connection` (transitively hop-by-hop), plus `Host` (proxy-pointing) and `Proxy-Connection` (de facto deprecated). `Connection: close` from the proxy hop defeats keep-alive on the rewrite; `Upgrade` to a non-WS target produces 400/426; `Proxy-*` leak proxy credentials upstream. Plus: TLS identity (`servername`, `checkServerIdentity`), socket-path proxy hint (`socketPath`), and source-binding / DNS hints (`localAddress`, `lookup`, `family`, `hints`) that all describe the connection to the proxy and don't apply to a different upstream. ## The fix `nemoclaw-blueprint/scripts/http-proxy-fix.js` (canonical) and the heredoc in `scripts/nemoclaw-start.sh` (byte-for-byte synced via the existing `http-proxy-fix-sync.test.ts`): - `sanitizeHeaders()` strips the full RFC 7230 §6.1 hop-by-hop set plus `Host`, `Proxy-Connection`, `Proxy-Authenticate`, and any token named in the `Connection` header (transitive per the same RFC). Case-insensitive. - `delete rewritten.agent`, `delete rewritten.auth`, `delete rewritten.servername`, `delete rewritten.checkServerIdentity`, `delete rewritten.socketPath`, `delete rewritten.localAddress`, `delete rewritten.lookup`, `delete rewritten.family`, `delete rewritten.hints` after the `Object.assign`. - Signal (AbortController), TLS material (`ca`/`cert`/`key`/`rejectUnauthorized`), `timeout`, body, and target-intent headers (`Authorization`, `Content-Type`, …) all survive. ## Tests - **`test/http-proxy-fix-rewrite.test.ts` (12 cases)** — spy-level rewrite tests pinning every strip; control test reproducing the broken-wrapper shape and asserting `TypeError`. - **`test/http-proxy-fix-e2e.test.ts` (1 case)** — end-to-end. openssl shell-out at module load (skipped locally if openssl missing; loud-fails under `CI=true`). Spins up an HTTPS mock on `127.0.0.1:0`, builds the FORWARD-mode `http.request` shape with forward-proxy `http.Agent` + proxy basic-auth + proxy-pointing Host, asserts the round trip completes. - **`test/http-proxy-fix-sync.test.ts` (6 cases, existing)** — byte-for-byte parity between the canonical wrapper and the `scripts/nemoclaw-start.sh` heredoc still enforced. ## What this PR no longer carries This branch previously included a Dockerfile change that added `'request': {'allowPrivateNetwork': True}` to the inference provider in the baked `openclaw.json`, and a regression test for it. Empirical investigation showed: - openclaw 2026.4.9's strict zod schema rejects `request.allowPrivateNetwork` as an unrecognized key (`src/config/zod-schema.core.ts:283-292` in v2026.4.9). - `openclaw doctor --fix` — which the Dockerfile runs at image build time — silently strips the key, leaving `"request": {}` before the image ships. - The plumbing that *reads* `request.allowPrivateNetwork` only landed in 2026.4.10 (upstream PR #63671, commit `0808dd111c`). So the previous "fix" was a no-op diagnosing the wrong layer. Dockerfile and `test/sandbox-provisioning.test.ts` are restored to `origin/main`. ## What this PR keeps from the prior iteration The label-rename + arithmetic-via-`--json` test improvements added earlier on this branch are kept — they catch regressions independent of the proxy bug: - `test/e2e/test-full-e2e.sh`: Phase 4b relabelled `[LIVE]` → `[ROUTING]`; new Phase 4c that runs `openclaw agent --json` over SSH inside the sandbox and asserts the model answered `42` to "What is 6 multiplied by 7?". Phase 4c is the only assertion in the suite that exercises the openclaw HTTP client end-to-end against `inference.local`. - `test/e2e/test-hermes-e2e.sh`: same `[LIVE]` → `[ROUTING]` relabel. - `test/e2e/test-sandbox-operations.sh` TC-SBX-02: replaces `Say exactly: HELLO_E2E` + merged-output grep with the arithmetic-via-`--json` pattern. - `test/e2e/e2e-cloud-experimental/features/skill/verify-sandbox-skill-via-agent.sh`: stops embedding `${VERIFY_TOKEN}` in the prompt; refuses overrides that smuggle it back; adds a negative assertion on `SsrFBlockedError` / transport / gateway-unavailable markers. ## Type of Change - [x] Code change (feature, bug fix, or refactor) ## Verification - [x] `npx vitest run test/http-proxy-fix-e2e.test.ts test/http-proxy-fix-rewrite.test.ts test/http-proxy-fix-sync.test.ts test/sandbox-provisioning.test.ts` — 28/28 pass. - [x] Adversarial review's blockers (B1–B3) and concerns (C1–C7) addressed in commit `69115de7`. - [x] `bash -n` and `shellcheck -S warning` clean on `scripts/nemoclaw-start.sh` and the four touched e2e scripts. - [x] Heredoc-sync test still passes; canonical wrapper and `scripts/nemoclaw-start.sh` heredoc updated together. - [x] No secrets, API keys, or credentials committed. ## Per-commit on this branch - `f162daa5` — initial strip of `agent`/`auth`/`Host`/`Proxy-*` (the fix), spy-level unit tests - `2f217f8a` — end-to-end test against a local HTTPS mock with self-signed cert - `69115de7` — wider RFC 7230 §6.1 hop-by-hop + TLS/transport hint strips, in-repo bisect control, vi.stubEnv consistency, http.request restore, 7-day cert, CI strict-skip ## How to validate after merge The signal job is `cloud-e2e` in `nightly-e2e.yaml` running `test/e2e/test-full-e2e.sh`. Watch for: ``` PASS: [LIVE] openclaw agent: model answered 6×7=42 through openclaw → inference.local ``` For the deepinfra-style failure specifically, the unit + e2e + control tests in this PR pin every strip the wrapper performs *and* the bug class it prevents. A real proxy + real upstream + real TLS round trip would need a CI job with a non-NVIDIA OpenAI-compatible API key — its own follow-up PR if you want belt-and-suspenders coverage. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes * **Bug Fixes** * Improved HTTP proxy request handling to sanitize headers and remove incompatible routing-specific fields, ensuring proper downstream forwarding. * **Tests** * Added comprehensive test coverage for proxy header sanitization and rewrite behavior. * Enhanced E2E testing with improved routing-layer assertions and JSON payload extraction from agent responses. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
…est (NVIDIA#2490) ## Summary Follow-up to NVIDIA#2344 (NemoClaw 0.0.24). The `http.request` → `https.request` rewrite in `nemoclaw-blueprint/scripts/http-proxy-fix.js` was shallow-copying the caller's options into the rewritten `https.request` call. That dragged forward-proxy-hop fields onto a request that now goes direct to the upstream. This is consistent with the Discord report from `mflova` after the 0.0.23 → 0.0.24 bump: > Finally, when I run the bot, I always get an error after sending any message to it: `error=LLM request failed: network connection error. rawError=Connection error` > > Some context: I am using 0.0.23 as the 0.0.24 had a bug with the network. mflova was on a custom OpenAI-compatible endpoint (deepinfra). NVIDIA Endpoints' DNS-rewritten path through OpenShell doesn't end up in the wrapper's FORWARD-mode branch, which is why upstream regressions there stayed invisible to nightly-e2e. mflova's reported error is the openclaw client's wrapped surface; on Node 22 the underlying mechanism is a synchronous `TypeError: Protocol "https:" not supported. Expected "http:"` thrown by `https.request` when the forward-proxy `http.Agent` reaches it. On Node 18/20 the same root cause manifests as a TLS handshake failure rather than a synchronous throw — same bug class, different surface. ## Bisect proof (committed in this PR) `test/http-proxy-fix-rewrite.test.ts` carries both halves of the proof in-repo: - The "control" `describe` block reproduces the *broken* pre-fix rewrite shape inline (`Object.assign` with no field strips, hop-by-hop headers preserved) and asserts `https.request` rejects it with `TypeError: Protocol "https:" not supported`. That test passes regardless of the wrapper, locking in the bug class. - The rewrite `describe` block asserts the wrapper produces options that *don't* match that shape — `agent`, `auth`, `Host`, `Proxy-*`, the rest of RFC 7230 §6.1 hop-by-hop, `servername`, `checkServerIdentity`, `socketPath`, `localAddress`, `lookup`, `family`, `hints` are all stripped on rewrite. Combined: bug class is real and detectable; wrapper's job is to never produce that shape. If a future maintainer reverts a strip, the rewrite test breaks; if Node changes the failure surface, the control test breaks. Two-sided coverage without storing a copy of the broken wrapper. ## What was wrong The rewrite was: ```js var rewritten = Object.assign({}, options, { hostname: target.hostname, host: target.hostname, port: target.port || 443, path: target.pathname + target.search, protocol: 'https:', }); return https.request(rewritten, callback); ``` Three classes of forward-proxy-hop fields rode along: - **`options.agent`** — a forward-proxy `http.Agent`. `http.Agent.protocol === 'http:'`, so on Node 22 `https.request` throws `TypeError: Protocol "https:" not supported. Expected "http:"` synchronously; on Node 18/20 it falls through and the TLS handshake fails. Manifests upstream as "Connection error". - **`options.auth`** — basic-auth meant for the proxy hop. Leaving it on `https.request` Basic-auths the *target* server with proxy credentials. - **Hop-by-hop headers (RFC 7230 §6.1)** — `Connection`, `Keep-Alive`, `Proxy-Authorization`, `TE`, `Trailer`, `Transfer-Encoding`, `Upgrade`, plus tokens named in `Connection` (transitively hop-by-hop), plus `Host` (proxy-pointing) and `Proxy-Connection` (de facto deprecated). `Connection: close` from the proxy hop defeats keep-alive on the rewrite; `Upgrade` to a non-WS target produces 400/426; `Proxy-*` leak proxy credentials upstream. Plus: TLS identity (`servername`, `checkServerIdentity`), socket-path proxy hint (`socketPath`), and source-binding / DNS hints (`localAddress`, `lookup`, `family`, `hints`) that all describe the connection to the proxy and don't apply to a different upstream. ## The fix `nemoclaw-blueprint/scripts/http-proxy-fix.js` (canonical) and the heredoc in `scripts/nemoclaw-start.sh` (byte-for-byte synced via the existing `http-proxy-fix-sync.test.ts`): - `sanitizeHeaders()` strips the full RFC 7230 §6.1 hop-by-hop set plus `Host`, `Proxy-Connection`, `Proxy-Authenticate`, and any token named in the `Connection` header (transitive per the same RFC). Case-insensitive. - `delete rewritten.agent`, `delete rewritten.auth`, `delete rewritten.servername`, `delete rewritten.checkServerIdentity`, `delete rewritten.socketPath`, `delete rewritten.localAddress`, `delete rewritten.lookup`, `delete rewritten.family`, `delete rewritten.hints` after the `Object.assign`. - Signal (AbortController), TLS material (`ca`/`cert`/`key`/`rejectUnauthorized`), `timeout`, body, and target-intent headers (`Authorization`, `Content-Type`, …) all survive. ## Tests - **`test/http-proxy-fix-rewrite.test.ts` (12 cases)** — spy-level rewrite tests pinning every strip; control test reproducing the broken-wrapper shape and asserting `TypeError`. - **`test/http-proxy-fix-e2e.test.ts` (1 case)** — end-to-end. openssl shell-out at module load (skipped locally if openssl missing; loud-fails under `CI=true`). Spins up an HTTPS mock on `127.0.0.1:0`, builds the FORWARD-mode `http.request` shape with forward-proxy `http.Agent` + proxy basic-auth + proxy-pointing Host, asserts the round trip completes. - **`test/http-proxy-fix-sync.test.ts` (6 cases, existing)** — byte-for-byte parity between the canonical wrapper and the `scripts/nemoclaw-start.sh` heredoc still enforced. ## What this PR no longer carries This branch previously included a Dockerfile change that added `'request': {'allowPrivateNetwork': True}` to the inference provider in the baked `openclaw.json`, and a regression test for it. Empirical investigation showed: - openclaw 2026.4.9's strict zod schema rejects `request.allowPrivateNetwork` as an unrecognized key (`src/config/zod-schema.core.ts:283-292` in v2026.4.9). - `openclaw doctor --fix` — which the Dockerfile runs at image build time — silently strips the key, leaving `"request": {}` before the image ships. - The plumbing that *reads* `request.allowPrivateNetwork` only landed in 2026.4.10 (upstream PR #63671, commit `0808dd111c`). So the previous "fix" was a no-op diagnosing the wrong layer. Dockerfile and `test/sandbox-provisioning.test.ts` are restored to `origin/main`. ## What this PR keeps from the prior iteration The label-rename + arithmetic-via-`--json` test improvements added earlier on this branch are kept — they catch regressions independent of the proxy bug: - `test/e2e/test-full-e2e.sh`: Phase 4b relabelled `[LIVE]` → `[ROUTING]`; new Phase 4c that runs `openclaw agent --json` over SSH inside the sandbox and asserts the model answered `42` to "What is 6 multiplied by 7?". Phase 4c is the only assertion in the suite that exercises the openclaw HTTP client end-to-end against `inference.local`. - `test/e2e/test-hermes-e2e.sh`: same `[LIVE]` → `[ROUTING]` relabel. - `test/e2e/test-sandbox-operations.sh` TC-SBX-02: replaces `Say exactly: HELLO_E2E` + merged-output grep with the arithmetic-via-`--json` pattern. - `test/e2e/e2e-cloud-experimental/features/skill/verify-sandbox-skill-via-agent.sh`: stops embedding `${VERIFY_TOKEN}` in the prompt; refuses overrides that smuggle it back; adds a negative assertion on `SsrFBlockedError` / transport / gateway-unavailable markers. ## Type of Change - [x] Code change (feature, bug fix, or refactor) ## Verification - [x] `npx vitest run test/http-proxy-fix-e2e.test.ts test/http-proxy-fix-rewrite.test.ts test/http-proxy-fix-sync.test.ts test/sandbox-provisioning.test.ts` — 28/28 pass. - [x] Adversarial review's blockers (B1–B3) and concerns (C1–C7) addressed in commit `69115de7`. - [x] `bash -n` and `shellcheck -S warning` clean on `scripts/nemoclaw-start.sh` and the four touched e2e scripts. - [x] Heredoc-sync test still passes; canonical wrapper and `scripts/nemoclaw-start.sh` heredoc updated together. - [x] No secrets, API keys, or credentials committed. ## Per-commit on this branch - `f162daa5` — initial strip of `agent`/`auth`/`Host`/`Proxy-*` (the fix), spy-level unit tests - `2f217f8a` — end-to-end test against a local HTTPS mock with self-signed cert - `69115de7` — wider RFC 7230 §6.1 hop-by-hop + TLS/transport hint strips, in-repo bisect control, vi.stubEnv consistency, http.request restore, 7-day cert, CI strict-skip ## How to validate after merge The signal job is `cloud-e2e` in `nightly-e2e.yaml` running `test/e2e/test-full-e2e.sh`. Watch for: ``` PASS: [LIVE] openclaw agent: model answered 6×7=42 through openclaw → inference.local ``` For the deepinfra-style failure specifically, the unit + e2e + control tests in this PR pin every strip the wrapper performs *and* the bug class it prevents. A real proxy + real upstream + real TLS round trip would need a CI job with a non-NVIDIA OpenAI-compatible API key — its own follow-up PR if you want belt-and-suspenders coverage. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes * **Bug Fixes** * Improved HTTP proxy request handling to sanitize headers and remove incompatible routing-specific fields, ensuring proper downstream forwarding. * **Tests** * Added comprehensive test coverage for proxy header sanitization and rewrite behavior. * Enhanced E2E testing with improved routing-layer assertions and JSON payload extraction from agent responses. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Follow-up to NVIDIA#2344 (the http.request wrapper for NVIDIA#2109). PR NVIDIA#2344 covers axios / follow-redirects / proxy-from-env, all of which flow through Node's http.request and EnvHttpProxyAgent. This change covers an additional code path the Microsoft Teams adapter uses: native fetch() with a custom undici dispatcher. When OpenClaw's Teams adapter handles a channel @mention it calls the Microsoft Graph API via: fetch('https://graph.microsoft.com/v1.0/teams/.../messages/...', { dispatcher: customUndiciAgent }) The custom dispatcher routes the request through its own connection pool, bypassing the default EnvHttpProxyAgent — which is the only thing routing HTTPS through the OpenShell L7 proxy. Direct egress to graph.microsoft.com is blocked by the sandbox network namespace and surfaces as ECONNREFUSED. Channel @mentions silently fail; DMs work because the webhook payload carries the message body and no Graph call is needed. Wrap globalThis.fetch in the same preload that wraps http.request. When fetch() is called with a custom dispatcher and an HTTPS URL, strip the dispatcher and let EnvHttpProxyAgent handle the request. Non-HTTPS URLs and dispatcher-free calls pass through unchanged so intra-sandbox HTTP traffic and any non-proxy use of the dispatcher option keep working. Refinements over the originally proposed fix: - URL extraction handles all three fetch() input forms — string, URL object (.href), Request object (.url) — in that order. A naive url?.url check would miss URL instances (which have .href, not .url) and silently leave the dispatcher attached, defeating the fix on callers that pass URL instances. - One-shot console.warn so the strip is auditable in logs without spamming on every call. The flag is closure-scoped so a process restart re-arms it. - Defensive typeof origFetch === function guard so a Node runtime that ever ships without globalThis.fetch silently no-ops instead of throwing at preload time. After NVIDIA#3109 the preload is delivered as a standalone module copied from /usr/local/lib/nemoclaw/preloads/ at boot, so this PR only touches the canonical http-proxy-fix.js — no heredoc to keep in sync. http-proxy-fix-sync.test.ts already runs the entrypoint block end-to- end and reads the generated file; extended with explicit assertions for http.request and globalThis.fetch wrapper presence so a future deletion of either wrapper trips the test. Verified end-to-end on a real proxy-enabled sandbox: bot replies to Teams channel @mentions; DM regression check passes; non-Graph fetch() and fetch() without a custom dispatcher unaffected.
Summary
Signed replay of #2323 by @lcsmontiel — same changes, commits signed to pass the org signature check.
axios-proxy-fix.jswith anhttp.request()wrapper that catches the FORWARD-vs-CONNECT mismatch at the lowest common denominatortry/catcharoundnew URL(options.path)to prevent process crashes on malformed URLsOriginal PR
All design rationale, failure analysis, and E2E validation are documented in #2323. Credit to @lcsmontiel for the fix.
Test plan
npx vitest run --project cli test/http-proxy-fix-sync.test.ts— 6/6 passnpx vitest run --project cli test/service-env.test.ts— 41/41 passCloses #2109.
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Tests
Chores