Skip to content

fix(sandbox): rewrite #2109 proxy fix as http.request wrapper (signed)#2344

Merged
ericksoa merged 4 commits into
mainfrom
fix/2109-http-proxy-wrapper-signed
Apr 23, 2026
Merged

fix(sandbox): rewrite #2109 proxy fix as http.request wrapper (signed)#2344
ericksoa merged 4 commits into
mainfrom
fix/2109-http-proxy-wrapper-signed

Conversation

@ericksoa

@ericksoa ericksoa commented Apr 23, 2026

Copy link
Copy Markdown
Contributor

Summary

Signed replay of #2323 by @lcsmontiel — same changes, commits signed to pass the org signature check.

  • Replaces axios-proxy-fix.js with an http.request() wrapper that catches the FORWARD-vs-CONNECT mismatch at the lowest common denominator
  • Adds try/catch around new URL(options.path) to prevent process crashes on malformed URLs
  • Preserves caller-supplied options in the rewritten request
  • Verified end-to-end on a real proxy-enabled sandbox (see fix(sandbox): rewrite #2109 proxy fix as http.request wrapper #2323 for full validation table)

Original PR

All design rationale, failure analysis, and E2E validation are documented in #2323. Credit to @lcsmontiel for the fix.

Test plan

  • npx vitest run --project cli test/http-proxy-fix-sync.test.ts — 6/6 pass
  • npx vitest run --project cli test/service-env.test.ts — 41/41 pass
  • All pre-commit and pre-push hooks pass
  • CI

Closes #2109.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added HTTP request interception for improved proxy environment variable handling, with enhanced support for Node.js 22.
  • Tests

    • Added new test suite validating proxy fix consistency and configuration.
    • Updated proxy environment variable tests to reflect new implementation.
  • Chores

    • Removed legacy proxy handling implementation.
    • Updated startup script to use enhanced proxy mechanism.

lcsmontiel and others added 4 commits April 23, 2026 04:25
PR #2110's axios-only Module._load preload never fired at runtime:

  1. nemoclaw-blueprint/scripts/ is excluded from the optimized sandbox
     build context (src/lib/sandbox-build-context.ts), so
     axios-proxy-fix.js was not baked into the sandbox image.
  2. Adding scripts/ to the build context cache-busts the
     `COPY nemoclaw-blueprint/` Dockerfile layer and hangs npm ci in
     the k3s Docker-in-Docker build, so the delivery gap cannot be
     closed by expanding the context.
  3. Even if the file had reached the image, intercepting
     require('axios') via Module._load cannot patch follow-redirects +
     proxy-from-env bundled as ESM in OpenClaw's dist/http-Bh-HtMAg.js
     — there are no require() calls to intercept. The Bot Connector
     reply path uses the bundled code.

Replace with an http.request() wrapper — the lowest common denominator
every HTTP library bottoms out at. Detect FORWARD-mode requests
(hostname = proxy IP, path = full https:// URL) and rewrite them to
https.request() against the real target, letting NODE_USE_ENV_PROXY
handle the CONNECT tunnel correctly. Works for any HTTP client,
including bundled ESM that makes no require() calls.

Delivery:
  - nemoclaw-blueprint/scripts/http-proxy-fix.js — canonical source for
    review and tests.
  - scripts/nemoclaw-start.sh embeds the same JS inline via a heredoc,
    writes it to /tmp/nemoclaw-http-proxy-fix.js through
    emit_sandbox_sourced_file (root:root 444, symlink-safe), and loads
    it via NODE_OPTIONS=--require. No changes to sandbox-build-context.
  - test/http-proxy-fix-sync.test.ts enforces byte-for-byte equality
    between the heredoc and the canonical file, so future edits cannot
    silently diverge.
  - validate_tmp_permissions is invoked with the new path on both the
    root and non-root boot paths (the fix JS is a trust-boundary file
    — tampering would inject arbitrary code into every Node process
    via NODE_OPTIONS).

Because the content ships inside nemoclaw-start.sh rather than as a
separately-deployed file, the fix fires on the very first sandbox boot
with no post-onboard deploy + restart dance required.

Verified end-to-end on 2026-04-23: EC2 t3.large (ca-central-1),
NemoClaw v0.0.22 + OpenShell v0.0.29, Node 22.22.1. Direct
axios.get('https://clawhub.ai') returns 200 inside the sandbox; full
Teams -> ALB -> OpenClaw -> LiteLLM/Bedrock -> Bot Connector ->
Teams round-trip succeeds. No `FORWARD rejected` entries in OpenShell
network logs. Comparison table and reproduction steps posted in the
PR description.

Scope:
  - Fixes the #2109 regression class (axios / follow-redirects /
    proxy-from-env FORWARD-mode rewrites on NODE_USE_ENV_PROXY=1).
  - Does NOT fix #1570 (Discord WebSocket via the ws library). That
    bug sits at a different layer — EnvHttpProxyAgent's FORWARD-vs-
    CONNECT decision for Upgrade: websocket requests — and needs the
    agent-swap treatment that #2296 applies. The http.request wrapper
    in this PR cannot safely handle that case (it would re-enter the
    same faulty agent logic).
  - Does NOT modify sandbox-build-context.ts.

Removes the superseded nemoclaw-blueprint/scripts/axios-proxy-fix.js
and updates the existing regression tests in service-env.test.ts to
the new variable name (_PROXY_FIX_SCRIPT).

Closes #2109.
Wrap the `new URL(options.path)` call in a try/catch so that a
malformed path value (which passes the `startsWith('https://')` check
but still fails URL parsing) falls back to the original http.request
instead of crashing the Node process.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Address CodeRabbit review feedback on PR.

The FORWARD-mode branch previously constructed a fresh options object
with only {method, hostname, host, port, path, protocol, headers,
timeout}, silently dropping caller-supplied fields that can matter for
correctness:

  - signal — AbortController, used by modern axios/fetch for
    cancellation. Dropping it meant user-initiated aborts would not
    propagate to the rewritten https.request, leaving the request
    running after the caller thought it was cancelled.
  - TLS: ca, cert, key, passphrase, rejectUnauthorized — custom trust
    anchors or mTLS settings. Uncommon in the FORWARD path but not
    impossible.
  - auth — Basic-auth credentials for the target origin.
  - lookup, family, localAddress, maxHeaderSize, insecureHTTPParser —
    per-request network/parser tuning.

Switch to Object.assign({}, options, { ...proxy-routing-fields })
which clones the caller's options and overwrites only the fields we
explicitly need to change (method default, hostname/host/port/path/
protocol). Everything else is carried over verbatim.

Mirror the edit in the inline heredoc in scripts/nemoclaw-start.sh so
the canonical file and the embedded copy remain byte-identical; the
http-proxy-fix-sync test enforces this.
Address CodeRabbit review feedback (minor, PR #2323).

Both `catch (e)` bindings in http-proxy-fix.js are unused — one in the
proxy URL parse (falls through silently) and one in the FORWARD-path
new URL guard (returns the original request). Rename to `catch (_e)`
to satisfy the project's "unused variables must be prefixed with _"
convention.

Mirror the edit in the inline heredoc in scripts/nemoclaw-start.sh so
the canonical file and the embedded copy remain byte-identical; the
http-proxy-fix-sync test enforces this.
@coderabbitai

coderabbitai Bot commented Apr 23, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: d1d8f041-3379-47ca-9971-9283bb40a223

📥 Commits

Reviewing files that changed from the base of the PR and between d9aced4 and 97d901f.

📒 Files selected for processing (5)
  • nemoclaw-blueprint/scripts/axios-proxy-fix.js
  • nemoclaw-blueprint/scripts/http-proxy-fix.js
  • scripts/nemoclaw-start.sh
  • test/http-proxy-fix-sync.test.ts
  • test/service-env.test.ts
💤 Files with no reviewable changes (1)
  • nemoclaw-blueprint/scripts/axios-proxy-fix.js

📝 Walkthrough

Walkthrough

Replaces an axios-specific proxy preload script with a more general HTTP proxy fix that intercepts http.request() calls in FORWARD-mode proxy scenarios (http request to proxy host with https:// path) and rewrites them as https.request() calls against the real target. Updates the startup script to inline and require the new preload, adds synchronization tests, and removes axios-specific references.

Changes

Cohort / File(s) Summary
Proxy Fix Scripts
nemoclaw-blueprint/scripts/axios-proxy-fix.js, nemoclaw-blueprint/scripts/http-proxy-fix.js
Removed axios-specific preload that set axios.defaults.proxy = false. Replaced with new http.request() interceptor that parses HTTP(S)_PROXY environment variables and rewrites FORWARD-mode proxy requests (http to proxy host with https:// path) as https.request() calls against the real target, avoiding double proxy handling.
Startup Configuration
scripts/nemoclaw-start.sh
Replaces axios-fix preload injection with http-proxy-fix embedded inline via heredoc at /tmp/nemoclaw-http-proxy-fix.js. Updates NODE_OPTIONS to --require the new script in both main runtime and openshell sandbox sessions. Adds permission validation for the generated proxy-fix script in both non-root and root execution paths.
Test Suite
test/http-proxy-fix-sync.test.ts, test/service-env.test.ts
Introduces new sync test to ensure http-proxy-fix.js in nemoclaw-start.sh remains byte-for-byte synchronized with canonical implementation, validates NODE_OPTIONS configuration, and verifies /tmp permission checks. Updates existing service-env tests to target http-proxy-fix behavior and removes axios-proxy-fix references.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

bug, OpenShell, fix

Poem

🐰 A proxy fix hops in place,
No more double-proxy disgrace!
HTTP bends to HTTPS's call,
With rewritten requests, we've fixed it all,
Node's dance with axios—finally grace! 🌟

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: replacing an axios-specific proxy patch with an http.request() wrapper to fix proxy conflicts in sandboxes.
Linked Issues check ✅ Passed The PR fully addresses #2109 requirements: fixes axios ERR_BAD_RESPONSE via http.request wrapper, enables proper CONNECT tunnels, preserves working clients, removes axios-specific preload, and includes comprehensive tests.
Out of Scope Changes check ✅ Passed All changes are directly related to the proxy fix objectives: deleting axios-proxy-fix.js, adding http-proxy-fix.js as canonical file, embedding it in nemoclaw-start.sh, and adding corresponding tests.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/2109-http-proxy-wrapper-signed

Warning

Review ran into problems

🔥 Problems

Timed out fetching pipeline failures after 30000ms


Comment @coderabbitai help to get the list of available commands and usage tips.

@ericksoa ericksoa merged commit b1afc50 into main Apr 23, 2026
22 checks passed
ericksoa added a commit that referenced this pull request Apr 27, 2026
Follow-up to #2344 (NemoClaw 0.0.24). The http.request → https.request
rewrite that resolves the NODE_USE_ENV_PROXY=1 / library-side proxy
double-processing was shallow-copying the caller's options into the
rewritten https.request. That dragged three classes of forward-proxy-
hop fields onto a request that now goes direct to the upstream:

  * options.agent — a forward-proxy http.Agent. http.Agent cannot speak
    TLS, so https.request either ignores it and falls back to a default
    agent or fails the TLS handshake outright depending on caller. This
    is the most likely root cause of the deepinfra-style report on
    Discord ("LLM request failed: network connection error" against a
    custom OpenAI-compatible upstream while NVIDIA-routed providers
    keep working). NVIDIA Endpoints' DNS-rewritten path through OpenShell
    doesn't end up in this branch, so the bug stayed invisible.

  * options.auth — basic-auth meant for the proxy hop. Leaving it on
    https.request would Basic-auth the *target* server with proxy
    credentials.

  * Host / Proxy-Authorization / Proxy-Connection / Proxy-Authenticate
    headers — Host points at the proxy host and the Proxy-* headers are
    hop-by-hop. Stripping them lets Node regenerate Host from the
    target's hostname/port and prevents proxy creds from leaking
    upstream.

Adds test/http-proxy-fix-rewrite.test.ts with seven cases pinning each
strip and confirming signal/timeout/TLS fields and target headers
(Authorization, Content-Type, …) survive. The existing
http-proxy-fix-sync.test.ts byte-for-byte enforcer still passes — the
canonical wrapper at nemoclaw-blueprint/scripts/http-proxy-fix.js and
the heredoc in scripts/nemoclaw-start.sh were updated together.

Reverts the openclaw `request.allowPrivateNetwork` Dockerfile change
that this PR previously carried — empirical investigation showed
openclaw 2026.4.9 rejects that key in strict zod and `openclaw doctor
--fix` (which the Dockerfile runs at build time) silently strips it
back to `"request": {}` before the image ships. The upstream plumbing
that reads `request.allowPrivateNetwork` only landed in 2026.4.10
(commit 0808dd111c, openclaw PR #63671). The change here was a no-op,
diagnosing the wrong layer; this commit replaces it with the actual
fix at the wrapper layer.

The label-rename + arithmetic-via-`--json` test improvements added
earlier on this branch (Phase 4c, TC-SBX-02 rewrite, skill-verify
hardening, sandbox_exec stderr-merge fix) are kept — they catch
regressions independent of the proxy bug.

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
ericksoa added a commit that referenced this pull request Apr 27, 2026
…est (#2490)

## Summary

Follow-up to #2344 (NemoClaw 0.0.24). The `http.request` →
`https.request` rewrite in
`nemoclaw-blueprint/scripts/http-proxy-fix.js` was shallow-copying the
caller's options into the rewritten `https.request` call. That dragged
forward-proxy-hop fields onto a request that now goes direct to the
upstream. This is consistent with the Discord report from `mflova` after
the 0.0.23 → 0.0.24 bump:

> Finally, when I run the bot, I always get an error after sending any
message to it: `error=LLM request failed: network connection error.
rawError=Connection error`
>
> Some context: I am using 0.0.23 as the 0.0.24 had a bug with the
network.

mflova was on a custom OpenAI-compatible endpoint (deepinfra). NVIDIA
Endpoints' DNS-rewritten path through OpenShell doesn't end up in the
wrapper's FORWARD-mode branch, which is why upstream regressions there
stayed invisible to nightly-e2e. mflova's reported error is the openclaw
client's wrapped surface; on Node 22 the underlying mechanism is a
synchronous `TypeError: Protocol "https:" not supported. Expected
"http:"` thrown by `https.request` when the forward-proxy `http.Agent`
reaches it. On Node 18/20 the same root cause manifests as a TLS
handshake failure rather than a synchronous throw — same bug class,
different surface.

## Bisect proof (committed in this PR)

`test/http-proxy-fix-rewrite.test.ts` carries both halves of the proof
in-repo:

- The "control" `describe` block reproduces the *broken* pre-fix rewrite
shape inline (`Object.assign` with no field strips, hop-by-hop headers
preserved) and asserts `https.request` rejects it with `TypeError:
Protocol "https:" not supported`. That test passes regardless of the
wrapper, locking in the bug class.
- The rewrite `describe` block asserts the wrapper produces options that
*don't* match that shape — `agent`, `auth`, `Host`, `Proxy-*`, the rest
of RFC 7230 §6.1 hop-by-hop, `servername`, `checkServerIdentity`,
`socketPath`, `localAddress`, `lookup`, `family`, `hints` are all
stripped on rewrite.

Combined: bug class is real and detectable; wrapper's job is to never
produce that shape. If a future maintainer reverts a strip, the rewrite
test breaks; if Node changes the failure surface, the control test
breaks. Two-sided coverage without storing a copy of the broken wrapper.

## What was wrong

The rewrite was:

```js
var rewritten = Object.assign({}, options, {
  hostname: target.hostname, host: target.hostname,
  port: target.port || 443, path: target.pathname + target.search,
  protocol: 'https:',
});
return https.request(rewritten, callback);
```

Three classes of forward-proxy-hop fields rode along:

- **`options.agent`** — a forward-proxy `http.Agent`.
`http.Agent.protocol === 'http:'`, so on Node 22 `https.request` throws
`TypeError: Protocol "https:" not supported. Expected "http:"`
synchronously; on Node 18/20 it falls through and the TLS handshake
fails. Manifests upstream as "Connection error".
- **`options.auth`** — basic-auth meant for the proxy hop. Leaving it on
`https.request` Basic-auths the *target* server with proxy credentials.
- **Hop-by-hop headers (RFC 7230 §6.1)** — `Connection`, `Keep-Alive`,
`Proxy-Authorization`, `TE`, `Trailer`, `Transfer-Encoding`, `Upgrade`,
plus tokens named in `Connection` (transitively hop-by-hop), plus `Host`
(proxy-pointing) and `Proxy-Connection` (de facto deprecated).
`Connection: close` from the proxy hop defeats keep-alive on the
rewrite; `Upgrade` to a non-WS target produces 400/426; `Proxy-*` leak
proxy credentials upstream.

Plus: TLS identity (`servername`, `checkServerIdentity`), socket-path
proxy hint (`socketPath`), and source-binding / DNS hints
(`localAddress`, `lookup`, `family`, `hints`) that all describe the
connection to the proxy and don't apply to a different upstream.

## The fix

`nemoclaw-blueprint/scripts/http-proxy-fix.js` (canonical) and the
heredoc in `scripts/nemoclaw-start.sh` (byte-for-byte synced via the
existing `http-proxy-fix-sync.test.ts`):

- `sanitizeHeaders()` strips the full RFC 7230 §6.1 hop-by-hop set plus
`Host`, `Proxy-Connection`, `Proxy-Authenticate`, and any token named in
the `Connection` header (transitive per the same RFC). Case-insensitive.
- `delete rewritten.agent`, `delete rewritten.auth`, `delete
rewritten.servername`, `delete rewritten.checkServerIdentity`, `delete
rewritten.socketPath`, `delete rewritten.localAddress`, `delete
rewritten.lookup`, `delete rewritten.family`, `delete rewritten.hints`
after the `Object.assign`.
- Signal (AbortController), TLS material
(`ca`/`cert`/`key`/`rejectUnauthorized`), `timeout`, body, and
target-intent headers (`Authorization`, `Content-Type`, …) all survive.

## Tests

- **`test/http-proxy-fix-rewrite.test.ts` (12 cases)** — spy-level
rewrite tests pinning every strip; control test reproducing the
broken-wrapper shape and asserting `TypeError`.
- **`test/http-proxy-fix-e2e.test.ts` (1 case)** — end-to-end. openssl
shell-out at module load (skipped locally if openssl missing; loud-fails
under `CI=true`). Spins up an HTTPS mock on `127.0.0.1:0`, builds the
FORWARD-mode `http.request` shape with forward-proxy `http.Agent` +
proxy basic-auth + proxy-pointing Host, asserts the round trip
completes.
- **`test/http-proxy-fix-sync.test.ts` (6 cases, existing)** —
byte-for-byte parity between the canonical wrapper and the
`scripts/nemoclaw-start.sh` heredoc still enforced.

## What this PR no longer carries

This branch previously included a Dockerfile change that added
`'request': {'allowPrivateNetwork': True}` to the inference provider in
the baked `openclaw.json`, and a regression test for it. Empirical
investigation showed:

- openclaw 2026.4.9's strict zod schema rejects
`request.allowPrivateNetwork` as an unrecognized key
(`src/config/zod-schema.core.ts:283-292` in v2026.4.9).
- `openclaw doctor --fix` — which the Dockerfile runs at image build
time — silently strips the key, leaving `"request": {}` before the image
ships.
- The plumbing that *reads* `request.allowPrivateNetwork` only landed in
2026.4.10 (upstream PR #63671, commit `0808dd111c`).

So the previous "fix" was a no-op diagnosing the wrong layer. Dockerfile
and `test/sandbox-provisioning.test.ts` are restored to `origin/main`.

## What this PR keeps from the prior iteration

The label-rename + arithmetic-via-`--json` test improvements added
earlier on this branch are kept — they catch regressions independent of
the proxy bug:

- `test/e2e/test-full-e2e.sh`: Phase 4b relabelled `[LIVE]` →
`[ROUTING]`; new Phase 4c that runs `openclaw agent --json` over SSH
inside the sandbox and asserts the model answered `42` to "What is 6
multiplied by 7?". Phase 4c is the only assertion in the suite that
exercises the openclaw HTTP client end-to-end against `inference.local`.
- `test/e2e/test-hermes-e2e.sh`: same `[LIVE]` → `[ROUTING]` relabel.
- `test/e2e/test-sandbox-operations.sh` TC-SBX-02: replaces `Say
exactly: HELLO_E2E` + merged-output grep with the
arithmetic-via-`--json` pattern.
-
`test/e2e/e2e-cloud-experimental/features/skill/verify-sandbox-skill-via-agent.sh`:
stops embedding `${VERIFY_TOKEN}` in the prompt; refuses overrides that
smuggle it back; adds a negative assertion on `SsrFBlockedError` /
transport / gateway-unavailable markers.

## Type of Change

- [x] Code change (feature, bug fix, or refactor)

## Verification

- [x] `npx vitest run test/http-proxy-fix-e2e.test.ts
test/http-proxy-fix-rewrite.test.ts test/http-proxy-fix-sync.test.ts
test/sandbox-provisioning.test.ts` — 28/28 pass.
- [x] Adversarial review's blockers (B1–B3) and concerns (C1–C7)
addressed in commit `69115de7`.
- [x] `bash -n` and `shellcheck -S warning` clean on
`scripts/nemoclaw-start.sh` and the four touched e2e scripts.
- [x] Heredoc-sync test still passes; canonical wrapper and
`scripts/nemoclaw-start.sh` heredoc updated together.
- [x] No secrets, API keys, or credentials committed.

## Per-commit on this branch

- `f162daa5` — initial strip of `agent`/`auth`/`Host`/`Proxy-*` (the
fix), spy-level unit tests
- `2f217f8a` — end-to-end test against a local HTTPS mock with
self-signed cert
- `69115de7` — wider RFC 7230 §6.1 hop-by-hop + TLS/transport hint
strips, in-repo bisect control, vi.stubEnv consistency, http.request
restore, 7-day cert, CI strict-skip

## How to validate after merge

The signal job is `cloud-e2e` in `nightly-e2e.yaml` running
`test/e2e/test-full-e2e.sh`. Watch for:

```
PASS: [LIVE] openclaw agent: model answered 6×7=42 through openclaw → inference.local
```

For the deepinfra-style failure specifically, the unit + e2e + control
tests in this PR pin every strip the wrapper performs *and* the bug
class it prevents. A real proxy + real upstream + real TLS round trip
would need a CI job with a non-NVIDIA OpenAI-compatible API key — its
own follow-up PR if you want belt-and-suspenders coverage.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **Bug Fixes**
* Improved HTTP proxy request handling to sanitize headers and remove
incompatible routing-specific fields, ensuring proper downstream
forwarding.

* **Tests**
* Added comprehensive test coverage for proxy header sanitization and
rewrite behavior.
* Enhanced E2E testing with improved routing-layer assertions and JSON
payload extraction from agent responses.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
DemianHeyGen pushed a commit to DemianHeyGen/NemoClaw that referenced this pull request Apr 30, 2026
…est (NVIDIA#2490)

## Summary

Follow-up to NVIDIA#2344 (NemoClaw 0.0.24). The `http.request` →
`https.request` rewrite in
`nemoclaw-blueprint/scripts/http-proxy-fix.js` was shallow-copying the
caller's options into the rewritten `https.request` call. That dragged
forward-proxy-hop fields onto a request that now goes direct to the
upstream. This is consistent with the Discord report from `mflova` after
the 0.0.23 → 0.0.24 bump:

> Finally, when I run the bot, I always get an error after sending any
message to it: `error=LLM request failed: network connection error.
rawError=Connection error`
>
> Some context: I am using 0.0.23 as the 0.0.24 had a bug with the
network.

mflova was on a custom OpenAI-compatible endpoint (deepinfra). NVIDIA
Endpoints' DNS-rewritten path through OpenShell doesn't end up in the
wrapper's FORWARD-mode branch, which is why upstream regressions there
stayed invisible to nightly-e2e. mflova's reported error is the openclaw
client's wrapped surface; on Node 22 the underlying mechanism is a
synchronous `TypeError: Protocol "https:" not supported. Expected
"http:"` thrown by `https.request` when the forward-proxy `http.Agent`
reaches it. On Node 18/20 the same root cause manifests as a TLS
handshake failure rather than a synchronous throw — same bug class,
different surface.

## Bisect proof (committed in this PR)

`test/http-proxy-fix-rewrite.test.ts` carries both halves of the proof
in-repo:

- The "control" `describe` block reproduces the *broken* pre-fix rewrite
shape inline (`Object.assign` with no field strips, hop-by-hop headers
preserved) and asserts `https.request` rejects it with `TypeError:
Protocol "https:" not supported`. That test passes regardless of the
wrapper, locking in the bug class.
- The rewrite `describe` block asserts the wrapper produces options that
*don't* match that shape — `agent`, `auth`, `Host`, `Proxy-*`, the rest
of RFC 7230 §6.1 hop-by-hop, `servername`, `checkServerIdentity`,
`socketPath`, `localAddress`, `lookup`, `family`, `hints` are all
stripped on rewrite.

Combined: bug class is real and detectable; wrapper's job is to never
produce that shape. If a future maintainer reverts a strip, the rewrite
test breaks; if Node changes the failure surface, the control test
breaks. Two-sided coverage without storing a copy of the broken wrapper.

## What was wrong

The rewrite was:

```js
var rewritten = Object.assign({}, options, {
  hostname: target.hostname, host: target.hostname,
  port: target.port || 443, path: target.pathname + target.search,
  protocol: 'https:',
});
return https.request(rewritten, callback);
```

Three classes of forward-proxy-hop fields rode along:

- **`options.agent`** — a forward-proxy `http.Agent`.
`http.Agent.protocol === 'http:'`, so on Node 22 `https.request` throws
`TypeError: Protocol "https:" not supported. Expected "http:"`
synchronously; on Node 18/20 it falls through and the TLS handshake
fails. Manifests upstream as "Connection error".
- **`options.auth`** — basic-auth meant for the proxy hop. Leaving it on
`https.request` Basic-auths the *target* server with proxy credentials.
- **Hop-by-hop headers (RFC 7230 §6.1)** — `Connection`, `Keep-Alive`,
`Proxy-Authorization`, `TE`, `Trailer`, `Transfer-Encoding`, `Upgrade`,
plus tokens named in `Connection` (transitively hop-by-hop), plus `Host`
(proxy-pointing) and `Proxy-Connection` (de facto deprecated).
`Connection: close` from the proxy hop defeats keep-alive on the
rewrite; `Upgrade` to a non-WS target produces 400/426; `Proxy-*` leak
proxy credentials upstream.

Plus: TLS identity (`servername`, `checkServerIdentity`), socket-path
proxy hint (`socketPath`), and source-binding / DNS hints
(`localAddress`, `lookup`, `family`, `hints`) that all describe the
connection to the proxy and don't apply to a different upstream.

## The fix

`nemoclaw-blueprint/scripts/http-proxy-fix.js` (canonical) and the
heredoc in `scripts/nemoclaw-start.sh` (byte-for-byte synced via the
existing `http-proxy-fix-sync.test.ts`):

- `sanitizeHeaders()` strips the full RFC 7230 §6.1 hop-by-hop set plus
`Host`, `Proxy-Connection`, `Proxy-Authenticate`, and any token named in
the `Connection` header (transitive per the same RFC). Case-insensitive.
- `delete rewritten.agent`, `delete rewritten.auth`, `delete
rewritten.servername`, `delete rewritten.checkServerIdentity`, `delete
rewritten.socketPath`, `delete rewritten.localAddress`, `delete
rewritten.lookup`, `delete rewritten.family`, `delete rewritten.hints`
after the `Object.assign`.
- Signal (AbortController), TLS material
(`ca`/`cert`/`key`/`rejectUnauthorized`), `timeout`, body, and
target-intent headers (`Authorization`, `Content-Type`, …) all survive.

## Tests

- **`test/http-proxy-fix-rewrite.test.ts` (12 cases)** — spy-level
rewrite tests pinning every strip; control test reproducing the
broken-wrapper shape and asserting `TypeError`.
- **`test/http-proxy-fix-e2e.test.ts` (1 case)** — end-to-end. openssl
shell-out at module load (skipped locally if openssl missing; loud-fails
under `CI=true`). Spins up an HTTPS mock on `127.0.0.1:0`, builds the
FORWARD-mode `http.request` shape with forward-proxy `http.Agent` +
proxy basic-auth + proxy-pointing Host, asserts the round trip
completes.
- **`test/http-proxy-fix-sync.test.ts` (6 cases, existing)** —
byte-for-byte parity between the canonical wrapper and the
`scripts/nemoclaw-start.sh` heredoc still enforced.

## What this PR no longer carries

This branch previously included a Dockerfile change that added
`'request': {'allowPrivateNetwork': True}` to the inference provider in
the baked `openclaw.json`, and a regression test for it. Empirical
investigation showed:

- openclaw 2026.4.9's strict zod schema rejects
`request.allowPrivateNetwork` as an unrecognized key
(`src/config/zod-schema.core.ts:283-292` in v2026.4.9).
- `openclaw doctor --fix` — which the Dockerfile runs at image build
time — silently strips the key, leaving `"request": {}` before the image
ships.
- The plumbing that *reads* `request.allowPrivateNetwork` only landed in
2026.4.10 (upstream PR #63671, commit `0808dd111c`).

So the previous "fix" was a no-op diagnosing the wrong layer. Dockerfile
and `test/sandbox-provisioning.test.ts` are restored to `origin/main`.

## What this PR keeps from the prior iteration

The label-rename + arithmetic-via-`--json` test improvements added
earlier on this branch are kept — they catch regressions independent of
the proxy bug:

- `test/e2e/test-full-e2e.sh`: Phase 4b relabelled `[LIVE]` →
`[ROUTING]`; new Phase 4c that runs `openclaw agent --json` over SSH
inside the sandbox and asserts the model answered `42` to "What is 6
multiplied by 7?". Phase 4c is the only assertion in the suite that
exercises the openclaw HTTP client end-to-end against `inference.local`.
- `test/e2e/test-hermes-e2e.sh`: same `[LIVE]` → `[ROUTING]` relabel.
- `test/e2e/test-sandbox-operations.sh` TC-SBX-02: replaces `Say
exactly: HELLO_E2E` + merged-output grep with the
arithmetic-via-`--json` pattern.
-
`test/e2e/e2e-cloud-experimental/features/skill/verify-sandbox-skill-via-agent.sh`:
stops embedding `${VERIFY_TOKEN}` in the prompt; refuses overrides that
smuggle it back; adds a negative assertion on `SsrFBlockedError` /
transport / gateway-unavailable markers.

## Type of Change

- [x] Code change (feature, bug fix, or refactor)

## Verification

- [x] `npx vitest run test/http-proxy-fix-e2e.test.ts
test/http-proxy-fix-rewrite.test.ts test/http-proxy-fix-sync.test.ts
test/sandbox-provisioning.test.ts` — 28/28 pass.
- [x] Adversarial review's blockers (B1–B3) and concerns (C1–C7)
addressed in commit `69115de7`.
- [x] `bash -n` and `shellcheck -S warning` clean on
`scripts/nemoclaw-start.sh` and the four touched e2e scripts.
- [x] Heredoc-sync test still passes; canonical wrapper and
`scripts/nemoclaw-start.sh` heredoc updated together.
- [x] No secrets, API keys, or credentials committed.

## Per-commit on this branch

- `f162daa5` — initial strip of `agent`/`auth`/`Host`/`Proxy-*` (the
fix), spy-level unit tests
- `2f217f8a` — end-to-end test against a local HTTPS mock with
self-signed cert
- `69115de7` — wider RFC 7230 §6.1 hop-by-hop + TLS/transport hint
strips, in-repo bisect control, vi.stubEnv consistency, http.request
restore, 7-day cert, CI strict-skip

## How to validate after merge

The signal job is `cloud-e2e` in `nightly-e2e.yaml` running
`test/e2e/test-full-e2e.sh`. Watch for:

```
PASS: [LIVE] openclaw agent: model answered 6×7=42 through openclaw → inference.local
```

For the deepinfra-style failure specifically, the unit + e2e + control
tests in this PR pin every strip the wrapper performs *and* the bug
class it prevents. A real proxy + real upstream + real TLS round trip
would need a CI job with a non-NVIDIA OpenAI-compatible API key — its
own follow-up PR if you want belt-and-suspenders coverage.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **Bug Fixes**
* Improved HTTP proxy request handling to sanitize headers and remove
incompatible routing-specific fields, ensuring proper downstream
forwarding.

* **Tests**
* Added comprehensive test coverage for proxy header sanitization and
rewrite behavior.
* Enhanced E2E testing with improved routing-layer assertions and JSON
payload extraction from agent responses.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
lcsmontiel added a commit to lcsmontiel/NemoClaw that referenced this pull request May 19, 2026
Follow-up to NVIDIA#2344 (the http.request wrapper for NVIDIA#2109). PR NVIDIA#2344
covers axios / follow-redirects / proxy-from-env, all of which flow
through Node's http.request and EnvHttpProxyAgent. This change covers
an additional code path the Microsoft Teams adapter uses: native
fetch() with a custom undici dispatcher.

When OpenClaw's Teams adapter handles a channel @mention it calls the
Microsoft Graph API via:

    fetch('https://graph.microsoft.com/v1.0/teams/.../messages/...',
          { dispatcher: customUndiciAgent })

The custom dispatcher routes the request through its own connection
pool, bypassing the default EnvHttpProxyAgent — which is the only
thing routing HTTPS through the OpenShell L7 proxy. Direct egress to
graph.microsoft.com is blocked by the sandbox network namespace and
surfaces as ECONNREFUSED. Channel @mentions silently fail; DMs work
because the webhook payload carries the message body and no Graph
call is needed.

Wrap globalThis.fetch in the same preload that wraps http.request.
When fetch() is called with a custom dispatcher and an HTTPS URL,
strip the dispatcher and let EnvHttpProxyAgent handle the request.
Non-HTTPS URLs and dispatcher-free calls pass through unchanged so
intra-sandbox HTTP traffic and any non-proxy use of the dispatcher
option keep working.

Refinements over the originally proposed fix:
  - URL extraction handles all three fetch() input forms — string,
    URL object (.href), Request object (.url) — in that order. A
    naive url?.url check would miss URL instances (which have .href,
    not .url) and silently leave the dispatcher attached, defeating
    the fix on callers that pass URL instances.
  - One-shot console.warn so the strip is auditable in logs without
    spamming on every call. The flag is closure-scoped so a process
    restart re-arms it.
  - Defensive typeof origFetch === function guard so a Node runtime
    that ever ships without globalThis.fetch silently no-ops instead
    of throwing at preload time.

After NVIDIA#3109 the preload is delivered as a standalone module copied
from /usr/local/lib/nemoclaw/preloads/ at boot, so this PR only
touches the canonical http-proxy-fix.js — no heredoc to keep in sync.
http-proxy-fix-sync.test.ts already runs the entrypoint block end-to-
end and reads the generated file; extended with explicit assertions
for http.request and globalThis.fetch wrapper presence so a future
deletion of either wrapper trips the test.

Verified end-to-end on a real proxy-enabled sandbox: bot replies to
Teams channel @mentions; DM regression check passes; non-Graph
fetch() and fetch() without a custom dispatcher unaffected.
@wscurran wscurran added the bug-fix PR fixes a bug or regression label Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug-fix PR fixes a bug or regression

Projects

None yet

Development

Successfully merging this pull request may close these issues.

axios requests fail with ERR_BAD_RESPONSE inside NemoClaw sandbox — double proxy conflict with NODE_USE_ENV_PROXY

3 participants