[macOS][Policy&Network] inference.local routed through host HTTP_PROXY on Colima — large model inference fails with 120s timeout

## Description

On macOS with Colima, when a host-level HTTP proxy is active (`HTTP_PROXY=http://127.0.0.1:8118`), NemoClaw does not add `inference.local` to `NO_PROXY` during onboard. As a result, all inference requests are routed through the host proxy, which does not support long-lived streaming connections. Every chat message to a large model (Ultra 550B, Super 120B) fails when the 120s fetch timeout expires, with "LLM idle timeout (120s): no response from model" and "Broken pipe (os error 32)". Ubuntu bare-metal is unaffected because no host proxy intercepts `inference.local` traffic.
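To make the routing decision concrete, here is a minimal sketch of the conventional `NO_PROXY` check in Python. The helper `bypasses_proxy` is hypothetical and is not NemoClaw or OpenClaw code. It assumes the common convention of comma-separated entries matched as domain suffixes. Real clients differ in the details, for example in whether they accept a `*.` prefix.

```python
def bypasses_proxy(host: str, no_proxy: str) -> bool:
    """Return True if `host` matches an entry in a NO_PROXY-style list.

    Hypothetical helper illustrating the usual convention:
    comma-separated entries, case-insensitive, matched as domain suffixes.
    A lone "*" disables proxying for every host.
    """
    host = host.lower().rstrip(".")
    for entry in no_proxy.split(","):
        entry = entry.strip().lower()
        if not entry:
            continue
        if entry == "*":
            return True
        # Normalize "*.local", ".local", and "local" to the bare suffix "local".
        suffix = entry.removeprefix("*").lstrip(".")
        if suffix and (host == suffix or host.endswith("." + suffix)):
            return True
    return False


# With a typical default list, inference.local is NOT exempt from the proxy.
print(bypasses_proxy("inference.local", "localhost,127.0.0.1"))
# Once the hostname (or its suffix) is listed, the proxy is bypassed.
print(bypasses_proxy("inference.local", "localhost,127.0.0.1,inference.local"))
```

Under these assumptions, the first call reports that `inference.local` would still go through the proxy, which matches the behavior described in this report.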

## Environment

```text
Device:        MacBook (arm64, Colima Docker runtime)
OS:            macOS 15.x (Darwin 25.1.0, arm64)
Architecture:  arm64
Node.js:       not captured
npm:           not captured
Docker:        Colima
OpenShell CLI: 0.0.44
NemoClaw:      v0.0.59
OpenClaw:      2026.5.27 (27ae826)
Host proxy:    HTTP_PROXY=http://127.0.0.1:8118 (detected and warned during onboard)
```

## Steps to Reproduce

1. Use a macOS host running Colima with a host HTTP proxy active (`HTTP_PROXY=http://127.0.0.1:8118`)
2. Install NemoClaw v0.0.59
3. Run `nemoclaw onboard` and select NVIDIA Endpoints with model `nvidia/nemotron-3-ultra-550b-a55b` (onboard warns "HTTP_PROXY detected" but does not add `inference.local` to `NO_PROXY`)
4. Run `nemoclaw my-assistant connect && openclaw tui`
5. Send any chat message (e.g. "hello")
6. Wait and observe the timeout after 120s

## Expected Result

`inference.local` is a NemoClaw-managed virtual hostname that should resolve locally within the OpenShell network stack, not via the host proxy. Onboard should add `inference.local` (and `*.local`) to `NO_PROXY` so proxy bypass is automatic.

Model responds normally. (Confirmed: Ultra 550B TTFB = 31s on Ubuntu bare-metal, well within the 120s timeout when no proxy is involved.)
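The expected onboard behavior could look roughly like the following sketch, which merges the required hosts into an existing `NO_PROXY` value without duplicating or dropping entries. The function name `merge_no_proxy` and its defaults are assumptions for illustration, not NemoClaw's actual implementation. How the resulting value is propagated into the OpenShell network stack is outside the scope of this sketch.

```python
def merge_no_proxy(
    existing: str,
    required: tuple[str, ...] = ("inference.local", "*.local"),
) -> str:
    """Append `required` hosts to a NO_PROXY string, keeping existing entries.

    Hypothetical helper: entries are compared case-insensitively so that
    running it twice does not add duplicates.
    """
    entries = [e.strip() for e in existing.split(",") if e.strip()]
    seen = {e.lower() for e in entries}
    for host in required:
        if host.lower() not in seen:
            entries.append(host)
            seen.add(host.lower())
    return ",".join(entries)


# Example: a host that already exempts loopback addresses.
print(merge_no_proxy("localhost, 127.0.0.1"))
```

Because some tools read the lowercase `no_proxy` variable and wildcard support varies by client, a real fix would probably need to set both variable spellings and list `inference.local` explicitly rather than rely on `*.local` alone.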

## Actual Result

Gateway log (repeated for every chat message):

```text
fetch timeout after 120000ms (elapsed 119350ms) operation=fetchWithSsrFGuard
  url=https://inference.local/v1/chat/completions
[provider-transport-fetch] error provider=inference api=openai-completions
  model=nvidia/nemotron-3-ultra-550b-a55b elapsedMs=119364 name=TimeoutError
Embedded agent failed before reply: LLM idle timeout (120s): no response from model
NET:FAIL [LOW] [msg:Proxy connection error: Broken pipe (os error 32)]
```

The pattern repeats on every message, so the session is completely unusable.
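When triaging a longer gateway log, the timeout events can be extracted with a small script. The sketch below is a hypothetical helper that assumes only the `fetch timeout after …ms (elapsed …ms)` line shape shown in the excerpt above. It is not part of OpenClaw.

```python
import re

# Matches the fetch-timeout line format seen in the gateway log excerpt.
FETCH_TIMEOUT = re.compile(
    r"fetch timeout after (?P<limit>\d+)ms \(elapsed (?P<elapsed>\d+)ms\)"
)


def timeout_events(log_text: str) -> list[tuple[int, int]]:
    """Return (limit_ms, elapsed_ms) for every fetch-timeout line in the log."""
    return [
        (int(m["limit"]), int(m["elapsed"]))
        for m in FETCH_TIMEOUT.finditer(log_text)
    ]


sample = (
    "fetch timeout after 120000ms (elapsed 119350ms) "
    "operation=fetchWithSsrFGuard\n"
    "  url=https://inference.local/v1/chat/completions\n"
)
print(timeout_events(sample))
```

If every extracted event sits at the 120000 ms limit regardless of prompt, that supports the reading that no response bytes arrive at all, rather than the model being slow.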

## Logs

```text
Gateway log excerpts (openclaw gateway-persistent.log):
  2026-06-05T09:37:52.815+00:00 [fetch-timeout] fetch timeout after 120000ms (elapsed 119350ms)
    operation=fetchWithSsrFGuard url=https://inference.local/v1/chat/completions
  2026-06-05T09:37:52.826+00:00 [provider-transport-fetch] [model-fetch] error
    provider=inference api=openai-completions model=nvidia/nemotron-3-ultra-550b-a55b
    elapsedMs=119364 name=TimeoutError
  2026-06-05T09:37:52.998+00:00 Embedded agent failed before reply:
    LLM idle timeout (120s): no response from model
  [1780652455.685] NET:FAIL [LOW] [msg:Proxy connection error: Broken pipe (os error 32)]

Onboard warning:
  HTTP_PROXY=http://127.0.0.1:8118 detected on host

Comparison (Ubuntu bare-metal, same API key, same model):
  curl TTFB for nvidia/nemotron-3-ultra-550b-a55b = 31s → HTTP 200 (no proxy)
  curl TTFB for nvidia/nemotron-3-nano-omni-30b-a3b-reasoning = 300ms → HTTP 200
```

---
[NVB#6272789](https://nvbugspro.nvidia.com/bug/6272789)

