Description
On macOS with Colima, when a host-level HTTP proxy is active (HTTP_PROXY=http://127.0.0.1:8118), NemoClaw does not add inference.local to NO_PROXY during onboard. All inference requests are routed through the host proxy, which does not support long-lived streaming connections. Every chat message to a large model (Ultra 550B, Super 120B) times out after exactly 120s with "LLM idle timeout (120s): no response from model" and "Broken pipe (os error 32)". Ubuntu bare-metal is unaffected because no host proxy intercepts inference.local traffic.
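To illustrate why the request ends up at the host proxy, here is a minimal sketch of how a client might decide whether a hostname bypasses the proxy based on NO_PROXY. The helper name `bypasses_proxy` and the sample NO_PROXY value are hypothetical (the host's actual NO_PROXY was not captured in this report), and the matching rules are simplified; real clients differ in how they treat wildcards and leading dots.

```python
def bypasses_proxy(host: str, no_proxy: str) -> bool:
    """Return True if `host` matches an entry in a comma-separated NO_PROXY value.

    Simplified suffix matching for illustration only; this is not
    NemoClaw's or OpenShell's actual logic.
    """
    host = host.lower().rstrip(".")
    for entry in no_proxy.split(","):
        entry = entry.strip().lower()
        if not entry:
            continue
        if entry == "*":
            # A bare "*" conventionally means "bypass the proxy for everything".
            return True
        # Treat "*.local" and ".local" as the same suffix rule.
        suffix = entry.lstrip("*").lstrip(".")
        if suffix and (host == suffix or host.endswith("." + suffix)):
            return True
    return False


# Assumed example values, not taken from the affected machine.
print(bypasses_proxy("inference.local", "localhost,127.0.0.1"))
print(bypasses_proxy("inference.local", "localhost,127.0.0.1,inference.local"))
```

With a NO_PROXY that lacks inference.local (the first call), the hostname is not exempt, so the request would be sent to the proxy at 127.0.0.1:8118.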
Environment
Device: MacBook (arm64, Colima Docker runtime)
OS: macOS 15.x (Darwin 25.1.0, arm64)
Architecture: arm64
Node.js: not captured
npm: not captured
Docker: Colima
OpenShell CLI: 0.0.44
NemoClaw: v0.0.59
OpenClaw: 2026.5.27 (27ae826)
Host proxy: HTTP_PROXY=http://127.0.0.1:8118 (detected and warned during onboard)
Steps to Reproduce
- Start on macOS with Colima and a host HTTP proxy active (HTTP_PROXY=http://127.0.0.1:8118)
- Install NemoClaw v0.0.59
- nemoclaw onboard: select NVIDIA Endpoints, model nvidia/nemotron-3-ultra-550b-a55b (onboard warns "HTTP_PROXY detected" but does not add inference.local to NO_PROXY)
- nemoclaw my-assistant connect && openclaw tui
- Send any chat message (e.g. "hello")
- Wait and observe the timeout after 120s
Expected Result
The model responds normally. (Confirmed: Ultra 550B TTFB = 31s on Ubuntu bare-metal, well within the 120s timeout when no proxy is involved.)
inference.local is a NemoClaw-managed virtual hostname that should resolve locally within the OpenShell network stack, not via the host proxy. Onboard should add inference.local (and *.local) to NO_PROXY so that proxy bypass is automatic.
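The requested onboard behavior could look roughly like the sketch below: merge the required entries into the existing NO_PROXY value without dropping or duplicating what is already there. The function name `merge_no_proxy` is hypothetical and this is not NemoClaw's actual code. Whether a given client honors the `*.local` wildcard form is also an assumption, since NO_PROXY syntax varies between tools.

```python
import os


def merge_no_proxy(current: str, required: list[str]) -> str:
    """Append each required entry to a comma-separated NO_PROXY value if missing.

    Existing entries keep their order; comparison is case-insensitive.
    """
    entries = [e.strip() for e in current.split(",") if e.strip()]
    seen = {e.lower() for e in entries}
    for name in required:
        if name.lower() not in seen:
            entries.append(name)
            seen.add(name.lower())
    return ",".join(entries)


# Entries proposed in this report; the merge is idempotent.
updated = merge_no_proxy(os.environ.get("NO_PROXY", ""), ["inference.local", "*.local"])
print(updated)
```

Running the merge a second time on its own output leaves the value unchanged, so onboard could apply it on every run without growing the variable.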
Actual Result
Gateway log (repeated for every chat message):
fetch timeout after 120000ms (elapsed 119350ms) operation=fetchWithSsrFGuard
url=https://inference.local/v1/chat/completions
[provider-transport-fetch] error provider=inference api=openai-completions
model=nvidia/nemotron-3-ultra-550b-a55b elapsedMs=119364 name=TimeoutError
Embedded agent failed before reply: LLM idle timeout (120s): no response from model
NET:FAIL [LOW] [msg:Proxy connection error: Broken pipe (os error 32)]
Pattern repeats on every message. Session is completely unusable.
Logs
Gateway log excerpts (openclaw gateway-persistent.log):
2026-06-05T09:37:52.815+00:00 [fetch-timeout] fetch timeout after 120000ms (elapsed 119350ms)
operation=fetchWithSsrFGuard url=https://inference.local/v1/chat/completions
2026-06-05T09:37:52.826+00:00 [provider-transport-fetch] [model-fetch] error
provider=inference api=openai-completions model=nvidia/nemotron-3-ultra-550b-a55b
elapsedMs=119364 name=TimeoutError
2026-06-05T09:37:52.998+00:00 Embedded agent failed before reply:
LLM idle timeout (120s): no response from model
[1780652455.685] NET:FAIL [LOW] [msg:Proxy connection error: Broken pipe (os error 32)]
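For anyone triaging a longer gateway-persistent.log, a small sketch like the following can pull out each fetch timeout and its elapsed time, to confirm that every failure sits at the 120000ms limit. The regular expression is based only on the log lines quoted above; the helper name `parse_fetch_timeouts` is hypothetical.

```python
import re

# Matches lines such as:
#   fetch timeout after 120000ms (elapsed 119350ms)
TIMEOUT_RE = re.compile(r"fetch timeout after (\d+)ms \(elapsed (\d+)ms\)")


def parse_fetch_timeouts(log_text: str) -> list[tuple[int, int]]:
    """Return (limit_ms, elapsed_ms) for every fetch-timeout entry in the log text."""
    return [(int(limit), int(elapsed)) for limit, elapsed in TIMEOUT_RE.findall(log_text)]


sample = (
    "2026-06-05T09:37:52.815+00:00 [fetch-timeout] "
    "fetch timeout after 120000ms (elapsed 119350ms)"
)
print(parse_fetch_timeouts(sample))  # [(120000, 119350)]
```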
Onboard warning:
HTTP_PROXY=http://127.0.0.1:8118 detected on host
Comparison (Ubuntu bare-metal, same API key, same model):
curl TTFB for nvidia/nemotron-3-ultra-550b-a55b = 31s → HTTP 200 (no proxy)
curl TTFB for nvidia/nemotron-3-nano-omni-30b-a3b-reasoning = 300ms → HTTP 200
NVB#6272789