Description
On DGX Spark with NemoClaw v0.0.43 and OpenShell 0.0.39, the express setup completes successfully (it installs Ollama, pulls qwen3.6:35b, and creates the sandbox), but inference from the sandbox fails with "LLM request failed: network connection error". Ollama works correctly on the host (a direct curl returns a valid response), but the sandbox cannot reach it.
Root cause analysis identified three layered issues:
1. DNS: host.openshell.internal does not resolve inside the sandbox
- getent hosts host.openshell.internal → CANNOT RESOLVE
- This hostname is used by the inference route to reach the host Ollama proxy
- Likely caused by the OpenShell 0.0.39 Docker-driver gateway not setting up host-gateway DNS (the k3s gateway in 0.0.36 used CoreDNS + NodeHosts, which worked)
2. Policy port mismatch: the local_inference preset allows port 11434, but the auth proxy listens on 11435
- Policy: host.openshell.internal:11434 (allowed)
- Actual proxy: 0.0.0.0:11435 (not in policy)
- Even if DNS resolved, requests to :11435 would be blocked by policy
3. SSRF check: the OpenShell proxy rejects requests because DNS resolution fails
- Sandbox curl goes through http_proxy=10.200.0.1:3128
- The proxy cannot resolve host.openshell.internal, so it returns 403 ssrf_denied
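To show how the three layers stack, here is a minimal Python model of the sandbox proxy's decision. It assumes the proxy resolves the hostname, checks the resolved IP against allowed_ips, and then matches the port. That check order, the `policy_denied` label, and the `Endpoint`/`decide` names are hypothetical and not taken from OpenShell's source; only the hostname, ports, CIDRs, bridge gateway IP, and the `ssrf_denied` outcome come from this report.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass

# HYPOTHETICAL model of the sandbox egress proxy (http_proxy=10.200.0.1:3128).
# The order of checks is an assumption, not OpenShell's actual implementation.


@dataclass
class Endpoint:
    host: str
    ports: list[int]
    allowed_ips: list[str]  # CIDR strings, as in the local_inference preset


def decide(host: str, port: int, endpoints: list[Endpoint], dns: dict[str, str]) -> str:
    """Return 'allow', 'ssrf_denied', or 'policy_denied' for a request to host:port."""
    ip = dns.get(host)  # the dict stands in for the resolver the proxy uses
    if ip is None:
        # Issues 1 and 3: the name does not resolve, so allowed_ips cannot be verified.
        return "ssrf_denied"
    addr = ipaddress.ip_address(ip)
    for ep in endpoints:
        if ep.host != host:
            continue
        if not any(addr in ipaddress.ip_network(cidr) for cidr in ep.allowed_ips):
            return "ssrf_denied"
        if port in ep.ports:
            return "allow"
    # Issue 2: no endpoint entry covers this host and port combination.
    return "policy_denied"


HOST = "host.openshell.internal"
CIDRS = ["10.0.0.0/8", "172.16.0.0/12"]
POLICY = [Endpoint(HOST, [11434], CIDRS)]               # preset as shipped
FIXED_POLICY = [Endpoint(HOST, [11434, 11435], CIDRS)]  # preset plus auth proxy port
BROKEN_DNS: dict[str, str] = {}                         # observed with OpenShell 0.0.39
FIXED_DNS = {HOST: "172.17.0.1"}                        # expected: Docker bridge gateway

print(decide(HOST, 11435, POLICY, BROKEN_DNS))          # ssrf_denied
print(decide(HOST, 11435, POLICY, FIXED_DNS))           # policy_denied
print(decide(HOST, 11435, FIXED_POLICY, FIXED_DNS))     # allow
```

Under this model, fixing DNS alone only moves the failure from `ssrf_denied` to a port denial, which matches the observation that requests to :11435 would still be blocked even if the name resolved; both the DNS setup and the preset port list need to change.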
This broke between v0.0.38 (OpenShell 0.0.36, k3s gateway, inference worked) and v0.0.43 (OpenShell 0.0.39, Docker-driver gateway, inference broken).
Environment
Device: DGX Spark (spark-dadc / dgx-spark-cr03, 10.173.104.110)
OS: DGX Spark FastOS 1.135.33 (customer build)
Architecture: aarch64
NemoClaw: v0.0.43
OpenShell CLI: openshell 0.0.39
Ollama: qwen3.6:35b (23 GB, 100% GPU, responds correctly on host)
Docker bridge: 172.17.0.0/16, gateway 172.17.0.1
Steps to Reproduce
1. Fresh DGX Spark with FastOS 1.135.33
2. Run: curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash
3. Select "Express install" when prompted
4. Wait for Ollama install, model pull, sandbox creation to complete
5. Test inference:
openshell sandbox exec --name my-assistant -- openclaw agent --session-id test -m "hello"
6. Test from sandbox:
openshell sandbox exec --name my-assistant -- getent hosts host.openshell.internal
openshell sandbox exec --name my-assistant -- curl -v http://host.openshell.internal:11435/api/tags
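The step 6 checks can be wrapped in a small host-side script that reports which layer fails first. This is a sketch assuming `openshell` is on the host's PATH and the sandbox is named my-assistant as above; the helper names (`run_cmd`, `sandbox_exec`, `http_status`, `diagnose`) are hypothetical, and the `-w "\nstatus=%{http_code}"` marker added to the curl call is an illustrative change from the original `curl -v`.

```python
import subprocess
from typing import Callable, List, Optional, Tuple

SANDBOX = "my-assistant"
HOST = "host.openshell.internal"
URL = f"http://{HOST}:11435/api/tags"  # Ollama auth proxy port

Runner = Callable[[List[str]], Tuple[int, str]]


def run_cmd(argv: List[str]) -> Tuple[int, str]:
    """Run argv on the host; return (exit code, stdout + stderr)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=60)
    except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
        return 127, str(exc)
    return proc.returncode, proc.stdout + proc.stderr


def sandbox_exec(*cmd: str) -> List[str]:
    """Build 'openshell sandbox exec --name <sandbox> -- <cmd...>' as in step 6."""
    return ["openshell", "sandbox", "exec", "--name", SANDBOX, "--", *cmd]


def http_status(output: str) -> Optional[int]:
    """Extract the code printed by curl's -w 'status=%{http_code}' marker."""
    for line in reversed(output.splitlines()):
        if line.startswith("status="):
            return int(line.split("=", 1)[1])
    return None


def diagnose(run: Runner = run_cmd) -> str:
    """Return the first failing layer: 'dns', 'ssrf_denied', 'http_<code>', 'unreachable', or 'ok'."""
    code, out = run(sandbox_exec("getent", "hosts", HOST))
    if code != 0 or not out.strip():
        return "dns"  # issue 1: the name does not resolve inside the sandbox
    code, out = run(sandbox_exec("curl", "-sS", "-w", "\nstatus=%{http_code}", URL))
    if "ssrf_denied" in out:
        return "ssrf_denied"  # issue 3: the proxy rejected the request
    status = http_status(out)
    if status == 200:
        return "ok"
    # curl prints status=000 when no HTTP response arrived at all.
    return f"http_{status}" if status else "unreachable"
```

Running `print(diagnose())` on an affected host would be expected to report `dns`, given that getent currently returns nothing. The `run` parameter is injectable so the classification logic can be exercised without a live sandbox.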
Expected Result
1. host.openshell.internal resolves to Docker bridge gateway (172.17.0.1)
2. Sandbox can reach Ollama auth proxy on port 11435
3. Agent inference returns a valid response via local Ollama
Actual Result
1. host.openshell.internal → CANNOT RESOLVE (getent returns nothing)
2. curl to host.openshell.internal:11435 → 403 ssrf_denied (allowed_ips check failed)
3. Agent inference → "FailoverError: LLM request failed: network connection error"
Direct host test (bypassing sandbox) works fine:
curl http://localhost:11434/api/generate -d '{"model":"qwen3.6:35b","prompt":"hi"}' → valid response
The policy shows the local_inference preset with port 11434 only:
local_inference:
  endpoints:
    - host: host.openshell.internal
      port: 11434   ← should include 11435 (auth proxy port)
      allowed_ips: 10.0.0.0/8, 172.16.0.0/12
Logs
Gateway agent failed; falling back to embedded: GatewayClientRequestError:
FailoverError: LLM request failed: network connection error.
[agent/embedded] embedded run agent end: runId=test-debug isError=true
model=qwen3.6:35b provider=inference error=LLM request failed:
network connection error. rawError=Connection error.
[model-fallback/decision] model fallback decision: decision=candidate_failed
requested=inference/qwen3.6:35b reason=timeout next=none
detail=Connection error.
Bug Details
| Field | Value |
|---|---|
| Priority | Unprioritized |
| Action | Dev - Open - To fix |
| Disposition | Open issue |
| Module | Machine Learning - NemoClaw |
| Keyword | DGX_Spark_OTA_Computex, NemoClaw, NEMOCLAW_GH_SYNC_APPROVAL, NemoClaw_Inference, NemoClaw_Install, NemoClaw-SWQA-RelBlckr-Recommended |
[NVB#6179603]