[DGX Spark][Inference] Express setup sandbox cannot reach local Ollama — host.openshell.internal unresolvable + policy port mismatch (11434 vs 11435)

## Description

<pre>On DGX Spark with NemoClaw v0.0.43 + OpenShell 0.0.39, the express setup completes successfully (installs Ollama, pulls qwen3.6:35b, creates the sandbox), but inference from the sandbox fails with "LLM request failed: network connection error". Ollama works correctly on the host (a direct curl returns a valid response), but the sandbox cannot reach it.

Root cause analysis identified three layered issues:

1. DNS: host.openshell.internal does not resolve inside the sandbox
 - getent hosts host.openshell.internal → CANNOT RESOLVE
 - This hostname is used by the inference route to reach the host Ollama proxy
 - Likely caused by the OpenShell 0.0.39 Docker-driver gateway not setting up host-gateway DNS (the k3s gateway in 0.0.36 used CoreDNS + NodeHosts, which worked)

2. Policy port mismatch: local_inference preset allows port 11434, but auth proxy listens on 11435
 - Policy: host.openshell.internal:11434 (allowed)
 - Actual proxy: 0.0.0.0:11435 (not in policy)
 - Even if DNS resolved, requests to :11435 would be blocked by policy

3. SSRF check: OpenShell proxy rejects requests because DNS resolution fails
 - Sandbox curl goes through http_proxy=10.200.0.1:3128
 - Proxy cannot resolve host.openshell.internal → returns 403 ssrf_denied

This regressed between NemoClaw v0.0.38 (OpenShell 0.0.36, k3s gateway, inference worked) and NemoClaw v0.0.43 (OpenShell 0.0.39, Docker-driver gateway, inference broken).
</pre>Environment
<pre>Device: DGX Spark (spark-dadc / dgx-spark-cr03, 10.173.104.110)
OS: DGX Spark FastOS 1.135.33 (customer build)
Architecture: aarch64
NemoClaw: v0.0.43
OpenShell CLI: openshell 0.0.39
Ollama: qwen3.6:35b (23 GB, 100% GPU, responds correctly on host)
Docker bridge: 172.17.0.0/16, gateway 172.17.0.1
</pre>Steps to Reproduce
<pre>1. Fresh DGX Spark with FastOS 1.135.33
2. Run: curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash
3. Select "Express install" when prompted
4. Wait for Ollama install, model pull, sandbox creation to complete
5. Test inference through the agent:
   openshell sandbox exec --name my-assistant -- openclaw agent --session-id test -m "hello"
6. Test DNS resolution and proxy reachability from inside the sandbox:
   openshell sandbox exec --name my-assistant -- getent hosts host.openshell.internal
   openshell sandbox exec --name my-assistant -- curl -v http://host.openshell.internal:11435/api/tags
</pre>Expected Result
<pre>1. host.openshell.internal resolves to Docker bridge gateway (172.17.0.1)
2. Sandbox can reach Ollama auth proxy on port 11435
3. Agent inference returns a valid response via local Ollama
</pre>Actual Result
<pre>1. host.openshell.internal → CANNOT RESOLVE (getent returns nothing)
2. curl to host.openshell.internal:11435 → 403 ssrf_denied (allowed_ips check failed)
3. Agent inference → "FailoverError: LLM request failed: network connection error"

Direct host test (bypassing sandbox) works fine:
 curl http://localhost:11434/api/generate -d '{"model":"qwen3.6:35b","prompt":"hi"}' → valid response

Policy shows the local_inference preset with port 11434 only:
  local_inference:
    endpoints:
      - host: host.openshell.internal
        port: 11434   ← should include 11435 (auth proxy port)
        allowed_ips: 10.0.0.0/8, 172.16.0.0/12
</pre>Logs
<pre>Gateway agent failed; falling back to embedded: GatewayClientRequestError:
 FailoverError: LLM request failed: network connection error.
[agent/embedded] embedded run agent end: runId=test-debug isError=true
 model=qwen3.6:35b provider=inference error=LLM request failed:
 network connection error. rawError=Connection error.
[model-fallback/decision] model fallback decision: decision=candidate_failed
 requested=inference/qwen3.6:35b reason=timeout next=none
 detail=Connection error.
</pre>
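
The sandbox-side checks from the reproduction steps can be bundled into a small triage script that reports which layer fails. This is only a sketch, not part of NemoClaw or OpenShell: the function names (`run_in_sandbox`, `check_dns`, `check_port`, `diagnose`) are made up for illustration, and it assumes `openshell sandbox exec` forwards the command's stdout unchanged, as the `getent` and `curl` invocations in the report suggest.

```shell
#!/bin/sh
# Triage sketch (hypothetical helper, not shipped with NemoClaw/OpenShell).
# Defaults match the sandbox name and host alias used in this report.
SANDBOX="${SANDBOX:-my-assistant}"
HOST_ALIAS="${HOST_ALIAS:-host.openshell.internal}"

# Run a command inside the sandbox, using the CLI form from the repro steps.
run_in_sandbox() {
  openshell sandbox exec --name "$SANDBOX" -- "$@"
}

# Layer 1: does the host alias resolve inside the sandbox?
# Judged by output only, since getent prints nothing when resolution fails.
check_dns() {
  resolved=$(run_in_sandbox getent hosts "$HOST_ALIAS" 2>/dev/null)
  if [ -n "$resolved" ]; then
    echo "dns: ok"
  else
    echo "dns: FAIL"
  fi
}

# Layers 2 and 3: which HTTP status does the sandbox see for a given port?
# 403 points at the proxy (policy or SSRF check); 000 means no HTTP response.
check_port() {
  status=$(run_in_sandbox curl -s -o /dev/null -w '%{http_code}' \
    --max-time 5 "http://$HOST_ALIAS:$1/api/tags" 2>/dev/null)
  echo "port $1: HTTP ${status:-000}"
}

diagnose() {
  check_dns
  check_port 11434   # port allowed by the local_inference preset
  check_port 11435   # port the auth proxy listens on
}
```

Calling `diagnose` on the affected host should print one line per check. A `dns: FAIL` line together with `HTTP 403` on port 11435 would match the Actual Result above; a later build that resolves the alias but still returns 403 on 11435 would point at the policy port rather than DNS.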

## Bug Details

| Field | Value |
|-------|-------|
| Priority | Unprioritized |
| Action | Dev - Open - To fix |
| Disposition | Open issue |
| Module | Machine Learning - NemoClaw |
| Keyword | DGX_Spark_OTA_Computex, NemoClaw, NEMOCLAW_GH_SYNC_APPROVAL, NemoClaw_Inference, NemoClaw_Install, NemoClaw-SWQA-RelBlckr-Recommended |

---
[NVB#6179603]

[DGX Spark][Inference] Express setup sandbox cannot reach local Ollama — host.openshell.internal unresolvable + policy port mismatch (11434 vs 11435) #3562
