Title
[Bug] inference.local returns HTTP 403 inside sandbox when using Ollama local inference on DGX Spark
Description
After completing nemoclaw onboard on DGX Spark with Ollama as the local inference provider, all requests to http://inference.local from inside the sandbox return HTTP 403 Forbidden with an empty body. OpenClaw still functions (responds in ~21s), but the inference routing appears to be failing or falling back through a slower path.
Environment
- Device: NVIDIA DGX Spark (GB10, 128GB unified memory)
- OS: DGX OS (Ubuntu-based)
- OpenShell CLI: v0.0.7
- NemoClaw: installed from source (main branch, cloned 2026-03-17)
- OpenClaw: 2026.3.11
- Ollama: running on localhost:11434, listening on 0.0.0.0
- Model: qwen2.5:32b-instruct-32k (also tested with other models)
Steps to Reproduce
- Run nemoclaw onboard on DGX Spark
- Select option 3 (Local Ollama) for inference
- Complete all onboarding steps (sandbox created successfully)
- Switch inference to local model:
openshell inference set --provider ollama-local --model qwen2.5:32b-instruct-32k
- Connect to sandbox:
nemoclaw my-assistant connect
- Inside sandbox, test inference endpoint:
curl -v http://inference.local/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"qwen2.5:32b-instruct-32k","messages":[{"role":"user","content":"hello"}],"stream":false}'
Expected Behavior
inference.local should proxy the request to the configured Ollama endpoint and return a valid chat completion response.
Actual Behavior
> POST http://inference.local/v1/chat/completions HTTP/1.1
> Host: inference.local
< HTTP/1.1 403 Forbidden
Empty response body. The request goes through the sandbox proxy at 10.200.0.1:3128 but is denied.
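Since the 403 comes back via the sandbox proxy, it may help to confirm which proxy settings the shell inside the sandbox exposes to curl. The following is a minimal sketch: show_proxy_env is a hypothetical helper name, not an OpenShell or NemoClaw command, and it assumes the proxy is configured through the standard environment variables rather than some other mechanism.

```shell
#!/bin/sh
# Hypothetical diagnostic helper; not part of OpenShell or NemoClaw.

# Print the proxy-related environment variables that curl consults.
# For plain-HTTP URLs curl reads only the lowercase http_proxy,
# so an uppercase HTTP_PROXY is deliberately not listed here.
show_proxy_env() {
  for name in http_proxy https_proxy HTTPS_PROXY all_proxy ALL_PROXY no_proxy NO_PROXY; do
    # Indirect lookup of the variable called $name; empty if it is unset.
    eval "value=\${$name-}"
    printf '%s=%s\n' "$name" "${value:-<unset>}"
  done
}
```

If http_proxy points at 10.200.0.1:3128 and inference.local is not covered by no_proxy, curl sends the request to the proxy in absolute-URI form, which matches the `POST http://inference.local/...` request line in the verbose output above.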
Additional Context
- Ollama responds correctly from host: curl http://127.0.0.1:11434/api/generate returns a response in ~3 seconds.
- Ollama responds correctly from inside sandbox via direct host address: curl http://host.openshell.internal:11434/api/generate with stream:true returns the first chunk in ~60ms.
- However, with stream:false, curl http://host.openshell.internal:11434/v1/chat/completions also returns an empty response.
- openshell inference get confirms correct configuration:
Gateway inference:
Provider: ollama-local
Model: qwen2.5:32b-instruct-32k
Version: 2
- OpenClaw TUI does eventually respond (~21 seconds), suggesting it may be falling back to a different inference path or retrying. Direct Ollama latency from host is ~3 seconds.
- Setting NVIDIA_API_KEY=local-ollama and ANTHROPIC_API_KEY=local-ollama inside the sandbox (per troubleshooting docs) did not resolve the 403 or improve latency.
- Ollama is configured to listen on all interfaces (OLLAMA_HOST=0.0.0.0).
- The nemoclaw setup-spark cgroup fix was applied before onboarding.
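The observations above compare several paths by status, body, and latency. A small script along these lines could collect the same numbers consistently. It is only a sketch: build_payload, classify, and check are hypothetical helper names, the 120-second timeout is an arbitrary choice, and only standard curl options are used.

```shell
#!/bin/sh
# Hypothetical helpers for comparing inference paths; not part of OpenShell or NemoClaw.

# Build a chat completion request body.
# $1 = model name, $2 = "true" or "false" for streaming (defaults to false).
build_payload() {
  printf '{"model":"%s","messages":[{"role":"user","content":"hello"}],"stream":%s}' \
    "$1" "${2:-false}"
}

# Summarize a result. $1 = HTTP status (curl reports 000 when no
# HTTP response was received), $2 = number of body bytes downloaded.
classify() {
  case "$1" in
    000) echo "no HTTP response" ;;
    403) echo "denied (403), body $2 bytes" ;;
    2??) if [ "$2" -eq 0 ]; then
           echo "success status but empty body"
         else
           echo "ok, body $2 bytes"
         fi ;;
    *)   echo "HTTP $1, body $2 bytes" ;;
  esac
}

# POST a payload to a URL and print status, body size, and timings.
# $1 = URL, $2 = JSON request body.
check() {
  url=$1
  body=$2
  out=$(curl -s -o /dev/null --max-time 120 \
    -w '%{http_code} %{size_download} %{time_starttransfer} %{time_total}' \
    -H 'Content-Type: application/json' -d "$body" "$url")
  # Split the four space-separated fields into $1..$4.
  set -- $out
  printf '%s\n  %s; first byte %ss, total %ss\n' \
    "$url" "$(classify "$1" "$2")" "$3" "$4"
}

# Example calls, to be run inside the sandbox (URLs and model taken from this report):
#   check http://inference.local/v1/chat/completions "$(build_payload qwen2.5:32b-instruct-32k false)"
#   check http://host.openshell.internal:11434/v1/chat/completions "$(build_payload qwen2.5:32b-instruct-32k false)"
#   check http://host.openshell.internal:11434/v1/chat/completions "$(build_payload qwen2.5:32b-instruct-32k true)"
```

Reporting time to first byte separately from total time keeps the streaming and non-streaming cases comparable, and reporting the body size distinguishes a denied request from one that succeeds with an empty body.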
Possibly Related
- inference.local -> host gateway mapping as a suggested fix
- logs subcommand not recognized (also encountered in this setup)