
[Bug] inference.local returns HTTP 403 inside sandbox when using Ollama local inference on DGX Spark #314

@MasahiroShibata

Description

After completing nemoclaw onboard on DGX Spark with Ollama as the local inference provider, all requests to http://inference.local from inside the sandbox return HTTP 403 Forbidden with an empty body. OpenClaw still functions, but slowly (~21s per response versus ~3s for a direct Ollama call from the host), which suggests the inference routing is failing or falling back to a slower path.

Environment

  • Device: NVIDIA DGX Spark (GB10, 128GB unified memory)
  • OS: DGX OS (Ubuntu-based)
  • OpenShell CLI: v0.0.7
  • NemoClaw: installed from source (main branch, cloned 2026-03-17)
  • OpenClaw: 2026.3.11
  • Ollama: running on localhost:11434, listening on 0.0.0.0
  • Model: qwen2.5:32b-instruct-32k (also tested with other models)

Steps to Reproduce

  1. Run nemoclaw onboard on DGX Spark
  2. Select option 3 (Local Ollama) for inference
  3. Complete all onboarding steps (sandbox created successfully)
  4. Switch inference to local model:
    openshell inference set --provider ollama-local --model qwen2.5:32b-instruct-32k
    
  5. Connect to sandbox:
    nemoclaw my-assistant connect
    
  6. Inside sandbox, test inference endpoint:
    curl -v http://inference.local/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"model":"qwen2.5:32b-instruct-32k","messages":[{"role":"user","content":"hello"}],"stream":false}'
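To capture the status code, response headers, and timing of the step 6 request in one run, something like the following can be pasted into a shell inside the sandbox. It is a sketch, not part of NemoClaw or OpenShell: the function names chat_payload and repro_inference_local and the MODEL variable are hypothetical, and it assumes curl is available in the sandbox and uses the same proxy settings as the manual curl command in step 6.

```bash
# Hypothetical helper (not part of NemoClaw/OpenShell) wrapping the step 6 request.
MODEL="${MODEL:-qwen2.5:32b-instruct-32k}"

# Build the same OpenAI-style chat body as step 6.
# $1: value of the "stream" flag (true or false).
chat_payload() {
  printf '{"model":"%s","messages":[{"role":"user","content":"hello"}],"stream":%s}' \
    "$MODEL" "$1"
}

# POST the non-streaming request to inference.local, print the response
# headers (-D -), discard the body (-o /dev/null), then print a one-line
# summary of HTTP status, body size, time to first byte, and total time.
repro_inference_local() {
  curl -sS -D - -o /dev/null --max-time 120 \
    -H "Content-Type: application/json" \
    -d "$(chat_payload false)" \
    -w "status=%{http_code} bytes=%{size_download} ttfb=%{time_starttransfer}s total=%{time_total}s\n" \
    http://inference.local/v1/chat/completions
}
```

Calling repro_inference_local prints the response headers followed by the summary line. The headers may help show which component produced the 403 (for example through a Server field, if one is sent), although no particular header is guaranteed.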
    

Expected Behavior

inference.local should proxy the request to the configured Ollama endpoint and return a valid chat completion response.

Actual Behavior

> POST http://inference.local/v1/chat/completions HTTP/1.1
> Host: inference.local
< HTTP/1.1 403 Forbidden

The response body is empty. The request goes through the sandbox proxy at 10.200.0.1:3128 but is denied.

Additional Context

  • Ollama responds correctly from the host: curl http://127.0.0.1:11434/api/generate returns a response in ~3 seconds.
  • Ollama also responds from inside the sandbox via the host address: curl http://host.openshell.internal:11434/api/generate with stream:true returns the first chunk in ~60ms.
  • However, a stream:false request to http://host.openshell.internal:11434/v1/chat/completions also returns an empty response.
  • openshell inference get confirms correct configuration:
    Gateway inference:
      Provider: ollama-local
      Model: qwen2.5:32b-instruct-32k
      Version: 2
    
  • The OpenClaw TUI does eventually respond (~21 seconds), which suggests it may be falling back to a different inference path or retrying. By comparison, direct Ollama latency from the host is ~3 seconds.
  • Setting NVIDIA_API_KEY=local-ollama and ANTHROPIC_API_KEY=local-ollama inside the sandbox (per the troubleshooting docs) did not resolve the 403 or improve latency.
  • Ollama is configured to listen on all interfaces (OLLAMA_HOST=0.0.0.0).
  • The nemoclaw setup-spark cgroup fix was applied before onboarding.
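The streaming and non-streaming observations above can be collected into one table with a sketch like the one below, run inside the sandbox. It is hypothetical, not an OpenShell or NemoClaw tool: OLLAMA_URL, MODEL, and the function names are assumptions, it requires curl, and it records only status codes, byte counts, and timings, not response content. It calls Ollama's native /api/generate endpoint and its OpenAI-compatible /v1/chat/completions endpoint on host.openshell.internal rather than going through inference.local.

```bash
# Hypothetical diagnostic (not part of NemoClaw/OpenShell): compare streaming
# and non-streaming calls to Ollama from inside the sandbox.
OLLAMA_URL="${OLLAMA_URL:-http://host.openshell.internal:11434}"
MODEL="${MODEL:-qwen2.5:32b-instruct-32k}"

# Body for Ollama's native /api/generate endpoint; $1 is true or false.
generate_body() {
  printf '{"model":"%s","prompt":"hello","stream":%s}' "$MODEL" "$1"
}

# Body for the OpenAI-compatible /v1/chat/completions endpoint; $1 is true or false.
chat_body() {
  printf '{"model":"%s","messages":[{"role":"user","content":"hello"}],"stream":%s}' "$MODEL" "$1"
}

# Print one row: path, stream flag, HTTP status, bytes received,
# time to first byte, and total time. $1 = path, $2 = stream flag, $3 = body.
measure() {
  curl -s -o /dev/null --max-time 120 \
    -H "Content-Type: application/json" -d "$3" \
    -w "$1 stream=$2 status=%{http_code} bytes=%{size_download} ttfb=%{time_starttransfer}s total=%{time_total}s\n" \
    "$OLLAMA_URL$1"
}

# Run both endpoints with stream=true and stream=false (four rows in total).
run_matrix() {
  for stream in true false; do
    measure /api/generate "$stream" "$(generate_body "$stream")"
    measure /v1/chat/completions "$stream" "$(chat_body "$stream")"
  done
}
```

Calling run_matrix prints one row per endpoint and stream setting. An empty body would appear as bytes=0; if the server closes the connection without sending any response, curl reports status=000.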

Possibly Related

Labels

  • area: local-models (Local model providers, downloads, launch, or connectivity)
  • area: providers (Inference provider integrations and provider behavior)
  • platform: dgx-spark (Affects DGX Spark hardware or workflows)
