[Bug] inference.local returns HTTP 403 inside sandbox when using Ollama local inference on DGX Spark

## Description

After completing `nemoclaw onboard` on DGX Spark with Ollama as the local inference provider, every request to `http://inference.local` from inside the sandbox returns `HTTP 403 Forbidden` with an empty body. OpenClaw still responds, but only after ~21s, so inference routing appears to be failing or falling back through a slower path.

## Environment

- **Device:** NVIDIA DGX Spark (GB10, 128GB unified memory)
- **OS:** DGX OS (Ubuntu-based)
- **OpenShell CLI:** v0.0.7
- **NemoClaw:** installed from source (main branch, cloned 2026-03-17)
- **OpenClaw:** 2026.3.11
- **Ollama:** running on the host, bound to 0.0.0.0 on port 11434 (reachable locally at localhost:11434)
- **Model:** qwen2.5:32b-instruct-32k (also tested with other models)

## Steps to Reproduce

1. Run `nemoclaw onboard` on DGX Spark
2. Select option 3 (Local Ollama) for inference
3. Complete all onboarding steps (sandbox created successfully)
4. Switch inference to local model:
   ```
   openshell inference set --provider ollama-local --model qwen2.5:32b-instruct-32k
   ```
5. Connect to sandbox:
   ```
   nemoclaw my-assistant connect
   ```
6. Inside sandbox, test inference endpoint:
   ```
   curl -v http://inference.local/v1/chat/completions \
     -H "Content-Type: application/json" \
     -d '{"model":"qwen2.5:32b-instruct-32k","messages":[{"role":"user","content":"hello"}],"stream":false}'
   ```
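
For repeated testing, step 6 can be wrapped so that only the HTTP status code is printed. This is a minimal sketch, not part of NemoClaw or OpenShell: the function names `build_payload` and `probe_status` are hypothetical, and the endpoint and model are copied from the steps above.

```shell
# Endpoint and model copied from the reproduction steps.
ENDPOINT="http://inference.local/v1/chat/completions"
MODEL="qwen2.5:32b-instruct-32k"

# Build the non-streaming chat payload from step 6.
# $1 = model name, $2 = user message (no JSON escaping is applied,
# so keep the message free of quotes and backslashes).
build_payload() {
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}],"stream":false}' "$1" "$2"
}

# POST the payload to the URL in $1 and print only the status code.
# curl prints 000 when no HTTP response was received at all.
probe_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    -H "Content-Type: application/json" \
    -d "$(build_payload "$MODEL" "hello")" \
    "$1"
}
```

Inside the sandbox, `probe_status "$ENDPOINT"` would be expected to print `403` while this bug is present.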

## Expected Behavior

`inference.local` should proxy the request to the configured Ollama endpoint and return a valid chat completion response.

## Actual Behavior

```
> POST http://inference.local/v1/chat/completions HTTP/1.1
> Host: inference.local
< HTTP/1.1 403 Forbidden
```

The response body is empty. The request is routed through the sandbox proxy at `10.200.0.1:3128`, which denies it.
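
To confirm how curl is being pointed at the proxy, the proxy-related environment variables can be dumped inside the sandbox. This is a sketch under the assumption that the sandbox configures its proxy through environment variables, which the report does not confirm; `show_proxy_env` is a hypothetical helper name.

```shell
# Print the proxy-related environment variables that curl consults.
# curl reads http_proxy in lowercase only; the other variables are
# accepted in either case, so both spellings are listed.
show_proxy_env() {
  for name in http_proxy https_proxy HTTPS_PROXY all_proxy ALL_PROXY no_proxy NO_PROXY; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    printf '%s=%s\n' "$name" "${value:-<unset>}"
  done
}
```

If `http_proxy` shows `http://10.200.0.1:3128` and `inference.local` is not listed in `no_proxy`, that would be consistent with the absolute-URI request line in the output above.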

## Additional Context

- **Ollama responds correctly from host:** `curl http://127.0.0.1:11434/api/generate` returns a response in ~3 seconds.
- **Ollama responds correctly from inside sandbox via direct host address:** `curl http://host.openshell.internal:11434/api/generate` with `stream:true` returns the first chunk in ~60ms.
- **However, with `stream:false`**, `curl http://host.openshell.internal:11434/v1/chat/completions` from inside the sandbox returns an empty response, as the `inference.local` request does.
- **`openshell inference get` confirms correct configuration:**
  ```
  Gateway inference:
    Provider: ollama-local
    Model: qwen2.5:32b-instruct-32k
    Version: 2
  ```
- **OpenClaw TUI does eventually respond** (~21 seconds), suggesting it may be falling back to a different inference path or retrying. Direct Ollama latency from host is ~3 seconds.
- Setting `NVIDIA_API_KEY=local-ollama` and `ANTHROPIC_API_KEY=local-ollama` inside the sandbox (per troubleshooting docs) did not resolve the 403 or improve latency.
- Ollama is configured to listen on all interfaces (`OLLAMA_HOST=0.0.0.0`).
- The `nemoclaw setup-spark` cgroup fix was applied before onboarding.
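
The latency figures above can be compared more systematically with curl's write-out timers. The following is a sketch only: `generate_payload` and `time_request` are hypothetical helper names, and the payload fields (`model`, `prompt`, `stream`) follow Ollama's `/api/generate` request format.

```shell
# Build an Ollama /api/generate payload.
# $1 = model, $2 = prompt, $3 = the literal true or false for streaming.
generate_payload() {
  printf '{"model":"%s","prompt":"%s","stream":%s}' "$1" "$2" "$3"
}

# POST payload $2 to URL $1 and print the HTTP status, the time to
# first byte, and the total time, discarding the response body.
time_request() {
  curl -s -o /dev/null \
    -w 'status=%{http_code} ttfb=%{time_starttransfer}s total=%{time_total}s\n' \
    -H "Content-Type: application/json" \
    -d "$2" "$1"
}
```

For example, `time_request http://host.openshell.internal:11434/api/generate "$(generate_payload qwen2.5:32b-instruct-32k hello true)"` run inside the sandbox, and the same call with `false`, would show whether the delay is in the first byte or in the full response.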

## Possibly Related

- #260 (macOS) mentions `inference.local -> host gateway mapping` as a suggested fix
- #159 reports `logs` subcommand not recognized (also encountered in this setup)


[Bug] inference.local returns HTTP 403 inside sandbox when using Ollama local inference on DGX Spark #314
