[Brev][Inference] Model Router inference.local returns "inference service unavailable" in sandbox on Linux Docker-driver — localhost:4000 unreachable from container

## Description

On Brev (Linux, native Docker), selecting Model Router during NemoClaw onboarding causes every inference request from inside the sandbox to return "inference service unavailable". The gateway registers the Model Router provider with base URL `http://localhost:4000/v1`. The openshell-sandbox proxy inside the container resolves `localhost` to the container's own loopback rather than the host, so the request never reaches the Model Router process listening on host port 4000. In addition, UFW on Brev has no rule allowing port 4000 from the Docker bridge, so a request addressed to the host's bridge IP is blocked as well. The same flow works on a local Mac (Colima), where container-to-host routing is handled differently and UFW is absent.

## Environment

```text
Device:        Brev cloud instance (brev-bkcdc81o3, 2 vCPU / 7.8 GiB RAM)
OS:            Ubuntu 24.04.4 LTS (x86_64, kernel 6.11.0-1016-nvidia)
Architecture:  x86_64
Node.js:       v22.22.3
npm:           10.9.8
Docker:        29.5.2 (native, not Colima)
OpenShell CLI: 0.0.44
NemoClaw:      v0.0.55
OpenClaw:      2026.5.22
```

## Steps to Reproduce

1. On a Brev Linux instance, run:
   ```bash
   curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash
   ```
2. At inference provider selection, choose option 8 (Model Router — experimental).
3. Enter a valid NVIDIA API key (`nvapi-...`) when prompted.
4. Complete onboard — sandbox builds successfully and reaches Ready state.
5. Run: `nemoclaw <sandbox> connect`
6. Observe: "inference.local is unavailable inside '`<sandbox>`'"
7. Run: `nemoclaw <sandbox> doctor`
   - `[fail] Docker container: openshell-cluster-nemoclaw not found or not inspectable`
   - `WARNING: Could not find gateway container for 'nemoclaw'. DNS proxy not installed.`

## Expected Result

Model Router routes inference requests from the sandbox to NVIDIA Endpoints, same as on local Mac.

## Actual Result

```text
nemoclaw <sandbox> connect:
  inference.local is unavailable inside '<sandbox>'. Repairing sandbox DNS proxy...
  WARNING: Could not find gateway container for 'nemoclaw'. DNS proxy not installed.
  Warning: failed to repair sandbox DNS proxy.
  Resetting inference route to nvidia-router/nvidia-routed.
  Error: inference.local is still unavailable inside '<sandbox>' after DNS and route repair.
  Last probe: BROKEN 000
  Connect is stopping because the sandbox inference route is known to be broken.
```

Root cause confirmed via diagnostics:
- Model Router process: healthy on host at `0.0.0.0:4000` (`GET /v1/models` returns model list)
- Gateway: healthy at `172.18.0.1:8080`
- `host.openshell.internal` resolves to `172.18.0.1` inside container (correct)
- Container → `172.18.0.1:4000`: **FAILED** (UFW has no rule for port 4000 from `172.18.0.0/16`)
- Provider `nvidia-router` is registered with `OPENAI_BASE_URL` pointing to `localhost:4000`, which is only valid from the host's perspective. The container proxy receives this URL unchanged and connects to `localhost:4000` on its own loopback, where nothing is listening, so the request fails.
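
The missing UFW rule from the diagnostics above can be checked mechanically. The following is a minimal sketch, not part of NemoClaw: `ufw_has_rule` is a hypothetical helper that reads plain `ufw status` output on stdin and assumes the three-column `To / Action / From` layout shown in the Logs section. The sample text is copied from this report rather than read from a live system.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a NemoClaw or ufw command): succeeds if the
# `ufw status` text on stdin has an ALLOW rule for <port>/tcp from <subnet>.
ufw_has_rule() {
  local port="$1" subnet="$2"
  awk -v to="${port}/tcp" -v from="$subnet" '
    $1 == to && $2 == "ALLOW" && $3 == from { found = 1 }
    END { exit !found }'
}

# Sample input mirroring the UFW state reported in this issue.
sample_status='Status: active
8080/tcp   ALLOW   172.18.0.0/16'

for port in 8080 4000; do
  if printf '%s\n' "$sample_status" | ufw_has_rule "$port" "172.18.0.0/16"; then
    echo "port $port: allowed from Docker bridge"
  else
    echo "port $port: no rule, container traffic is likely dropped"
  fi
done
```

On a real host, the sample text would be replaced by piping `sudo ufw status` into the helper. The subnet `172.18.0.0/16` is the one observed on this instance and may differ elsewhere.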

## Fix Direction

1. When registering the Model Router provider on Linux Docker-driver mode, use `http://host.openshell.internal:4000/v1` (not `http://localhost:4000/v1`) so the container proxy resolves to the host.
2. During onboarding, add a UFW rule allowing port 4000 from the Docker bridge subnets, alongside the existing port 8080 rule.
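
The two fix directions above can be sketched as follows. This is an illustration under stated assumptions, not the actual onboarding code: `rewrite_base_url` and `ufw_allow_cmd` are hypothetical helper names, and the sketch only prints the UFW command instead of running it. The `ufw allow from ... to any port ... proto tcp` form is standard UFW syntax; the subnet is the one observed in this report.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rewrite a loopback base URL so it resolves to the host
# from inside the sandbox container. Non-loopback URLs pass through unchanged.
rewrite_base_url() {
  local url="$1"
  printf '%s\n' "$url" |
    sed -E 's#^(https?://)(localhost|127\.0\.0\.1)([:/]|$)#\1host.openshell.internal\3#'
}

# Hypothetical helper: print (do not execute) the UFW rule that would open a
# host port to a Docker bridge subnet.
ufw_allow_cmd() {
  local subnet="$1" port="$2"
  printf 'ufw allow from %s to any port %s proto tcp\n' "$subnet" "$port"
}

# Dry run with the values from this report.
rewrite_base_url "http://localhost:4000/v1"
ufw_allow_cmd "172.18.0.0/16" 4000
```

Printing the rule rather than applying it keeps the sketch side-effect free. An onboarding script would presumably run the command with `sudo` once per Docker bridge subnet, next to wherever the existing port 8080 rule is created.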

## Related

- NVB#6158321: [Brev] Model Router inference broken — HTTP 503 (Closed/Fixed 2026-05-27, v0.0.54). Reproduces on v0.0.55 with a different failure path — possible incomplete fix or regression.
- NVB#6187310: [DGX Spark] Model Router nvapi-* key rejected by LiteLLM (different root cause, same pattern: Model Router non-functional on Linux remote machines).

## Logs

```text
# Host: Model Router healthy
$ curl http://127.0.0.1:4000/health
{"healthy_endpoints":[...nvidia/nemotron-3-nano...nvidia/nemotron-3-super...],"unhealthy_count":0}

# Host: UFW status
$ sudo ufw status
Status: active
8080/tcp   ALLOW   172.18.0.0/16    <- port 8080 open for Docker bridge
(no rule for port 4000)

# Container: inference.local via proxy
$ docker exec <sandbox> curl --proxy http://10.200.0.1:3128 https://inference.local/v1/models
{"error":"inference service unavailable"}

# Container: direct connection to 172.18.0.1:4000
$ docker exec <sandbox> curl http://172.18.0.1:4000/v1/models
(empty -- connection refused / timed out due to UFW)

# nemoclaw doctor
[fail] Docker container: openshell-cluster-nemoclaw not found or not inspectable
[ok]   OpenShell status: connected to nemoclaw
[ok]   Live sandbox: <sandbox> present (Ready)
[ok]   Route: nvidia-router / nvidia-routed
```

---
[NVB#6244574](https://nvbugspro.nvidia.com/bug/6244574)