[Brev][Inference] Model Router inference.local returns "inference service unavailable" in sandbox on Linux Docker-driver — localhost:4000 unreachable from container #4564

@hulynn

Description

On Brev (Linux, native Docker), selecting Model Router during NemoClaw onboard causes every inference request from inside the sandbox to return "inference service unavailable". Two problems combine. First, the gateway registers the Model Router provider with the base URL http://localhost:4000/v1, and the openshell-sandbox proxy inside the container resolves localhost to the container's own loopback rather than the host, so the request never reaches the Model Router process listening on host port 4000. Second, UFW on Brev has no rule allowing port 4000 from the Docker bridge. The same flow works on a local Mac (Colima), where container-to-host routing is handled differently and UFW is absent.

Environment

Device:        Brev cloud instance (brev-bkcdc81o3, 2 vCPU / 7.8 GiB RAM)
OS:            Ubuntu 24.04.4 LTS (x86_64, kernel 6.11.0-1016-nvidia)
Architecture:  x86_64
Node.js:       v22.22.3
npm:           10.9.8
Docker:        29.5.2 (native, not Colima)
OpenShell CLI: 0.0.44
NemoClaw:      v0.0.55
OpenClaw:      2026.5.22

Steps to Reproduce

  1. On a Brev Linux instance, run:
    curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash
  2. At inference provider selection, choose option 8 (Model Router — experimental).
  3. Enter a valid NVIDIA API key (nvapi-...) when prompted.
  4. Complete onboard — sandbox builds successfully and reaches Ready state.
  5. Run: nemoclaw <sandbox> connect
  6. Observe: "inference.local is unavailable inside '<sandbox>'"
  7. Run: nemoclaw <sandbox> doctor
    • [fail] Docker container: openshell-cluster-nemoclaw not found
    • WARNING: Could not find gateway container for 'nemoclaw'. DNS proxy not installed.

Expected Result

Model Router routes inference requests from the sandbox to NVIDIA Endpoints, same as on local Mac.

Actual Result

nemoclaw <sandbox> connect:
  inference.local is unavailable inside '<sandbox>'. Repairing sandbox DNS proxy...
  WARNING: Could not find gateway container for 'nemoclaw'. DNS proxy not installed.
  Warning: failed to repair sandbox DNS proxy.
  Resetting inference route to nvidia-router/nvidia-routed.
  Error: inference.local is still unavailable inside '<sandbox>' after DNS and route repair.
  Last probe: BROKEN 000
  Connect is stopping because the sandbox inference route is known to be broken.

Root cause confirmed via diagnostics:

  • Model Router process: healthy on host at 0.0.0.0:4000 (GET /v1/models returns model list)
  • Gateway: healthy at 172.18.0.1:8080
  • host.openshell.internal resolves to 172.18.0.1 inside container (correct)
  • Container → 172.18.0.1:4000: FAILED (UFW has no rule for port 4000 from 172.18.0.0/16)
  • Provider nvidia-router is registered with OPENAI_BASE_URL pointing to localhost:4000, which is valid only from the host's perspective; the container proxy receives this URL and tries localhost:4000 on its own loopback, which fails
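
The reachability checks above could be repeated with a small TCP probe. This is a minimal sketch, assuming Python is available wherever it runs (the sandbox image may not ship it); the helper name can_connect is hypothetical and not part of NemoClaw or OpenShell.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS resolution failures.
        return False
```

Run inside the container, can_connect("localhost", 4000) and can_connect("host.openshell.internal", 4000) would be expected to both return False given the findings in this report: the first because nothing listens on the container's loopback, the second because UFW blocks port 4000 from the bridge subnet.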

Fix Direction

  1. When registering the Model Router provider on Linux Docker-driver mode, use http://host.openshell.internal:4000/v1 (not http://localhost:4000/v1) so the container proxy resolves to the host.
  2. During onboard, add UFW rule for port 4000 from Docker bridge subnets alongside the existing port 8080 rule.
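
Both fix directions can be sketched as follows. This is an illustration under stated assumptions, not NemoClaw's actual code: the function names container_base_url and ufw_allow_args are hypothetical, and the real onboard logic may be structured differently. The first function rewrites a loopback base URL to the host alias named in this issue; the second builds the argument list for a UFW rule of the same shape as the existing port 8080 rule.

```python
from urllib.parse import urlsplit, urlunsplit

LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}
HOST_ALIAS = "host.openshell.internal"  # host alias reported in this issue


def container_base_url(url: str, host_alias: str = HOST_ALIAS) -> str:
    """Replace a loopback host with the host alias; leave other URLs unchanged."""
    parts = urlsplit(url)
    if parts.hostname not in LOOPBACK_HOSTS:
        return url
    # Rebuild the network location, keeping the port if one was given.
    # Any userinfo (user:password@) in the original URL is dropped.
    netloc = host_alias if parts.port is None else f"{host_alias}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


def ufw_allow_args(port: int, subnet: str) -> list[str]:
    """Build the argv for a UFW rule allowing TCP traffic to port from subnet."""
    return ["ufw", "allow", "from", subnet, "to", "any",
            "port", str(port), "proto", "tcp"]
```

For example, container_base_url("http://localhost:4000/v1") returns "http://host.openshell.internal:4000/v1", and ufw_allow_args(4000, "172.18.0.0/16") corresponds to the command ufw allow from 172.18.0.0/16 to any port 4000 proto tcp, which needs root to apply.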

Related

  • NVB#6158321: [Brev] Model Router inference broken — HTTP 503 (Closed/Fixed 2026-05-27, v0.0.54). Reproduces on v0.0.55 with a different failure path — possible incomplete fix or regression.
  • NVB#6187310: [DGX Spark] Model Router nvapi-* key rejected by LiteLLM (different root cause, same pattern: Model Router non-functional on Linux remote machines).

Logs

# Host: Model Router healthy
$ curl http://127.0.0.1:4000/health
{"healthy_endpoints":[...nvidia/nemotron-3-nano...nvidia/nemotron-3-super...],"unhealthy_count":0}

# Host: UFW status
$ sudo ufw status
Status: active
8080/tcp   ALLOW   172.18.0.0/16    <- port 8080 open for Docker bridge
(no rule for port 4000)

# Container: inference.local via proxy
$ docker exec <sandbox> curl --proxy http://10.200.0.1:3128 https://inference.local/v1/models
{"error":"inference service unavailable"}

# Container: direct connection to 172.18.0.1:4000
$ docker exec <sandbox> curl http://172.18.0.1:4000/v1/models
(empty -- connection refused / timed out due to UFW)

# nemoclaw doctor
[fail] Docker container: openshell-cluster-nemoclaw not found or not inspectable
[ok]   OpenShell status: connected to nemoclaw
[ok]   Live sandbox: <sandbox> present (Ready)
[ok]   Route: nvidia-router / nvidia-routed
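
The UFW check in the logs above could be automated along these lines. This is a rough sketch that only handles rule lines of the form shown in this report (port/tcp, ALLOW, source subnet); real ufw status output has other forms (for example "Anywhere" sources or v6 rules) that it ignores. The function names are hypothetical.

```python
import re

# Matches lines such as: "8080/tcp   ALLOW   172.18.0.0/16"
RULE = re.compile(r"^(\d+)/tcp\s+ALLOW\s+(\S+)")


def allowed_ports(ufw_status: str, subnet: str) -> set[int]:
    """Collect TCP ports that ufw status shows as allowed from subnet."""
    ports = set()
    for line in ufw_status.splitlines():
        match = RULE.match(line.strip())
        if match and match.group(2) == subnet:
            ports.add(int(match.group(1)))
    return ports


def missing_ports(ufw_status: str, subnet: str, required: list[int]) -> list[int]:
    """Return the required ports that have no ALLOW rule for subnet, sorted."""
    return sorted(set(required) - allowed_ports(ufw_status, subnet))
```

Given the status output in this report, missing_ports(status, "172.18.0.0/16", [8080, 4000]) would return [4000].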

NVB#6244574

Metadata

Labels

  • area: inference (Inference routing, serving, model selection, or outputs)
  • area: sandbox (OpenShell sandbox lifecycle, runtime, config, or recovery)
  • platform: brev (Affects Brev hosted development environments)