Skip to content

[Brev][Onboard] Model Router (Provider Routed) inference broken — TUI returns HTTP 503 after successful onboard #3255

@hulynn

Description

@hulynn

Description

Description

After completing `nemoclaw onboard` with the new "Model Router (experimental)" inference option, the OpenClaw TUI returns HTTP 503 "inference service unavailable" on every prompt. Onboard reports SUCCESS but the generated `~/.nemoclaw/state/litellm-proxy.yaml` (and the upstream `nemoclaw-blueprint/router/pool-config.yaml` it derives from) contain three independent config errors that together prevent any request from reaching upstream NVIDIA inference. This blocks the entire Provider Routed code path 100%.
Environment
Device:        Brev (shadeform brev-pz811qnfg) — H100 PCIe x1
OS:            Ubuntu 22.04.5 LTS (kernel 6.8.0-90-generic)
Architecture:  x86_64
Node.js:       v22.22.2
npm:           10.9.7
Docker:        29.1.3 (build f52814d)
OpenShell CLI: 0.0.36
NemoClaw:      v0.0.37
OpenClaw:      2026.4.24 (cbcfdf6)
Steps to Reproduce
1. curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash
2. Type 'yes' to accept the license/notice.
3. At [3/8] Configuring inference, choose option 8 (Model Router (experimental)).
4. At "Model Router API key:" prompt, enter a valid NVIDIA API key (nvapi-...).
5. Sandbox name: route. Accept review. Skip messaging. Accept Balanced policy presets.
6. Wait for "Installation complete" — onboard reports all 8 steps SUCCESS.
7. nemoclaw route connect
8. sandbox@route$ openclaw tui
9. Send any prompt (e.g. "ping").
Expected Result
TUI returns a model response routed via the configured Model Router (NVIDIA Nemotron 3 Nano or Super, depending on prefill router decision).
Actual Result
TUI shows two consecutive errors and the gateway disconnects:

  HTTP 503: "inference service unavailable"
  HTTP 503: "inference service unavailable"
  gateway disconnected: closed | idle
  agent main | session main (openclaw-tui) | inference/nvidia-routed | tokens ?/131k

Hitting /health on the host model-router shows both upstream endpoints unhealthy:
  curl -s http://127.0.0.1:4000/health
  -> healthy_count: 0, unhealthy_count: 2
  -> error: "Authentication Error, LiteLLM Virtual Key expected. Received=nvap****Nrd-, expected to start with 'sk-'." (HTTP 401)
Root Cause — three independent config errors that compound
1. WRONG UPSTREAM ENDPOINT
   pool-config.yaml + litellm-proxy.yaml set api_base = https://inference-api.nvidia.com
   That host is itself a LiteLLM proxy that only accepts sk-* virtual keys; it rejects nvapi-* keys
   with HTTP 401 "LiteLLM Virtual Key expected".
   Verified: curl -H "Authorization: Bearer nvapi-..." https://inference-api.nvidia.com/v1/models -> 401
   The endpoint that actually accepts nvapi-* keys is https://integrate.api.nvidia.com/v1
   (public NVIDIA Build / NIM gateway). Verified: same key returns 200.

2. WRONG MODEL IDS (case + doubled prefix + non-existent super id)
   pool-config.yaml uses:
     openai/nvidia/nvidia/Nemotron-3-Nano-30B-A3B
     openai/nvidia/nvidia/nemotron-3-super-v3
   Issues:
     (a) "nvidia/nvidia/" prefix is doubled — should be single nvidia/.
     (b) Nano id case is wrong — actual catalog id is lowercase nemotron-3-nano-30b-a3b.
     (c) "nemotron-3-super-v3" does NOT exist in the NVIDIA catalog at all. The Super model id is
         nemotron-3-super-120b-a12b.
   Verified by enumerating https://integrate.api.nvidia.com/v1/models — only the lowercase ids resolve.

3. WRONG ENV VAR NAME (host vs gateway disagree, and the env var was never exported)
   ~/.nemoclaw/state/litellm-proxy.yaml says:  api_key: os.environ/OPENAI_API_KEY
   ~/.nemoclaw/onboard-session.json records:    "credentialEnv": "NVIDIA_API_KEY"
   `openshell provider get nvidia-router -g nemoclaw` shows: Credential keys: NVIDIA_API_KEY
   The host-side LiteLLM and the gateway-side provider config disagree on the env var name.
   Additionally, on the live model-router process (PID 236635), inspection of /proc/PID/environ
   shows neither OPENAI_API_KEY nor NVIDIA_API_KEY exported — onboard also fails to plumb the
   credential into the subprocess env.
Fix Verification
Manually patched ~/.nemoclaw/state/litellm-proxy.yaml:
  - model: openai/nvidia/nvidia/Nemotron-3-Nano-30B-A3B  ->  openai/nvidia/nemotron-3-nano-30b-a3b
  - model: openai/nvidia/nvidia/nemotron-3-super-v3      ->  openai/nvidia/nemotron-3-super-120b-a12b
  - api_base: https://inference-api.nvidia.com           ->  https://integrate.api.nvidia.com/v1
  - api_key:  os.environ/OPENAI_API_KEY                  ->    (env var was empty)

Restarted model-router. Result:
  GET /health -> healthy: 2, unhealthy: 0  (was 0, 2 before)
  POST /v1/chat/completions  model=nemotron-3-super       -> "Two plus two equals four."  (correct content)
  POST /v1/chat/completions  model=nvidia-routed (alias)  -> reasoning_content populated   (routing works)
  POST /v1/chat/completions  model=nemotron-3-nano-reasoning -> reasoning_content populated (nano works)

The same fix needs to land upstream in:
  - nemoclaw-blueprint/router/pool-config.yaml  (model ids + api_base)
  - The onboard code that emits litellm-proxy.yaml (env var name should match credentialEnv)
  - The onboard code that exports NVIDIA_API_KEY into the model-router subprocess env
Logs
Pre-fix /health excerpt (sanitized):
  {"healthy_endpoints":[],"unhealthy_endpoints":[
    {"api_base":"https://inference-api.nvidia.com",
     "model":"openai/nvidia/nvidia/Nemotron-3-Nano-30B-A3B",
     "error":"litellm.AuthenticationError: ... Authentication Error, LiteLLM Virtual Key expected. Received=nvap****Nrd-, expected to start with 'sk-'."},
    {"api_base":"https://inference-api.nvidia.com",
     "model":"openai/nvidia/nvidia/nemotron-3-super-v3",
     "error":"... same auth error"}],
   "healthy_count":0,"unhealthy_count":2}

Sandbox openshell-router log (repeating ~30s loop until TUI gives up):
  [INFO] routing proxy inference request (streaming) endpoint=http://host.openshell.internal:4000/v1 path=/v1/chat/completions
  [LOW] NET:FAIL inference.local:443

Bug Details

Field Value
Priority Unprioritized
Action Dev - Open - To fix
Disposition Open issue
Module Machine Learning - NemoClaw
Keyword NemoClaw, NEMOCLAW_GH_SYNC_APPROVAL, NemoClaw_Inference, NemoClaw_Onboard, NemoClaw-SWQA-RelBlckr-Recommended

[NVB#6158321]

Metadata

Metadata

Assignees

Labels

NV QABugs found by the NVIDIA QA TeamUATIssues flagged for User Acceptance Testing.area: inferenceInference routing, serving, model selection, or outputsplatform: brevAffects Brev hosted development environments
No fields configured for Enhancement.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions