Description
Description
After completing `nemoclaw onboard` with the new "Model Router (experimental)" inference option, the OpenClaw TUI returns HTTP 503 "inference service unavailable" on every prompt. Onboard reports SUCCESS but the generated `~/.nemoclaw/state/litellm-proxy.yaml` (and the upstream `nemoclaw-blueprint/router/pool-config.yaml` it derives from) contain three independent config errors that together prevent any request from reaching upstream NVIDIA inference. This blocks the entire Provider Routed code path 100%.
Environment
Device: Brev (shadeform brev-pz811qnfg) — H100 PCIe x1
OS: Ubuntu 22.04.5 LTS (kernel 6.8.0-90-generic)
Architecture: x86_64
Node.js: v22.22.2
npm: 10.9.7
Docker: 29.1.3 (build f52814d)
OpenShell CLI: 0.0.36
NemoClaw: v0.0.37
OpenClaw: 2026.4.24 (cbcfdf6)
Steps to Reproduce
1. curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash
2. Type 'yes' to accept the license/notice.
3. At [3/8] Configuring inference, choose option 8 (Model Router (experimental)).
4. At "Model Router API key:" prompt, enter a valid NVIDIA API key (nvapi-...).
5. Sandbox name: route. Accept review. Skip messaging. Accept Balanced policy presets.
6. Wait for "Installation complete" — onboard reports all 8 steps SUCCESS.
7. nemoclaw route connect
8. sandbox@route$ openclaw tui
9. Send any prompt (e.g. "ping").
Expected Result
TUI returns a model response routed via the configured Model Router (NVIDIA Nemotron 3 Nano or Super, depending on prefill router decision).
Actual Result
TUI shows two consecutive errors and the gateway disconnects:
HTTP 503: "inference service unavailable"
HTTP 503: "inference service unavailable"
gateway disconnected: closed | idle
agent main | session main (openclaw-tui) | inference/nvidia-routed | tokens ?/131k
Hitting /health on the host model-router shows both upstream endpoints unhealthy:
curl -s http://127.0.0.1:4000/health
-> healthy_count: 0, unhealthy_count: 2
-> error: "Authentication Error, LiteLLM Virtual Key expected. Received=nvap****Nrd-, expected to start with 'sk-'." (HTTP 401)
Root Cause — three independent config errors that compound
1. WRONG UPSTREAM ENDPOINT
pool-config.yaml + litellm-proxy.yaml set api_base = https://inference-api.nvidia.com
That host is itself a LiteLLM proxy that only accepts sk-* virtual keys; it rejects nvapi-* keys
with HTTP 401 "LiteLLM Virtual Key expected".
Verified: curl -H "Authorization: Bearer nvapi-..." https://inference-api.nvidia.com/v1/models -> 401
The endpoint that actually accepts nvapi-* keys is https://integrate.api.nvidia.com/v1
(public NVIDIA Build / NIM gateway). Verified: same key returns 200.
2. WRONG MODEL IDS (case + doubled prefix + non-existent super id)
pool-config.yaml uses:
openai/nvidia/nvidia/Nemotron-3-Nano-30B-A3B
openai/nvidia/nvidia/nemotron-3-super-v3
Issues:
(a) "nvidia/nvidia/" prefix is doubled — should be single nvidia/.
(b) Nano id case is wrong — actual catalog id is lowercase nemotron-3-nano-30b-a3b.
(c) "nemotron-3-super-v3" does NOT exist in the NVIDIA catalog at all. The Super model id is
nemotron-3-super-120b-a12b.
Verified by enumerating https://integrate.api.nvidia.com/v1/models — only the lowercase ids resolve.
3. WRONG ENV VAR NAME (host vs gateway disagree, and the env var was never exported)
~/.nemoclaw/state/litellm-proxy.yaml says: api_key: os.environ/OPENAI_API_KEY
~/.nemoclaw/onboard-session.json records: "credentialEnv": "NVIDIA_API_KEY"
`openshell provider get nvidia-router -g nemoclaw` shows: Credential keys: NVIDIA_API_KEY
The host-side LiteLLM and the gateway-side provider config disagree on the env var name.
Additionally, on the live model-router process (PID 236635), inspection of /proc/PID/environ
shows neither OPENAI_API_KEY nor NVIDIA_API_KEY exported — onboard also fails to plumb the
credential into the subprocess env.
Fix Verification
Manually patched ~/.nemoclaw/state/litellm-proxy.yaml:
- model: openai/nvidia/nvidia/Nemotron-3-Nano-30B-A3B -> openai/nvidia/nemotron-3-nano-30b-a3b
- model: openai/nvidia/nvidia/nemotron-3-super-v3 -> openai/nvidia/nemotron-3-super-120b-a12b
- api_base: https://inference-api.nvidia.com -> https://integrate.api.nvidia.com/v1
- api_key: os.environ/OPENAI_API_KEY -> (env var was empty)
Restarted model-router. Result:
GET /health -> healthy: 2, unhealthy: 0 (was 0, 2 before)
POST /v1/chat/completions model=nemotron-3-super -> "Two plus two equals four." (correct content)
POST /v1/chat/completions model=nvidia-routed (alias) -> reasoning_content populated (routing works)
POST /v1/chat/completions model=nemotron-3-nano-reasoning -> reasoning_content populated (nano works)
The same fix needs to land upstream in:
- nemoclaw-blueprint/router/pool-config.yaml (model ids + api_base)
- The onboard code that emits litellm-proxy.yaml (env var name should match credentialEnv)
- The onboard code that exports NVIDIA_API_KEY into the model-router subprocess env
Logs
Pre-fix /health excerpt (sanitized):
{"healthy_endpoints":[],"unhealthy_endpoints":[
{"api_base":"https://inference-api.nvidia.com",
"model":"openai/nvidia/nvidia/Nemotron-3-Nano-30B-A3B",
"error":"litellm.AuthenticationError: ... Authentication Error, LiteLLM Virtual Key expected. Received=nvap****Nrd-, expected to start with 'sk-'."},
{"api_base":"https://inference-api.nvidia.com",
"model":"openai/nvidia/nvidia/nemotron-3-super-v3",
"error":"... same auth error"}],
"healthy_count":0,"unhealthy_count":2}
Sandbox openshell-router log (repeating ~30s loop until TUI gives up):
[INFO] routing proxy inference request (streaming) endpoint=http://host.openshell.internal:4000/v1 path=/v1/chat/completions
[LOW] NET:FAIL inference.local:443
Bug Details
| Field |
Value |
| Priority |
Unprioritized |
| Action |
Dev - Open - To fix |
| Disposition |
Open issue |
| Module |
Machine Learning - NemoClaw |
| Keyword |
NemoClaw, NEMOCLAW_GH_SYNC_APPROVAL, NemoClaw_Inference, NemoClaw_Onboard, NemoClaw-SWQA-RelBlckr-Recommended |
[NVB#6158321]
Description
Description
Environment Steps to Reproduce Expected Result Actual Result Root Cause — three independent config errors that compound1. WRONG UPSTREAM ENDPOINT pool-config.yaml + litellm-proxy.yaml set api_base = https://inference-api.nvidia.com That host is itself a LiteLLM proxy that only accepts sk-* virtual keys; it rejects nvapi-* keys with HTTP 401 "LiteLLM Virtual Key expected". Verified: curl -H "Authorization: Bearer nvapi-..." https://inference-api.nvidia.com/v1/models -> 401 The endpoint that actually accepts nvapi-* keys is https://integrate.api.nvidia.com/v1 (public NVIDIA Build / NIM gateway). Verified: same key returns 200. 2. WRONG MODEL IDS (case + doubled prefix + non-existent super id) pool-config.yaml uses: openai/nvidia/nvidia/Nemotron-3-Nano-30B-A3B openai/nvidia/nvidia/nemotron-3-super-v3 Issues: (a) "nvidia/nvidia/" prefix is doubled — should be single nvidia/. (b) Nano id case is wrong — actual catalog id is lowercase nemotron-3-nano-30b-a3b. (c) "nemotron-3-super-v3" does NOT exist in the NVIDIA catalog at all. The Super model id is nemotron-3-super-120b-a12b. Verified by enumerating https://integrate.api.nvidia.com/v1/models — only the lowercase ids resolve. 3. WRONG ENV VAR NAME (host vs gateway disagree, and the env var was never exported) ~/.nemoclaw/state/litellm-proxy.yaml says: api_key: os.environ/OPENAI_API_KEY ~/.nemoclaw/onboard-session.json records: "credentialEnv": "NVIDIA_API_KEY" `openshell provider get nvidia-router -g nemoclaw` shows: Credential keys: NVIDIA_API_KEY The host-side LiteLLM and the gateway-side provider config disagree on the env var name. Additionally, on the live model-router process (PID 236635), inspection of /proc/PID/environ shows neither OPENAI_API_KEY nor NVIDIA_API_KEY exported — onboard also fails to plumb the credential into the subprocess env.Fix Verification LogsPre-fix /health excerpt (sanitized): {"healthy_endpoints":[],"unhealthy_endpoints":[ {"api_base":"https://inference-api.nvidia.com", "model":"openai/nvidia/nvidia/Nemotron-3-Nano-30B-A3B", "error":"litellm.AuthenticationError: ... Authentication Error, LiteLLM Virtual Key expected. Received=nvap****Nrd-, expected to start with 'sk-'."}, {"api_base":"https://inference-api.nvidia.com", "model":"openai/nvidia/nvidia/nemotron-3-super-v3", "error":"... same auth error"}], "healthy_count":0,"unhealthy_count":2} Sandbox openshell-router log (repeating ~30s loop until TUI gives up): [INFO] routing proxy inference request (streaming) endpoint=http://host.openshell.internal:4000/v1 path=/v1/chat/completions [LOW] NET:FAIL inference.local:443Bug Details
[NVB#6158321]