Description
A NemoClaw v0.0.57 non-interactive onboard with NEMOCLAW_PROVIDER=custom completes with "Hermes is ready", but the host-side model-router is never started. The sandbox cannot resolve inference.local, and the bot's first inference call returns HTTP 401 because Hermes falls back to the no-key-required placeholder (hermes_cli/model_switch.py:906) instead of using a real API key.
This appears to be a regression of NVB 6244574 / GitHub Issue #4564: the host-alias binding fix in commit 44b23d047 does not cover the non-interactive custom-provider path on Linux Docker-driver.
Environment
Device: Internal NVIDIA test host (lynnhu@10.6.76.35 via testmind-dev.nvidia.com)
OS: Ubuntu 24.04.4 LTS (Noble Numbat)
Architecture: x86_64
Node.js: v22.22.3
npm: 10.9.8
Docker: 29.5.2 (build 79eb04c)
OpenShell CLI: 0.0.44
NemoClaw: v0.0.57 (nemohermes shim)
OpenClaw: N/A — running Hermes Agent v0.14.0 (build 2026.5.16, Python 3.13.5)
Steps to Reproduce
- On a fresh Ubuntu 24.04 host with Docker running, export the non-interactive variables and pipe the installer to bash:
  export NEMOCLAW_INSTALL_TAG=v0.0.57 \
  NEMOCLAW_AGENT=hermes \
  NEMOCLAW_SANDBOX_NAME=momo \
  NEMOCLAW_PROVIDER=custom \
  NEMOCLAW_ENDPOINT_URL=https://inference-api.nvidia.com/ \
  NEMOCLAW_MODEL=aws/anthropic/bedrock-claude-opus-4-6 \
  NEMOCLAW_PROVIDER_KEY=<sk-...> \
  COMPATIBLE_API_KEY=<sk-...> \
  NEMOCLAW_HERMES_AUTH_METHOD=api_key \
  NEMOCLAW_NON_INTERACTIVE=1 \
  NEMOCLAW_YES=1 \
  NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE=1
  curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash
- Wait for the "Hermes is ready" banner.
- Check the host for the model-router:
  ls ~/.nemoclaw/model-router-venv # does NOT exist
  ps -ef | grep model-router | grep -v grep # no process
- docker exec into the sandbox container and test DNS:
  curl -sS https://inference.local/v1/models
  # curl: (6) Could not resolve host: inference.local
  getent hosts inference.local
  # (empty)
- After configuring a Slack channel (or via any Hermes inference path), @-mention the bot. The bot replies:
  :warning: Non-retryable error (HTTP 401): HTTP 401: LiteLLM Virtual Key expected. Received=no-k****ired, expected to start with 'sk-'.
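The host-side check in the steps above can be wrapped in a small helper for repeated runs. This is a minimal sketch, not part of NemoClaw: the function name check_model_router is hypothetical, and the venv path and the process-name pattern are simply the ones used in this report.

```shell
#!/usr/bin/env bash
# Sketch: report whether the host-side model-router looks present.
# check_model_router is a hypothetical helper; the venv directory name and
# the "model-router" pattern are taken from this report, not from NemoClaw docs.

check_model_router() {
  local nemoclaw_home="${1:-$HOME/.nemoclaw}"
  local status=0

  # The report expects this directory to exist after a successful onboard.
  if [ -d "$nemoclaw_home/model-router-venv" ]; then
    echo "venv: present"
  else
    echo "venv: missing"
    status=1
  fi

  # pgrep -f matches the pattern against full command lines, so it can
  # also match unrelated processes whose arguments contain the pattern.
  if pgrep -f 'model-router' >/dev/null 2>&1; then
    echo "process: running"
  else
    echo "process: not found"
    status=1
  fi

  return "$status"
}
```

Calling check_model_router with no argument inspects ~/.nemoclaw; a non-zero return means at least one of the two checks failed.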
Expected Result
After "Hermes is ready":
- ~/.nemoclaw/model-router-venv exists on host
- A model-router process is running and listening
- Sandbox can resolve inference.local (via /etc/hosts injection or Docker --add-host)
- Bot's first @-mention triggers a successful inference call
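One way to confirm the hosts-file part of the expectation from inside the sandbox is to look for the name directly in /etc/hosts. The sketch below is an illustration only: has_hosts_entry is a hypothetical helper, and it inspects the hosts file alone, whereas getent hosts may also consult other name sources.

```shell
#!/usr/bin/env bash
# Sketch: succeed if NAME appears as a hostname or alias in a hosts-format FILE.
# has_hosts_entry is a hypothetical helper, not part of NemoClaw or Hermes.

has_hosts_entry() {
  local hosts_file="$1" name="$2"
  awk -v name="$name" '
    { sub(/#.*/, "") }   # drop comments; awk re-splits the fields afterwards
    { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
    END { exit(found ? 0 : 1) }
  ' "$hosts_file"
}

# Example (inside the sandbox):
#   has_hosts_entry /etc/hosts inference.local && echo "hosts entry present"
```

Field 1 of each line is the address, so only fields 2 and onward are compared against the name.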
Actual Result
Onboard reports success but:
- ~/.nemoclaw/model-router-venv does not exist
- No model-router process on host
- Sandbox /etc/hosts has no inference.local entry; DNS resolution fails
- /sandbox/.hermes/config.yaml has model.base_url=https://inference.local/v1 but no model.api_key
- /sandbox/.hermes/.env does NOT contain COMPATIBLE_API_KEY (src/lib/state/sandbox.ts:534 actively strips credential lines from sandbox .env; the host router was supposed to inject auth)
- First @-mention to bot returns HTTP 401 (no-key-required placeholder)
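The config.yaml observation above (base_url set, api_key absent) can be checked without a YAML parser. This is a rough sketch under a stated assumption: the report gives only the dotted names model.base_url and model.api_key, so the nested layout assumed here (a top-level "model:" mapping with indented keys) is a guess, and model_has_key is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: succeed if KEY has a value under the top-level "model:" mapping.
# Assumes a block-style YAML layout such as:
#   model:
#     base_url: https://inference.local/v1
# model_has_key is a hypothetical helper; this is not a real YAML parser.

model_has_key() {
  local config="$1" key="$2"
  awk -v key="$key" '
    /^[^ \t#]/ { in_model = ($0 ~ /^model:/) }            # a new top-level key begins
    in_model && $0 ~ ("^[ \t]+" key ":[ \t]*[^ \t#]") { found = 1 }
    END { exit(found ? 0 : 1) }
  ' "$config"
}

# Example (inside the sandbox):
#   model_has_key /sandbox/.hermes/config.yaml api_key || echo "model.api_key missing"
```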
Workaround verified: manually edit /sandbox/.hermes/config.yaml — set model.base_url to the real upstream URL (https://inference-api.nvidia.com/v1) and add model.api_key directly; also add OPENAI_API_KEY and COMPATIBLE_API_KEY to /sandbox/.hermes/.env. Then pkill -f 'hermes gateway' and re-run hermes gateway run. After that, @-mentions trigger successful inference (~3s latency, real LLM reply).
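The .env part of the workaround can be made repeatable with a small idempotent helper. This is a sketch only: ensure_env_var is a hypothetical name, and using the same key value for both variables is an assumption, since the report does not say which value each one received. Because the report notes that credential lines are stripped from the sandbox .env, the change may not survive a later rebuild.

```shell
#!/usr/bin/env bash
# Sketch: append NAME=VALUE to an env file unless NAME is already set there.
# ensure_env_var is a hypothetical helper, not part of NemoClaw or Hermes.

ensure_env_var() {
  local env_file="$1" name="$2" value="$3"
  # An existing assignment is left untouched, so reruns do not duplicate lines.
  if grep -q "^${name}=" "$env_file" 2>/dev/null; then
    return 0
  fi
  printf '%s=%s\n' "$name" "$value" >> "$env_file"
}

# Example (inside the sandbox; reusing one key for both names is an assumption):
#   ensure_env_var /sandbox/.hermes/.env OPENAI_API_KEY "$COMPATIBLE_API_KEY"
#   ensure_env_var /sandbox/.hermes/.env COMPATIBLE_API_KEY "$COMPATIBLE_API_KEY"
```

The config.yaml edits and the gateway restart from the workaround still have to be done by hand as described.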
Logs
[sandbox-side DNS]
sandbox$ getent hosts inference.local
sandbox$ curl -sS https://inference.local/v1/models
curl: (6) Could not resolve host: inference.local
[sandbox agent.log on @-mention before workaround]
2026-06-03 11:52:25,785 ERROR [20260603_115224_6754050c] root: Non-retryable client error: Error code: 401 - {'error': {'message': "LiteLLM Virtual Key expected. Received=no-k****ired, expected to start with 'sk-'.", 'type': 'auth_error', 'param': 'None', 'code': '401'}}
[host process check]
$ ls ~/.nemoclaw/
onboard-session.json rebuild-backups sandboxes.json source usage-notice.json
$ ps -ef | grep -iE "(model.?router|inference-proxy)" | grep -v grep
(empty)
$ ss -tlnp | grep -E "(127.0.0.1:4000|model-router)"
(empty)
[sandbox agent.log after workaround — for contrast]
2026-06-03 12:07:15,237 INFO gateway.run: inbound message: platform=slack user=Lynn Hu chat=C0B741YC2T0 msg='hi from clean start'
2026-06-03 12:07:23,180 INFO run_agent: API call #1: model=aws/anthropic/bedrock-claude-opus-4-6 provider=custom in=16706 out=25 total=16731 latency=3.0s
2026-06-03 12:07:23,590 INFO gateway.run: response ready: platform=slack chat=C0B741YC2T0 time=8.4s api_calls=1 response=68 chars
[related closed bug — context, not duplicate]
NVB 6244574 [Brev][Inference][GitHub Issue #4564] Model Router inference.local returns "inference service unavailable"... — Closed 2026-06-03 02:05 (Bug - Fixed, QA - Closed - Verified). Same area, reopens on a different platform / non-interactive path.
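To tell quickly whether a sandbox is still hitting the placeholder-key failure, the 401 signature from the agent.log excerpt above can be counted. This is a minimal sketch: count_placeholder_401 is a hypothetical helper, and the matched string is copied from the log line in this report, so it will miss the error if the upstream message wording changes.

```shell
#!/usr/bin/env bash
# Sketch: print how many log lines carry the masked placeholder-key 401 error.
# count_placeholder_401 is a hypothetical helper; the fixed string comes from
# the agent.log excerpt in this report.

count_placeholder_401() {
  local log_file="$1"
  # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps the
  # helper's own exit status at 0 in that case.
  grep -c -F "LiteLLM Virtual Key expected. Received=no-k" "$log_file" || true
}

# Example: count_placeholder_401 path/to/agent.log
```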
Code references in v0.0.57 (HEAD 325ed77bb, v0.0.57+4):
- src/lib/onboard.ts:5298: compatible-endpoint branch does not call startModelRouter
- src/lib/onboard/model-router.ts:494: startModelRouter is only invoked via reconcileModelRouter (gated by isRoutedInferenceProvider)
- src/lib/verify-deployment.ts:223: verifier explicitly checks inference.local resolution; not enforced during non-interactive onboard
- 44b23d047 "fix(onboard): reuse containerized gateway and repair routed provider reachability (#4520, #4564)": closing fix for NVB 6244574; does not cover non-interactive custom-provider path on Ubuntu Docker-driver
NVB#6263823