[macOS][CLI&UX] nemoclaw inference set crashes on vm-driver sandbox #3725
Open
Labels
NV QA (Bugs found by the NVIDIA QA Team)
area: cli (Command line interface, flags, terminal UX, or output)
area: inference (Inference routing, serving, model selection, or outputs)
area: sandbox (OpenShell sandbox lifecycle, runtime, config, or recovery)
platform: macos (Affects macOS, including Apple Silicon)
v0.0.65 (Release target)
Description
nemoclaw inference set updates the gateway inference route successfully, but then crashes with an uncaught Node.js exception when it tries to sync openclaw.json inside the sandbox. The crash happens because the code unconditionally runs docker exec -i openshell-cluster-nemoclaw kubectl exec ... (the docker-driver path) even when the target sandbox lives on the vm driver, where no such container exists.
Result: the gateway route is updated but the sandbox-side config is NOT, so the next nemoclaw <name> connect silently reverts the model.
Environment
Steps to Reproduce
1. Have an existing sandbox built on the vm driver (status reports OpenShell: 0.0.39 (vm)).
2. From the host shell, run:
nemoclaw inference set --model nvidia/nemotron-3-super-120b-a12b --provider nvidia-prod --sandbox slack2
Expected Result
Either the command completes cleanly for both gateway and sandbox sides, OR it errors out with a clear "vm-driver not supported / use this alternate path" message — no Node.js stack trace.
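The second option above (a clear message instead of a stack trace) could be sketched roughly as follows. This is a minimal illustration, not the nemoclaw implementation: `SandboxDriver`, `SyncPlan`, and `planSandboxSync` are hypothetical names, and the arguments after `kubectl exec` are elided in this report, so the sketch takes them from the caller rather than guessing them. Only the container name `openshell-cluster-nemoclaw` comes from the description.

```typescript
// Hypothetical sketch: none of these names come from the nemoclaw codebase.
type SandboxDriver = "docker" | "vm";

// Either a command to run, or a human-readable reason why no command applies.
type SyncPlan =
  | { ok: true; argv: string[] }
  | { ok: false; message: string };

// Container name as quoted in the issue description.
const CLUSTER_CONTAINER = "openshell-cluster-nemoclaw";

// Decide how to sync openclaw.json BEFORE spawning anything, so a vm-driver
// sandbox never reaches the docker-only code path.
function planSandboxSync(
  driver: SandboxDriver,
  sandbox: string,
  kubectlExecArgs: string[], // elided ("...") in the report; supplied by the caller
): SyncPlan {
  switch (driver) {
    case "docker":
      return {
        ok: true,
        argv: ["docker", "exec", "-i", CLUSTER_CONTAINER, "kubectl", "exec", ...kubectlExecArgs],
      };
    case "vm":
      // No cluster container exists on the vm driver, so report that plainly
      // and tell the user which half of the update actually happened.
      return {
        ok: false,
        message:
          `Sandbox "${sandbox}" runs on the vm driver; syncing openclaw.json is not supported there. ` +
          `The gateway route was updated, but the sandbox-side config was not.`,
      };
  }
}
```

A caller would print `message` and exit non-zero when `ok` is false, and only spawn `argv` when `ok` is true. Returning a plan instead of throwing keeps the driver check separate from process spawning, which also makes the vm case easy to unit-test.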
Actual Result
Gateway side succeeds (Version: 7, Route: inference.local). Then the CLI throws an uncaught Node.js error and the process exits non-zero. The sandbox's /sandbox/.openclaw/openclaw.json was NOT updated. The next nemoclaw <name> connect re-reads the stale model from the sandbox's openclaw.json and silently overwrites the gateway route we just set.
Related
This compounds with #3726 (NVB#6187475), where nemoclaw <name> with no subcommand auto-runs connect, which then reverts gateway inference to the stale sandbox-side model. That makes the inference set "success" effectively a no-op until both bugs are fixed.
NVB#6187474