[macOS][CLI&UX] nemoclaw inference set crashes on vm-driver sandbox #3725

@hulynn

Description

nemoclaw inference set updates the gateway inference route successfully, but then crashes with an uncaught Node.js exception when it tries to sync openclaw.json inside the sandbox. The crash happens because the code unconditionally runs docker exec -i openshell-cluster-nemoclaw kubectl exec ... (the docker-driver path) even when the target sandbox lives on the vm driver, where no such container exists.

Result: the gateway route is updated but the sandbox-side config is NOT, so the next nemoclaw <name> connect silently reverts the model.
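One way to avoid the unconditional docker path is to pick the write command based on the sandbox's driver and fail early with a readable error when no path exists for it. The following is a minimal sketch, not the NemoClaw implementation: `UnsupportedDriverError`, `writeCommandBuilders`, and `buildWriteCommand` are hypothetical names, and only the docker argv mirrors the command shown in this report. No vm-driver command is included because the report does not show one.

```javascript
// Hypothetical sketch: these names do not come from the NemoClaw source.
class UnsupportedDriverError extends Error {
  constructor(sandbox, driver) {
    super(
      `Cannot sync openclaw.json in sandbox '${sandbox}': ` +
        `driver '${driver}' has no supported write path.`
    );
    this.name = "UnsupportedDriverError";
    this.driver = driver;
  }
}

// Driver name -> function that builds the argv for writing a file in the sandbox.
// Only the docker entry is known from the report (it mirrors the failing command).
const writeCommandBuilders = new Map([
  [
    "docker",
    (sandbox, path) => [
      "docker", "exec", "-i", "openshell-cluster-nemoclaw",
      "kubectl", "exec", "-n", "openshell", sandbox, "-c", "agent", "-i", "--",
      "sh", "-c", `cat > '${path}'`,
    ],
  ],
]);

// Returns the argv for the given driver, or throws a descriptive error
// instead of running a docker command against a container that is not there.
function buildWriteCommand(driver, sandbox, path) {
  const build = writeCommandBuilders.get(driver);
  if (!build) {
    throw new UnsupportedDriverError(sandbox, driver);
  }
  return build(sandbox, path);
}
```

With this shape, a vm-driver sandbox would surface a one-line "no supported write path" error rather than the docker daemon's "No such container" message.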

Environment

Device:        MacBook (M4, Apple Silicon)
OS:            macOS 26.1 (Darwin 25.1.0)
Architecture:  arm64
Node.js:       v23.10.0
npm:           11.3.0
Docker:        27.4.0 (Colima context)
OpenShell CLI: 0.0.39
NemoClaw:      v0.0.44
OpenClaw:      2026.4.24 (cbcfdf6)
Sandbox driver: vm  (nemoclaw <name> status shows "OpenShell: 0.0.39 (vm)")

Steps to Reproduce

  1. Have an existing sandbox built on the vm driver (status reports OpenShell: 0.0.39 (vm)).

  2. From the host shell, run:

    nemoclaw inference set --model nvidia/nemotron-3-super-120b-a12b --provider nvidia-prod --sandbox slack2
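To confirm the precondition in step 1 from a script, the driver can be read out of the status text. This is a rough sketch that assumes the status output contains a fragment shaped like the one quoted in the Environment section, `OpenShell: 0.0.39 (vm)`; `parseDriverFromStatus` is a hypothetical helper, and the report does not show what other drivers print.

```javascript
// Hypothetical helper: extracts version and driver from status text that
// contains a fragment like "OpenShell: 0.0.39 (vm)". Returns null otherwise.
function parseDriverFromStatus(statusText) {
  const match = /OpenShell:\s*(\S+)\s*\(([^)]+)\)/.exec(statusText);
  return match ? { version: match[1], driver: match[2] } : null;
}
```

For the status line in this report, the helper would yield version `0.0.39` and driver `vm`.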

Expected Result

Either the command completes cleanly for both gateway and sandbox sides, OR it errors out with a clear "vm-driver not supported / use this alternate path" message — no Node.js stack trace.
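The second option (a clear error, no stack trace) could look roughly like the sketch below. It assumes the command runs two steps in order, gateway update then sandbox sync, as the output in this report suggests. `runInferenceSetSteps` and its parameters are hypothetical and do not reflect the actual structure of `inference-set.js`.

```javascript
// Hypothetical sketch: run the gateway update, then the sandbox sync, and
// turn a sync failure into a short message plus a non-zero exit code
// instead of letting the exception escape as a Node.js stack trace.
function runInferenceSetSteps({ updateGateway, syncSandbox, log }) {
  updateGateway();
  try {
    syncSandbox();
  } catch (err) {
    // Keep only the first line of the error; child-process failures
    // often append daemon output on the following lines.
    const reason = String(err.message).split("\n")[0];
    log(`Gateway route updated, but sandbox config sync failed: ${reason}`);
    log("The sandbox openclaw.json still holds the previous model.");
    return 1;
  }
  return 0;
}
```

Reporting the partial success explicitly matters here because the gateway and the sandbox are left out of sync, which is what allows a later connect to revert the route.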

Actual Result

Gateway side succeeds (Version: 7, Route: inference.local). Then the CLI throws an uncaught Node.js error:

Syncing OpenClaw model identity in sandbox 'slack2'...
node:internal/errors:983
  const err = new Error(message);
              ^
Error: Command failed: docker exec -i openshell-cluster-nemoclaw kubectl exec -n openshell slack2 -c agent -i -- sh -c cat > '/sandbox/.openclaw/openclaw.json'
Error response from daemon: No such container: openshell-cluster-nemoclaw
    at genericNodeError (node:internal/errors:983:15)
    at dockerExecFileSync (/Users/lynnh/.nemoclaw/source/dist/lib/adapters/docker/exec.js:10:57)
    at privilegedSandboxExec (/Users/lynnh/.nemoclaw/source/dist/lib/sandbox/config.js:128:12)
    at Object.writeSandboxConfig (/Users/lynnh/.nemoclaw/source/dist/lib/sandbox/config.js:371:9)
    at runInferenceSet (/Users/lynnh/.nemoclaw/source/dist/lib/actions/inference-set.js:280:10)

Process exits non-zero. The sandbox's /sandbox/.openclaw/openclaw.json was NOT updated. The next nemoclaw <name> connect re-reads the stale model from sandbox openclaw.json and silently overwrites the gateway route we just set.

Related

This compounds with #3726 (NVB#6187475), where nemoclaw <name> with no subcommand auto-runs connect, which then reverts gateway inference to the stale sandbox-side model — making the inference set "success" effectively a no-op until the user fixes both bugs.


NVB#6187474

Metadata

Labels

NV QA (Bugs found by the NVIDIA QA Team)
area: cli (Command line interface, flags, terminal UX, or output)
area: inference (Inference routing, serving, model selection, or outputs)
area: sandbox (OpenShell sandbox lifecycle, runtime, config, or recovery)
platform: macos (Affects macOS, including Apple Silicon)
v0.0.65 (Release target)