Description
With the sandbox container for the default sandbox stopped via `docker kill`, `nemoclaw list` still shows the live-gateway drift annotation `(onboarded: model=…, provider=…)`. The documented test expects `list` to fall back to the onboard-time snapshot, with no drift line, only when the gateway is unreachable: step 19 specifies stopping the gateway container (e.g. `openshell-cluster-nemoclaw`) and expects `list` to hide the drift line while the gateway is down. In practice, killing an individual sandbox container (e.g. `openshell-prachi-s-…`) does not make the gateway unreachable, so `nemoclaw list` keeps showing the drift line for the default sandbox. This is confusing when following the test as written.
Concretely, after configuring multiple sandboxes, changing the gateway's live inference route with `openshell inference set`, and then killing a sandbox container rather than the gateway container, `nemoclaw list` still shows `p... (onboarded: model=moonshotai/kimi-k2.6)`, even though the test case says: "With the gateway unreachable, `list` falls back cleanly to the stored onboard-time values — the `(onboarded: …)` drift line is NOT shown."
Environment
- Platform: Linux (e.g. Ubuntu 22.04 / 24.04 / 26.04)
- GPU: Any
- Docker: Installed and running (supported NemoClaw/OpenShell runtime)
- NemoClaw CLI: Installed and working (e.g. via `curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash`)
- OpenShell gateway: Running on the host with Docker; the gateway container name in the docs is `openshell-cluster-nemoclaw`.
- Sandboxes:
  - `sandbox-a` → NVIDIA Cloud API / `nvidia/nemotron-3-super-120b-a12b` (provider `nvidia-prod`)
  - `sandbox-b` → OpenAI / `gpt-5.4` (provider `openai-api`)
  - `sandbox-c` → Anthropic / `claude-sonnet-4-6` (provider `anthropic-prod`)
- All OpenShell helper binaries preserved on PATH (e.g. with `--keep-openshell`):
  - `which openshell` returns a valid path.
  - `which openshell-gateway` and `which openshell-sandbox` also return valid paths.
Steps to Reproduce
Preconditions
- Ensure the NemoClaw CLI is installed and Docker is running.
- Ensure the OpenShell binaries are present on PATH: `which openshell`, `which openshell-gateway`, `which openshell-sandbox`.
- Onboard three sandboxes with distinct providers and models, so that `nemoclaw list` reflects these values:
  - `sandbox-a`: NVIDIA Cloud API / `nvidia/nemotron-3-super-120b-a12b` (provider `nvidia-prod`)
  - `sandbox-b`: OpenAI / `gpt-5.4` (provider `openai-api`)
  - `sandbox-c`: Anthropic / `claude-sonnet-4-6` (provider `anthropic-prod`)
- Ensure the OpenShell gateway is running in Docker (e.g. a container similar to `openshell-cluster-nemoclaw`).
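The PATH and Docker preconditions can be checked in one go. The following is a minimal Bash sketch; `require_on_path` and `require_docker` are hypothetical helper names (not NemoClaw or OpenShell commands), and it assumes that `docker info` exiting non-zero means the daemon is not reachable.

```shell
#!/usr/bin/env bash
# Hypothetical precondition helpers for this repro (not NemoClaw/OpenShell commands).

# Succeeds only if every named binary resolves on PATH; names any that are missing.
require_on_path() {
  local bin missing=0
  for bin in "$@"; do
    if ! command -v "$bin" >/dev/null 2>&1; then
      echo "missing on PATH: $bin" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Succeeds only if the Docker daemon answers (assumes `docker info` fails otherwise).
require_docker() {
  if ! docker info >/dev/null 2>&1; then
    echo "docker daemon not reachable" >&2
    return 1
  fi
}
```

For this repro it could be invoked as `require_on_path nemoclaw openshell openshell-gateway openshell-sandbox && require_docker`.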
Repro steps — Part A (list + SSH indicator)
1. Run `nemoclaw list`.
2. Verify all three sandboxes are listed.
3. Verify each sandbox shows the correct name.
4. Verify each sandbox shows the correct provider.
5. Verify each sandbox shows the correct model.
6. Verify applied policy presets are shown per sandbox.
7. Verify exactly one sandbox row is marked as default with `*`.
8. Destroy `sandbox-b` (e.g. `nemoclaw sandbox-b destroy`).
9. Run `nemoclaw list`.
10. Verify `sandbox-b` is gone; `sandbox-a` and `sandbox-c` remain.
11. Verify `sandbox-a` and `sandbox-c` still show the correct provider/model.
12. In a separate terminal, run `nemoclaw sandbox-a connect` and keep that SSH session open.
13. In the original terminal, run `nemoclaw list`.
14. Observe the SSH session indicator (●) for `sandbox-a`, and confirm it disappears after closing the SSH session and re-running `nemoclaw list` (if implemented as per docs).
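Steps 2, 7 and 10 (and picking out `sandbox-default` in step 15) can be scripted against the `nemoclaw list` text. The sketch below assumes, based only on the output captured in this report rather than on any documented format, that each sandbox row begins with the sandbox name and that the default row carries `*` as its second field; `list_has_sandbox` and `default_sandboxes` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical parsers for `nemoclaw list` output read on stdin.
# Assumption: a sandbox row's first field is the sandbox name, and the
# default row looks like `name *` (legend line `* = default sandbox` is skipped
# because its second field is `=`).

# Succeeds if a row whose first field equals "$1" is present.
list_has_sandbox() {
  awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Prints the name of every row marked as default.
default_sandboxes() {
  awk '$2 == "*" { print $1 }'
}
```

For example, `nemoclaw list | default_sandboxes` should print exactly one name (step 7), and after step 8 `nemoclaw list | list_has_sandbox sandbox-b` should fail while the same check for `sandbox-a` and `sandbox-c` succeeds (step 10).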
Repro steps — Part B (live gateway inference + drift annotation)
15. Identify the default sandbox (the row marked with `*`) and call it `sandbox-default`. Note its onboard-time model and provider from the earlier `nemoclaw list` output (i.e. from `~/.nemoclaw/sandboxes.json`).
16. On the host (not inside any sandbox), change the OpenShell gateway inference route:

        openshell inference set \
          --provider nvidia-prod \
          --model z-ai/glm-5.1

17. Confirm the live gateway route with `openshell inference get`, and note that it now differs from the onboard-time model/provider of `sandbox-default`.
18. Run `nemoclaw list`.
19. In a separate terminal, list Docker containers and stop a sandbox, not the gateway:

        docker ps
        # Example output:
        # CONTAINER ID   IMAGE                               ...   NAMES
        # 3b802bb39a07   openshell/sandbox-from:1779212437   ...   openshell-prachi-s-ee55cb2f-0136-4e39-afc3-74fd41230b6b
        # 7dea14913437   openshell/sandbox-from:1779135649   ...   openshell-ollama-b82c5a0e-54a6-4618-bb7f-e2c8f1fc6e7a
        docker kill $(docker ps -q --filter name=openshell-prachi-s)

    This kills the sandbox container for `prachi-s` but does not stop the OpenShell gateway container.
20. Back in the original terminal, run `nemoclaw list`.
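Because step 19 is where this report diverges from the test case, it may help to make the gateway/sandbox distinction explicit before killing anything. This Bash sketch assumes the gateway container follows the `openshell-cluster-<name>` pattern used in the docs (e.g. `openshell-cluster-nemoclaw`) and treats every other `openshell-` container as a sandbox; `container_role` and `kill_gateway_containers` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the naming pattern is an assumption taken from the docs.

# Prints "gateway", "sandbox", or "other" for a Docker container name.
container_role() {
  case "$1" in
    openshell-cluster-*) echo gateway ;;
    openshell-*)         echo sandbox ;;
    *)                   echo other ;;
  esac
}

# Kills only running containers classified as gateways; never touches sandboxes.
# Returns non-zero if no gateway container was found.
kill_gateway_containers() {
  local name killed=0
  for name in $(docker ps --format '{{.Names}}'); do
    if [ "$(container_role "$name")" = gateway ]; then
      docker kill "$name" >/dev/null && echo "killed gateway: $name" && killed=1
    fi
  done
  if [ "$killed" -ne 1 ]; then
    echo "no gateway container found; nothing killed" >&2
    return 1
  fi
}
```

The `docker ps` output captured in this report lists no container matching `openshell-cluster-*`; in that situation the sketch kills nothing and returns non-zero rather than falling back to a sandbox container.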
Expected Result
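15–18. With the gateway up and the host-side `openshell inference set` modifying the live route:
- `nemoclaw list` shows the default sandbox row with the live gateway model/provider (e.g. `z-ai/glm-5.1`, `nvidia-prod`) and an indented drift annotation line, `(onboarded: model=moonshotai/kimi-k2.6, provider=nvidia-prod)`, reflecting the difference between the live gateway route and the onboard-time stored config.

19–20. When the gateway is unreachable (per the original test case, by killing the gateway container, e.g. `docker kill $(docker ps -q --filter name=openshell-cluster-nemoclaw)`):
- NemoClaw cannot fetch live gateway state, so `nemoclaw list` should fall back entirely to the onboard-time snapshot from `~/.nemoclaw/sandboxes.json`.
- The default sandbox row should show only the stored model/provider, and no `(onboarded: …)` drift line should be printed.
- `nemoclaw list` should not crash or emit a stack trace; it should report sandboxes based on stored metadata only.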
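The two expected outcomes reduce to counting drift-annotation lines. The sketch below assumes, from the Actual Result output, that the annotation is a line whose first non-blank text is `(onboarded:`; `count_drift_lines`, `expect_drift` and `expect_no_drift` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical checks on `nemoclaw list` output read from stdin.

# Prints the number of drift-annotation lines (0 if none).
count_drift_lines() {
  # grep -c prints 0 but exits 1 when nothing matches; keep the status at 0.
  grep -c '^[[:space:]]*(onboarded:' || true
}

# Steps 15-18 (gateway up, live route changed): at least one drift line.
expect_drift() {
  if [ "$(count_drift_lines)" -lt 1 ]; then
    echo "expected an (onboarded: ...) drift line" >&2
    return 1
  fi
}

# Steps 19-20 (gateway unreachable): no drift line at all.
expect_no_drift() {
  if [ "$(count_drift_lines)" -ne 0 ]; then
    echo "drift line shown although the gateway should be unreachable" >&2
    return 1
  fi
}
```

Running `nemoclaw list | expect_drift` after step 18 and `nemoclaw list | expect_no_drift` after step 20 turns the Expected Result into pass/fail checks; with only a sandbox container killed, the second check fails, which is the behavior reported below.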
Actual Result
With only the sandbox container killed, and the OpenShell gateway still running, the user sees:

    local-lynnh@2u1g-b650-1386:~/NemoClaw$ docker ps
    CONTAINER ID   IMAGE                               COMMAND                  CREATED          STATUS          PORTS   NAMES
    3b802bb39a07   openshell/sandbox-from:1779212437   "/opt/openshell/bin/…"   29 minutes ago   Up 29 minutes           openshell-prachi-s-ee55cb2f-0136-4e39-afc3-74fd41230b6b
    7dea14913437   openshell/sandbox-from:1779135649   "/opt/openshell/bin/…"   22 hours ago     Up 22 hours             openshell-ollama-b82c5a0e-54a6-4618-bb7f-e2c8f1fc6e7a
    local-lynnh@2u1g-b650-1386:~/NemoClaw$ docker kill $(docker ps -q --filter name=openshell-prachi-s)
    3b802bb39a07
    local-lynnh@2u1g-b650-1386:~/NemoClaw$ nemoclaw list

      Sandboxes:
        ollama
          agent: openclaw
          model: qwen2.5:7b  provider: ollama-local  CPU sandbox
          policies: none
          dashboard: http://127.0.0.1:18789/
        prachi-s *
          agent: openclaw
          model: z-ai/glm-5.1  provider: nvidia-prod  CPU sandbox
          policies: npm, pypi, huggingface, brew, brave
          (onboarded: model=moonshotai/kimi-k2.6)
          dashboard: http://127.0.0.1:18790/

      * = default sandbox

Key differences / confusion points:
- The test script's step 19 says "Stop/block the gateway (e.g. `docker kill $(docker ps -q --filter name=openshell-cluster-nemoclaw)`), then run `nemoclaw list`," but the user instead kills a sandbox container and still sees the drift annotation.
- From the user's perspective, following the test "by killing a container" seems to satisfy the "gateway unreachable" condition, yet `nemoclaw list` continues to show the `(onboarded: …)` drift line, contrary to the Expected Result section, which says it should disappear while the gateway is down.

In other words:
- `nemoclaw list` correctly continues to show drift while the gateway is still alive, but the test script as written can be misinterpreted: killing a sandbox container does not actually exercise the "gateway unreachable" path.
- This creates a mismatch between the documented expectation ("drift line is NOT shown when gateway unreachable") and the behavior seen when the user follows the steps with a sandbox container instead of the gateway container.

A fix could be either or both of:
- Clarify the docs/test to explicitly require killing/restarting the gateway container (not just any `openshell-` container) when validating step 19.
- Make `nemoclaw list` explicitly indicate whether it is using live gateway state or the offline snapshot, to reduce confusion when containers are partially stopped.
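The second proposed fix could look roughly like the following. This is a hypothetical Bash sketch, not NemoClaw's implementation or output format: `render_model_line` and the `[source: ...]` labels are invented for illustration, and an empty live model stands in for "the gateway could not be queried".

```shell
#!/usr/bin/env bash
# HYPOTHETICAL: illustrates labelling the data source; not NemoClaw's real code.
# Usage: render_model_line LIVE_MODEL LIVE_PROVIDER STORED_MODEL STORED_PROVIDER
# An empty LIVE_MODEL means the gateway was unreachable.
render_model_line() {
  local live_model="$1" live_provider="$2" stored_model="$3" stored_provider="$4"
  local drift=""

  if [ -z "$live_model" ]; then
    # Gateway unreachable: show the onboard-time snapshot and no drift line.
    echo "model: $stored_model  provider: $stored_provider  [source: onboard snapshot]"
    return 0
  fi

  echo "model: $live_model  provider: $live_provider  [source: live gateway]"
  # Collect only the onboard-time fields that differ from the live route.
  if [ "$live_model" != "$stored_model" ]; then
    drift="model=$stored_model"
  fi
  if [ "$live_provider" != "$stored_provider" ]; then
    drift="${drift:+$drift, }provider=$stored_provider"
  fi
  if [ -n "$drift" ]; then
    echo "  (onboarded: $drift)"
  fi
}
```

Printing the source on every row would make a partially stopped setup (gateway up, one sandbox container killed) visibly different from a gateway outage, which is the distinction this report found hard to see. The sketch lists only the changed fields in the drift line, as in the `(onboarded: model=moonshotai/kimi-k2.6)` output from the Actual Result.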
Bug Details
| Field | Value |
|---|---|
| Priority | Unprioritized |
| Action | Dev - Open - To fix |
| Disposition | Open issue |
| Module | Machine Learning - NemoClaw |
| Keyword | NemoClaw, NEMOCLAW_GH_SYNC_APPROVAL |
[NVB#6192607]