Description
[Description]
When running Hermes onboarding via nemoclaw onboard --agent hermes (or nemohermes onboard), the sandbox image build and upload to the OpenShell gateway succeed, but the final sandbox create stream never completes cleanly and the Hermes sandbox never reaches Ready. Instead, the gateway reports “Create stream exited with code 1 after sandbox was created,” then waits for readiness, times out, and deletes the sandbox with the message “Sandbox 'hermes' was created but did not become ready within 180s. The orphaned sandbox has been removed — you can safely retry.” This prevents Hermes Agent from ever becoming usable and blocks the test case that expects Hermes onboarding to complete and the agent to respond to prompts.
[Environment]
Device: Linux Ubuntu 24.04 and DGX Spark (both exhibit the failure)
NemoClaw: v0.0.37 (same CLI used for Hermes onboarding)
OpenShell CLI: 0.0.36 (per nemohermes preflight output)
OpenClaw: 2026.4.24 (implicit from environment; exact minor not central to failure)
Sandbox OS: Linux (Hermes base image ghcr.io/nvidia/nemoclaw/hermes-sandbox-base:latest)
Container runtime: Docker, GPU‑enabled (--gpus all selected; GPU proofs all pass)
Network: outbound HTTPS allowed; inference via NVIDIA Endpoints configured and healthy
[Steps to Reproduce]
Pre‑condition:
- NemoClaw and NemoHermes CLIs installed (via curl | bash or npm).
- Docker is running with GPU access.
- Valid NVIDIA Endpoints API key available.
Steps:
- Run Hermes onboarding:
      nemohermes onboard
  (or equivalently nemoclaw onboard --agent hermes on this host).
- In the inference step, choose NVIDIA Endpoints (option 1), enter a valid NVIDIA API key, and select a cloud model (e.g. nvidia/nemotron-3-super-120b-a12b).
- Accept defaults for web search and messaging, or configure Slack as in the transcript.
- When prompted, keep the sandbox name as hermes.
- Confirm the configuration (“Apply this configuration? [Y/n]: Y”).
- Observe the log as NemoHermes builds the Hermes sandbox image from hermes-sandbox-base, pushes openshell/sandbox-from: into the gateway, and starts sandbox creation.
- Continue watching until the “Sandbox 'hermes' was created but did not become ready within 180s” message appears.
[Expected]
- Hermes onboarding should:
  - Build the Hermes sandbox image successfully.
  - Upload (Push) the image into the OpenShell gateway.
  - Create the hermes sandbox container and wait for it to reach Ready within the configured timeout.
  - Keep the sandbox and report success (no deletion), so that subsequent nemoclaw connect and hermes commands work.
- The test case for Hermes expects onboard to complete without error, and then to allow launching the TUI and sending prompts to Hermes.
[Actual]
- The Hermes Dockerfile build completes and the image is pushed into the OpenShell gateway:
      Built image openshell/sandbox-from:1778262149
      Uploading image into OpenShell gateway...
      Pushing image openshell/sandbox-from:1778262149 into gateway "nemoclaw"
      [progress] Exported ...
      [progress] Uploaded to gateway
      Image openshell/sandbox-from:1778262149 is available in the gateway.
- After upload, the gateway reports repeated “Still uploading image into OpenShell gateway...” lines for several minutes (up to 315s), even though the image is already marked as available.
- The create stream then exits with a non‑zero code:
      Create stream exited with code 1 after sandbox was created.
      Checking whether the sandbox reaches Ready state...
      Waiting for sandbox to become ready...
- Eventually, the orchestrator times out and deletes the sandbox:
      ✓ Deleted sandbox hermes
      Sandbox 'hermes' was created but did not become ready within 180s. The orphaned sandbox has been removed — you can safely retry.
      Retry: nemohermes onboard
- There is no successful “Sandbox hermes Ready” message, and Hermes Agent never becomes reachable; the user is left in a loop of “retry nemohermes onboard” without a working Hermes sandbox.
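To make the failing phase easier to spot when triaging a captured transcript, the log markers quoted above can be checked mechanically. The following is a minimal sketch, not part of NemoClaw or NemoHermes: the function name `triage` and the phase labels are hypothetical, the marker strings are copied from this report, and it assumes the stale “Still uploading” lines appear after the “is available in the gateway” line, as described above.

```python
import re

# Marker strings copied from the transcript in this report.
AVAILABLE = "is available in the gateway"
STILL_UPLOADING = "Still uploading image into OpenShell gateway"
STREAM_EXIT = re.compile(r"Create stream exited with code (\d+)")
NOT_READY = re.compile(r"did not become ready within (\d+)s")


def triage(log_text):
    """Summarize which onboarding phase a captured transcript failed in.

    Hypothetical helper for reading logs; it does not call any CLI.
    """
    lines = log_text.splitlines()
    # Index of the first line saying the image is available, if any.
    available_at = next(
        (i for i, line in enumerate(lines) if AVAILABLE in line), None
    )
    # Count "Still uploading" lines printed after the image was available.
    stale = 0
    if available_at is not None:
        stale = sum(STILL_UPLOADING in line for line in lines[available_at + 1:])
    exit_match = STREAM_EXIT.search(log_text)
    timeout_match = NOT_READY.search(log_text)
    if available_at is None:
        phase = "upload"  # image never reported as available
    elif exit_match or timeout_match:
        phase = "sandbox-startup"  # upload finished, create/ready failed
    else:
        phase = "no-failure-marker"
    return {
        "image_available": available_at is not None,
        "stale_upload_lines": stale,
        "create_stream_exit_code": int(exit_match.group(1)) if exit_match else None,
        "ready_timeout_s": int(timeout_match.group(1)) if timeout_match else None,
        "phase": phase,
    }
```

Run against the transcript in this report, such a check would be expected to classify the failure as a sandbox-startup problem rather than an upload problem, which matches the analysis in the notes below.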
[Impact / Notes]
- Hermes onboarding cannot complete successfully on the affected environments, so the downstream test case (“connect to hermes, run Hermes agent, verify answer with web search disabled”) cannot be executed.
- The logs show that image build and GPU checks pass; the failure is in the final sandbox creation/ready handshake between NemoClaw and the OpenShell gateway (create stream exits with code 1, sandbox never reports Ready).
- Suggested fixes:
  - Investigate why the gateway’s “create stream” for the Hermes sandbox exits with code 1 after image upload (e.g., container start failure, health check command failure) and surface the underlying error in NemoHermes logs.
  - Avoid reporting prolonged “Still uploading image into OpenShell gateway...” after the image is already available, to distinguish upload problems from sandbox‑startup problems.
  - Ensure that, on failure, the user is given a more actionable message (e.g., pointer to specific Hermes startup logs inside the sandbox) rather than only “you can safely retry,” since repeated retries will likely hit the same failure.
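The first and third suggestions can be illustrated with a readiness wait that carries the create-stream diagnostics into the final error instead of discarding them. This is only a sketch under stated assumptions and does not reflect how NemoClaw or OpenShell implement the wait: `wait_for_ready`, `SandboxNotReady`, the `probe` callable, and the "Failed" phase name are all hypothetical; only the "Ready" state and the 180s timeout come from this report.

```python
import time


class SandboxNotReady(Exception):
    """Raised when the sandbox does not reach Ready (hypothetical type)."""


def wait_for_ready(probe, timeout_s=180.0, interval_s=2.0,
                   create_exit_code=0, create_stderr="",
                   clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it returns "Ready", else raise with diagnostics.

    probe is a hypothetical callable returning the sandbox phase as a string.
    create_exit_code and create_stderr are what the caller captured from the
    create stream; they are only echoed into the error message.
    """
    deadline = clock() + timeout_s
    last_phase = None
    while clock() < deadline:
        last_phase = probe()
        if last_phase == "Ready":
            return last_phase
        if last_phase == "Failed":  # assumed terminal phase: stop early
            break
        sleep(interval_s)
    # Keep only the last few lines so the message stays readable.
    tail = "\n".join(create_stderr.strip().splitlines()[-5:])
    raise SandboxNotReady(
        f"sandbox not ready after {timeout_s}s (last phase: {last_phase}); "
        f"create stream exit code {create_exit_code}; "
        f"last create-stream output:\n{tail}"
    )
```

The design point is that the timeout message includes the last observed phase and the tail of the create-stream output, so a retry is only suggested alongside the evidence needed to judge whether it could succeed.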
Bug Details
| Field | Value |
| --- | --- |
| Priority | Unprioritized |
| Action | Dev - Open - To fix |
| Disposition | Open issue |
| Module | Machine Learning - NemoClaw |
| Keyword | NemoClaw, NEMOCLAW_GH_SYNC_APPROVAL, NemoClaw_Onboard, NemoClaw_Sandbox |
[NVB#6188498]