[Nemoclaw] [All Platforms] onboard with hermes agent fails #3359

@zNeill

Description

When a user runs

nemoclaw onboard --agent hermes
onboarding fails to complete with the following error:
Sandbox 'hermes' was created but did not become ready within 180s.
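
For readers unfamiliar with this failure mode: the message has the shape of a bounded readiness wait that gave up. The sketch below shows that general pattern only. It is not NemoClaw's or OpenShell's implementation; `wait_until_ready`, `is_ready`, and the 2-second poll interval are hypothetical, and only the 180-second limit comes from the error text.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],          # hypothetical readiness probe
    timeout_s: float = 180.0,              # limit quoted in the error message
    interval_s: float = 2.0,               # assumed poll interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_ready() until it returns True or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False  # caller would report "did not become ready within 180s"
        sleep(interval_s)
```

A wait like this only reports that the deadline passed; it says nothing about why the sandbox never reached Ready, so the sandbox's own logs are still needed to find the cause.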

Environment

Device:        Linux Ubuntu 24.04 or DGX Spark - both fail
NemoClaw:      v0.0.37

Steps to reproduce
1. Run nemoclaw onboard --agent hermes
2. Name the sandbox hermes
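
The name hermes satisfies the naming rule printed by the onboarding prompt (lowercase, starts with a letter, letters/numbers/internal hyphens only, ends with letter/number), so the name itself is unlikely to be the problem. The check below is one reading of that prompt text as a regular expression. It is an assumption, not NemoClaw's actual validator, and `is_valid_sandbox_name` is a hypothetical helper.

```python
import re

# One interpretation of the prompt's rule: a lowercase letter first, then
# optional letters/digits/hyphens, ending in a letter or digit.
# Consecutive internal hyphens are allowed under this reading.
_SANDBOX_NAME = re.compile(r"[a-z](?:[a-z0-9-]*[a-z0-9])?")


def is_valid_sandbox_name(name: str) -> bool:
    """Return True if name matches the assumed naming rule."""
    return _SANDBOX_NAME.fullmatch(name) is not None


for candidate in ("hermes", "my-sandbox-1", "Hermes", "abc-"):
    print(candidate, is_valid_sandbox_name(candidate))
```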

Actual Result
local-lynnh@2u1g-x570-1795:~$ nemohermes onboard
  NemoHermes Onboarding
  ===================
  [1/8] Preflight checks
  ──────────────────────────────────────────────────
  ✓ Docker is running
  ✓ Container DNS resolution works
  ✓ Container runtime: docker
  ✓ Container runtime resources: 16 vCPU / 125.7 GiB
  ✓ openshell CLI: openshell 0.0.36
  ✓ Port 8080 already owned by healthy NemoHermes runtime (OpenShell gateway)
  ✓ NVIDIA GPU detected (NVIDIA RTX 6000 Ada Generation, 46068 MB)
  ✓ Memory OK: 128734 MB RAM + 0 MB swap
  NVIDIA GPU detected; enabling OpenShell GPU passthrough. Use --no-gpu to opt out.
  [2/8] Starting OpenShell gateway
  ──────────────────────────────────────────────────
  [reuse] Skipping gateway (running)
  Reusing healthy NemoClaw gateway.
  [3/8] Configuring inference (NIM)
  ──────────────────────────────────────────────────
  Detected local inference option: Ollama
  Inference options:
    1) NVIDIA Endpoints
    2) OpenAI
    3) Other OpenAI-compatible endpoint
    4) Anthropic
    5) Other Anthropic-compatible endpoint
    6) Google Gemini
    7) Local Ollama (localhost:11434) — running (suggested)
    8) Model Router (experimental)
  Choose [1]: 1
  ┌─────────────────────────────────────────────────────────────────┐
  │  NVIDIA API Key required                                        │
  │                                                                 │
  │  1. Go to https://build.nvidia.com/settings/api-keys            │
  │  2. Sign in with your NVIDIA account                            │
  │  3. Click 'Generate API Key' button                             │
  │  4. Paste the key below (starts with nvapi-)                    │
  └─────────────────────────────────────────────────────────────────┘
  NVIDIA API Key: **********************************************************************
  Key staged for the OpenShell gateway. It is held in process memory only;
  onboarding registers it with the gateway and nothing is written to disk.
  Cloud models:
    1) Nemotron 3 Super 120B (nvidia/nemotron-3-super-120b-a12b)
    2) Nemotron 3 Nano Omni 30B (nvidia/nemotron-3-nano-omni-30b-a3b-reasoning)
    3) GLM-5 (z-ai/glm-5.1)
    4) MiniMax M2.7 (minimaxai/minimax-m2.7)
    5) Kimi K2.6 (moonshotai/kimi-k2.6)
    6) GPT-OSS 120B (openai/gpt-oss-120b)
    7) DeepSeek V4 Pro (deepseek-ai/deepseek-v4-pro)
    8) Other...
  Choose model [1]:
  Chat Completions API available — Hermes will use openai-completions.
  Using NVIDIA Endpoints with model: nvidia/nemotron-3-super-120b-a12b
  Sandbox name (lowercase, starts with a letter, letters/numbers/internal hyphens only, ends with letter/number) [hermes]:
  ──────────────────────────────────────────────────
  Review configuration
  ──────────────────────────────────────────────────
  Provider:      nvidia-prod
  Model:         nvidia/nemotron-3-super-120b-a12b
  API key:       NVIDIA_API_KEY (staged for OpenShell gateway registration)
  Web search:    disabled
  Messaging:     none
  Sandbox name:  hermes
  Note:          Sandbox build typically takes 3–8 minutes on this host.
  ──────────────────────────────────────────────────
  Web search and messaging channels will be prompted next.
  Apply this configuration? [Y/n]:
  [4/8] Setting up inference provider
  ──────────────────────────────────────────────────
✓ Active gateway set to 'nemoclaw'
✓ Updated provider nvidia-prod
Gateway inference configured:
  Route: inference.local
  Provider: nvidia-prod
  Model: nvidia/nemotron-3-super-120b-a12b
  Version: 3
  Timeout: 60s (default)
  ✓ Inference route set: nvidia-prod / nvidia/nemotron-3-super-120b-a12b
  Web search is not yet supported by Hermes Agent. Skipping.
  [5/8] Messaging channels
  ──────────────────────────────────────────────────
  Available messaging channels:
    [1] ○ telegram — Telegram bot messaging
    [2] ○ discord — Discord bot messaging
    [3] ● slack — Slack bot messaging
  Press 1-3 to toggle, Enter when done:
  Slack API → Your Apps → OAuth & Permissions → Bot User OAuth Token (xoxb-...).
  Slack Bot Token: ***********************************************************
  ✓ slack token saved
  Slack API → Your Apps → Basic Information → App-Level Tokens (xapp-...).
  Slack App Token (Socket Mode): **************************************************************************************************
  ✓ slack app token saved
  [6/8] Creating sandbox
  ──────────────────────────────────────────────────
  Base image exists: ghcr.io/nvidia/nemoclaw/hermes-sandbox-base:latest
  Using Hermes Agent Dockerfile: /localhome/local-lynnh/.nemoclaw/source/agents/hermes/Dockerfile
  Including policy preset(s) at sandbox boot: slack
✓ Updated provider hermes-slack-bridge
✓ Updated provider hermes-slack-app
  Creating sandbox 'hermes' (this takes a few minutes on first run)...
  Pinning base image to sha256:0f12b1e980e2...
  Building sandbox image...
  Building image openshell/sandbox-from:1778262149 from /tmp/nemoclaw-build-7egAiX/Dockerfile
  Step 1/39 : ARG BASE_IMAGE=ghcr.io/nvidia/nemoclaw/hermes-sandbox-base:latest
  Step 2/39 : FROM ${BASE_IMAGE}
  Step 3/39 : RUN set -eu;     hermes_path="$(command -v hermes 2>/dev/null || true)";     if [ "$hermes_path" != "/usr/local/bin/hermes" ]; then         echo "ERROR: expected hermes at /usr/local/...
  Step 4/39 : RUN (apt-get remove --purge -y gcc gcc-12 g++ g++-12 cpp cpp-12 make         netcat-openbsd netcat-traditional ncat 2>/dev/null || true)     && apt-get autoremove --purge -y     && rm...
  Step 5/39 : ENV HERMES_TELEGRAM_DISABLE_FALLBACK_IPS=1
  Step 6/39 : COPY agents/hermes/plugin/ /opt/nemoclaw-hermes-plugin/
  Step 7/39 : RUN chmod -R a+rX /opt/nemoclaw-hermes-plugin/
  Step 8/39 : COPY agents/hermes/generate-config.ts /opt/nemoclaw-hermes-config/generate-config.ts
  Step 9/39 : COPY agents/hermes/config/ /opt/nemoclaw-hermes-config/config/
  Step 10/39 : RUN find /opt/nemoclaw-hermes-config -type d -exec chmod 755 {} +     && find /opt/nemoclaw-hermes-config -type f -exec chmod 444 {} +
  Step 11/39 : COPY agents/hermes/decode-proxy.py /usr/local/bin/nemoclaw-decode-proxy
  Step 12/39 : RUN chmod 755 /usr/local/bin/nemoclaw-decode-proxy
  Step 13/39 : COPY nemoclaw-blueprint/ /opt/nemoclaw-blueprint/
  Step 14/39 : RUN chmod -R a+rX /opt/nemoclaw-blueprint/
  Step 15/39 : COPY scripts/lib/sandbox-init.sh /usr/local/lib/nemoclaw/sandbox-init.sh
  Step 16/39 : COPY agents/hermes/start.sh /usr/local/bin/nemoclaw-start
  Step 17/39 : RUN chmod 755 /usr/local/bin/nemoclaw-start /usr/local/lib/nemoclaw/sandbox-init.sh
  Step 18/39 : ARG NEMOCLAW_MODEL=nvidia/nemotron-3-super-120b-a12b
  Step 19/39 : ARG NEMOCLAW_PROVIDER_KEY=inference
  Step 20/39 : ARG NEMOCLAW_INFERENCE_BASE_URL=https://inference.local/v1
  Step 21/39 : ARG CHAT_UI_URL=http://127.0.0.1:8642
  Step 22/39 : ARG NEMOCLAW_MESSAGING_CHANNELS_B64=WyJzbGFjayJd
  Step 23/39 : ARG NEMOCLAW_MESSAGING_ALLOWED_IDS_B64=e30=
  Step 24/39 : ARG NEMOCLAW_DISCORD_GUILDS_B64=e30=
  Step 25/39 : ARG NEMOCLAW_TELEGRAM_CONFIG_B64=e30=
  Step 26/39 : ARG NEMOCLAW_BUILD_ID=1778262149447
  Step 27/39 : ENV NEMOCLAW_MODEL=${NEMOCLAW_MODEL}     NEMOCLAW_PROVIDER_KEY=${NEMOCLAW_PROVIDER_KEY}     NEMOCLAW_INFERENCE_BASE_URL=${NEMOCLAW_INFERENCE_BASE_URL}     CHAT_UI_URL=${CHAT_UI_URL} ...
  Step 28/39 : WORKDIR /sandbox
  Step 29/39 : USER sandbox
  Step 30/39 : RUN mkdir -p /sandbox/.nemoclaw/blueprints/0.1.0     && cp -r /opt/nemoclaw-blueprint/* /sandbox/.nemoclaw/blueprints/0.1.0/
  Step 31/39 : RUN node --experimental-strip-types /opt/nemoclaw-hermes-config/generate-config.ts
  Step 32/39 : RUN mkdir -p /sandbox/.hermes/plugins/nemoclaw     && cp -r /opt/nemoclaw-hermes-plugin/* /sandbox/.hermes/plugins/nemoclaw/
  Step 33/39 : RUN mkdir -p /sandbox/.hermes/memories     && printf '%s\n'     'You are a helpful AI assistant running inside an NVIDIA OpenShell sandbox.'     'Your inference is routed through Nem...
  Step 34/39 : USER root
  Step 35/39 : RUN set -eu;     config_dir=/sandbox/.hermes;     data_dir=/sandbox/.hermes-data;     mkdir -p "$config_dir";     if [ -L "$data_dir" ]; then         echo "ERROR: refusing legacy lay...
  Step 36/39 : RUN mkdir -p /etc/nemoclaw     && sha256sum /sandbox/.hermes/config.yaml /sandbox/.hermes/.env     > /etc/nemoclaw/hermes.config-hash     && chown root:root /etc/nemoclaw/hermes.conf...
  Step 37/39 : RUN sha256sum /sandbox/.hermes/config.yaml /sandbox/.hermes/.env     > /sandbox/.hermes/.config-hash     && chmod 600 /sandbox/.hermes/.config-hash     && chown sandbox:sandbox /sand...
  Step 38/39 : ENTRYPOINT ["/usr/local/bin/nemoclaw-start"]
  Step 39/39 : CMD ["/bin/bash"]
  Built image openshell/sandbox-from:1778262149
  Uploading image into OpenShell gateway...
  Pushing image openshell/sandbox-from:1778262149 into gateway "nemoclaw"
  [progress] Exported 100 MiB
  [progress] Exported 200 MiB
  [progress] Exported 300 MiB
  [progress] Exported 315 MiB
  [progress] Uploaded to gateway
  Image openshell/sandbox-from:1778262149 is available in the gateway.
  Still uploading image into OpenShell gateway... (45s elapsed)
  Still uploading image into OpenShell gateway... (60s elapsed)
  Still uploading image into OpenShell gateway... (75s elapsed)
  Still uploading image into OpenShell gateway... (90s elapsed)
  Still uploading image into OpenShell gateway... (105s elapsed)
  Still uploading image into OpenShell gateway... (120s elapsed)
  Still uploading image into OpenShell gateway... (135s elapsed)
  Still uploading image into OpenShell gateway... (150s elapsed)
  Still uploading image into OpenShell gateway... (165s elapsed)
  Still uploading image into OpenShell gateway... (180s elapsed)
  Still uploading image into OpenShell gateway... (195s elapsed)
  Still uploading image into OpenShell gateway... (210s elapsed)
  Still uploading image into OpenShell gateway... (225s elapsed)
  Still uploading image into OpenShell gateway... (240s elapsed)
  Still uploading image into OpenShell gateway... (255s elapsed)
  Still uploading image into OpenShell gateway... (270s elapsed)
  Still uploading image into OpenShell gateway... (285s elapsed)
  Still uploading image into OpenShell gateway... (300s elapsed)
  Still uploading image into OpenShell gateway... (315s elapsed)
  Create stream exited with code 1 after sandbox was created.
  Checking whether the sandbox reaches Ready state...
  Waiting for sandbox to become ready...
✓ Deleted sandbox hermes
  Sandbox 'hermes' was created but did not become ready within 180s.
  The orphaned sandbox has been removed — you can safely retry.
  Retry: nemohermes onboard
Expected Result
Onboarding succeeds and the sandbox is created.
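
When triaging the log above, it can help to confirm what the base64 build arguments in steps 22 to 25 of the image build contain. The snippet below decodes the four values exactly as they appear in the log, assuming they are base64-encoded JSON (suggested by the `_B64` suffix; `decode_arg` is a hypothetical helper, not part of NemoClaw).

```python
import base64
import json

# Values copied verbatim from the Dockerfile ARG lines in the build log.
build_args = {
    "NEMOCLAW_MESSAGING_CHANNELS_B64": "WyJzbGFjayJd",
    "NEMOCLAW_MESSAGING_ALLOWED_IDS_B64": "e30=",
    "NEMOCLAW_DISCORD_GUILDS_B64": "e30=",
    "NEMOCLAW_TELEGRAM_CONFIG_B64": "e30=",
}


def decode_arg(value: str):
    """Decode a base64 string and parse the result as JSON (assumed format)."""
    return json.loads(base64.b64decode(value).decode("utf-8"))


decoded = {name: decode_arg(value) for name, value in build_args.items()}
for name, value in decoded.items():
    print(f"{name} = {value!r}")
```

The channels value decodes to a list containing only slack, and the other three decode to empty objects, which is consistent with the Slack-only selection made in step 5/8.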

Bug Details

Priority:      Unprioritized
Action:        Dev - Open - To fix
Disposition:   Open issue
Module:        Machine Learning - NemoClaw
Keyword:       NemoClaw, NEMOCLAW_GH_SYNC_APPROVAL, NemoClaw_Onboard, NemoClaw_Sandbox

[NVB#6159267]

Labels

NV QA (Bugs found by the NVIDIA QA Team)
area: sandbox (OpenShell sandbox lifecycle, runtime, config, or recovery)
