
[Ubuntu 24.04][Onboard] Brave-preset sandbox onboard times out: "Sandbox … did not become ready within 180s" after 500+s gateway-image upload #3344

@wangericnv

Description


On NemoClaw v0.0.38 (Ubuntu 24.04, RTX 6000 Ada), `nemoclaw onboard --fresh --name brave-test` fails when run with the Ollama provider (qwen2.5:7b) and `BRAVE_API_KEY` set in the shell, which enables the Brave web-search preset. The wizard exports the sandbox image and uploads it to the OpenShell gateway successfully, but the in-gateway sandbox-create step then takes much longer than the wizard's 180s readiness check. The wizard reports "Sandbox 'brave-test' was created but did not become ready within 180s.", removes the orphaned sandbox, and suggests a retry; the resulting registry has no `brave-test` sandbox.

The image-upload phase alone logged "Still uploading image into OpenShell gateway... (525s elapsed)" before "Image openshell/sandbox-from:1778494022 is available in the gateway.", so the cumulative onboard time on this host is roughly 9 min of image upload plus 3 min of sandbox-ready wait. The 180s readiness window starts only seconds after the sandbox is created, and it expires before the sandbox reports Ready.

Net effect: with the Brave preset enabled, onboard fails on Ubuntu 24.04 + RTX 6000 Ada with the latest installer despite a healthy gateway, a running Ollama, and a valid Brave API key. This makes the Web-Search day0 path (T560826) impossible to pass without a manual retry or a longer timeout.
Environment
Device:        Ubuntu workstation 2u1g-x570-1795 (10.63.136.90)
OS:            Ubuntu 24.04.4 LTS (kernel 6.17.0-19-generic)
Architecture:  x86_64
GPU:           NVIDIA RTX 6000 Ada Generation, 46068 MiB
Node.js:       v22.22.2
npm:           10.9.7
Docker:        29.4.3 (build 055a478)
OpenShell CLI: openshell 0.0.36
NemoClaw:      v0.0.38
OpenClaw:      2026.4.24 (cbcfdf6)
Provider:      Local Ollama, model qwen2.5:7b (4.7 GB)
Preset:        brave + npm/pypi/huggingface/brew/local-inference auto-applied
BRAVE_API_KEY: 31-char BSA-prefixed key (provided in shell)
Steps to Reproduce
1. Fresh-ish host with NemoClaw v0.0.38 installed; gateway 'nemoclaw'
   already running healthy; Ollama installed and running with qwen2.5:7b.
2. export BRAVE_API_KEY=BSA...
3. Run:  nemoclaw onboard --fresh --name brave-test
4. Wizard prompts (this run):
   [3/8] inference: Choose [1]: 7        (Local Ollama)
         Choose model [1]:                (default qwen2.5:7b)
   "Apply this configuration? [Y/n]:"     → Enter
   "Enable Brave Web Search? [y/N]:"      → y
   (Brave key prompt did NOT appear in this run — BRAVE_API_KEY env
   was already set, so the wizard accepted it silently)
   [5/8] Messaging channels                → Enter (skip)
   ... policies / preset stage continues ...
   [8/8] Building sandbox image            (image export, upload)
5. Observe: "Still uploading image into OpenShell gateway... (525s elapsed)"
6. After ~9 min, "Image openshell/sandbox-from: is available in the gateway"
7. Wizard then: "Starting NIM container..."  (actually the OpenClaw sandbox container)
   then: "Waiting for sandbox to become ready..."
8. After 180s, wizard prints failure and removes the orphan.
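The prompt sequence in step 4 can be captured as a small lookup table for an automated driver. This is a minimal sketch, not the actual `drive_T560826.py`: the name `reply_for` and the regular expressions are hypothetical, and the patterns assume the prompt text is exactly as quoted in the steps above.

```python
import re

# Prompt patterns and replies taken from the wizard run described above.
# The regexes are assumptions about the exact prompt text; "" means "press Enter".
PROMPT_REPLIES = [
    (re.compile(r"Choose model \[1\]:"), ""),                  # default qwen2.5:7b
    (re.compile(r"Choose \[1\]:"), "7"),                       # Local Ollama
    (re.compile(r"Apply this configuration\? \[Y/n\]:"), ""),  # accept
    (re.compile(r"Enable Brave Web Search\? \[y/N\]:"), "y"),  # enable preset
    (re.compile(r"Messaging channels"), ""),                   # skip
]


def reply_for(line):
    """Return the reply for a wizard output line, or None if it is not a known prompt."""
    for pattern, reply in PROMPT_REPLIES:
        if pattern.search(line):
            return reply
    return None
```

Lines that match no pattern (progress messages, for example) return `None`, so a driver would keep reading output instead of sending input.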
Expected Result
After a successful sandbox image upload to the gateway, the readiness
wait must give the sandbox enough time to finish initialization
(Ollama proxy bring-up + OpenClaw agent boot + brave preset egress
rules + policy version load) on a normal workstation. One of:
  (a) extend the per-readiness-poll deadline (180s → e.g. 600s) so
      large preset stacks have time to come up, OR
  (b) actively surface what subsystem is still not-yet-ready so the
      user knows whether to wait or abort, OR
  (c) decouple readiness from a fixed deadline and let the user
      `nemoclaw  status` afterwards to inspect.

In any case, on a non-pathological host (40+ GB free VRAM, fast disk,
healthy gateway, valid API key), `nemoclaw onboard --fresh` with
the brave preset enabled should produce a Ready sandbox in
`nemoclaw list` on the first run.
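Options (a) and (b) could be combined in a readiness loop along the following lines. This is only a sketch under stated assumptions: `wait_until_ready`, `ReadinessResult`, and the `probe` callable are hypothetical names, and the subsystem keys are illustrative. None of this reflects how NemoClaw actually implements the wait.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ReadinessResult:
    ready: bool
    waited_s: float
    pending: list = field(default_factory=list)  # subsystems still not ready


def wait_until_ready(probe, timeout_s=600.0, interval_s=5.0,
                     clock=time.monotonic, sleep=time.sleep, report=print):
    """Poll probe() until no subsystem is pending or timeout_s has passed.

    probe is a hypothetical callable returning a dict such as
    {"ollama-proxy": True, "agent": False}; it stands in for whatever
    status query the gateway actually offers.
    """
    start = clock()
    while True:
        status = probe()
        pending = sorted(name for name, ok in status.items() if not ok)
        waited = clock() - start
        if not pending:
            return ReadinessResult(True, waited)
        if waited >= timeout_s:
            return ReadinessResult(False, waited, pending)
        # Option (b): tell the user what is still coming up.
        report(f"Waiting ({waited:.0f}s of {timeout_s:.0f}s), "
               f"not ready: {', '.join(pending)}")
        sleep(min(interval_s, timeout_s - waited))
```

On timeout the result still carries the list of pending subsystems, so the caller could print it before deciding whether to delete the sandbox or leave it for later inspection.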
Actual Result
Wizard tail:
  [progress] Exported 609 MiB
  [progress] Uploaded to gateway
  Image openshell/sandbox-from:1778494022 is available in the gateway.
  Still uploading image into OpenShell gateway... (525s elapsed)
  Create stream exited with code 1 after sandbox was created.
  Checking whether the sandbox reaches Ready state...
  Waiting for sandbox to become ready...
  ✓ Deleted sandbox brave-test
  Sandbox 'brave-test' was created but did not become ready within 180s.
  The orphaned sandbox has been removed — you can safely retry.
  Retry: nemoclaw onboard

$ nemoclaw list
  Sandboxes:
    my-assistant *           (from a separate earlier silent-install run)
$ # no 'brave-test' present
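The final check above (is `brave-test` present in `nemoclaw list`?) can be automated with a small parser. The sketch below assumes the output layout seen in this report, a "Sandboxes:" header followed by indented lines whose first token is the sandbox name; the helper name `sandbox_names` is hypothetical and the real output format may differ.

```python
def sandbox_names(list_output):
    """Extract sandbox names from `nemoclaw list` text.

    Assumes the layout seen in this report: a "Sandboxes:" header followed
    by indented lines whose first token is the sandbox name.
    """
    names = []
    in_section = False
    for line in list_output.splitlines():
        stripped = line.strip()
        if stripped == "Sandboxes:":
            in_section = True
            continue
        if in_section:
            # A blank or unindented line ends the sandbox section.
            if not stripped or not line.startswith(" "):
                break
            names.append(stripped.split()[0])
    return names
```

A day0 script could then assert `"brave-test" in sandbox_names(output)` after onboarding.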
Logs
Full per-step capture: /home/lab/day0-automation/20260511/report-T560826.txt
Driver: /home/lab/day0-automation/20260511/drive_T560826.py
Onboard state-machine transitions:
  [10:06:33] inference (selected Ollama at item 7)
  [10:06:40] model menu (Enter, default qwen2.5:7b)
  [10:06:47] yn (Apply config → Enter)
  [10:06:53] brave_yn (Enable Brave Web Search → y)
  [10:07:00] messaging (Enter, skip)
  [10:07:06] wait (image build + upload + sandbox-create)
  [10:18:52] bash (wizard ended; "did not become ready within 180s")
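As a sanity check on the timeline, the arithmetic below uses only the timestamps and the 180s timeout quoted in this report. It assumes the wizard exited right after the readiness timeout and that deleting the orphaned sandbox took negligible time.

```python
from datetime import datetime


def seconds_between(start, end, fmt="%H:%M:%S"):
    """Whole seconds between two same-day wall-clock timestamps."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# "wait" state entered at 10:07:06, wizard ended at 10:18:52.
wait_window_s = seconds_between("10:07:06", "10:18:52")
readiness_timeout_s = 180
# Time left for image build + export + upload + sandbox-create.
before_readiness_s = wait_window_s - readiness_timeout_s

print(wait_window_s, before_readiness_s)  # prints: 706 526
```

The remaining 526s lines up with the "(525s elapsed)" upload message, which suggests that nearly the entire 180s budget was spent waiting on a sandbox that had only just been created.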

Note: another sandbox 'my-assistant' DOES carry the brave preset
(applied by an earlier silent-install run T560899), so the brave
preset itself can be applied successfully. The defect is specifically
the 180s readiness wait timing out in the fresh-onboard path.

Bug Details

Field        Value
Priority     Unprioritized
Action       Dev - Open - To fix
Disposition  Open issue
Module       Machine Learning - NemoClaw
Keyword      NemoClaw, NEMOCLAW_GH_SYNC_APPROVAL, NemoClaw_Onboard, NemoClaw_Sandbox

[NVB#6164319]

Metadata

Labels

NV QA (Bugs found by the NVIDIA QA Team); integration: brave (Brave integration behavior); platform: ubuntu (Affects Ubuntu Linux environments)
