Description
Description
NemoClaw v0.0.43 express onboard on DGX Station prints scary "Deployment
verification found issues" warnings at the end claiming gateway and dashboard
are broken, BUT immediately checking `nemoclaw status` shows
everything is healthy. This is a timing race in the wizard's post-onboard
verification — it polls before the gateway / dashboard port forward are ready.
The warning message tells users to "Check /tmp/gateway.log inside the sandbox",
but that file does not exist (we verified via docker exec). So users following
the suggested diagnostic find no log and may think their install is broken
when it actually works.
Environment
Device: DGX Station GB300 (galaxy-sku2-018, host 10.176.192.158)
OS: Ubuntu 24.04.4 LTS
Architecture: aarch64
Kernel: 6.17.0-1014-nvidia-64k
GPU: NVIDIA GB300 (256703 MB) + NVIDIA RTX PRO 6000 Blackwell Max-Q (97887 MB)
NVIDIA driver: 610.39
Node.js: v22.22.3 (auto-installed by curl|bash)
npm: 10.9.8
Docker: 29.5.0 (build 98f1464)
nvidia-ctk: 1.19.0
OpenShell CLI: 0.0.39
NemoClaw: v0.0.43
OpenClaw: 2026.4.24 (cbcfdf6)
Steps to Reproduce
1. Start from a clean Station (no prior nemoclaw / openshell / ~/.nemoclaw)
2. export HF_TOKEN=
3. Run: curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash
4. Type 'yes' to license, 'y' to express → express path
5. Wait for express onboard to complete (about 10 min on warm cache)
6. Observe the very last screen of the wizard
Expected Result
Wizard reports onboarding completed cleanly:
- "Installation complete" banner
- sandbox 'my-assistant' is ready
- Gateway HTTP 200, dashboard port 18789 reachable
- No false ✗ markers
Actual Result
Wizard prints:
⚠ Deployment verification found issues:
✗ gateway: HTTP 0 (gateway not responding)
The gateway process may have crashed during startup. Check
/tmp/gateway.log inside the sandbox.
✗ dashboard: port forward not working (connection refused)
Port forward on 18789 is not working. Run:
openshell forward start 18789 my-assistant
The sandbox was created successfully but may not be fully functional.
Run: nemoclaw status — to re-check after a few seconds.
But immediately after on the host:
$ nemoclaw my-assistant status
Sandbox: my-assistant
Model: Qwen/Qwen3.6-27B-FP8
Provider: vllm-local
Inference (vllm backend): healthy (http://127.0.0.1:8000/v1/models)
Host GPU: yes
Sandbox GPU: enabled (auto)
Phase: Ready
$ openshell sandbox list
NAME PHASE
my-assistant Ready
$ docker ps # both containers up
openshell-my-assistant-... Up
nemoclaw-vllm Up
And the diagnostic file mentioned does not exist:
docker exec ls /tmp/*.log → no such file
So the verification ran before the gateway and dashboard port forward had
finished coming up, raising false alarms. The follow-up hint ("Check
/tmp/gateway.log") points at a file that doesn't get written at all.
Logs
Reproduced 2026-05-15 ~05:54 UTC during T6015057 manual test.
Diagnostic trace at /home/lab/day0-automation/20260511/_t6015057_console.log
on local-lab.
Impact
- Users see ✗ markers on their fresh install and think something is broken.
- The recommended diagnostic command (cat /tmp/gateway.log) finds nothing,
wasting troubleshooting time.
- QA cannot mark T6015057 "Pass" without manually checking after the fact
that the install actually works — wizard output suggests Fail.
- Wizard should either (a) wait long enough for gateway/dashboard to come up
before verifying, (b) retry verification with backoff, or (c) remove the
false-alarm warning until verification is reliable.
Bug Details
| Field |
Value |
| Priority |
Unprioritized |
| Action |
Dev - Open - To fix |
| Disposition |
Open issue |
| Module |
Machine Learning - NemoClaw |
| Keyword |
NemoClaw, NemoClaw_CLI&UX, NEMOCLAW_GH_SYNC_APPROVAL, NemoClaw_Onboard |
[NVB#6179568]
Description
Description
Environment Steps to Reproduce Expected Result Actual ResultWizard prints: ⚠ Deployment verification found issues: ✗ gateway: HTTP 0 (gateway not responding) The gateway process may have crashed during startup. Check /tmp/gateway.log inside the sandbox. ✗ dashboard: port forward not working (connection refused) Port forward on 18789 is not working. Run: openshell forward start 18789 my-assistant The sandbox was created successfully but may not be fully functional. Run: nemoclaw status — to re-check after a few seconds. But immediately after on the host: $ nemoclaw my-assistant status Sandbox: my-assistant Model: Qwen/Qwen3.6-27B-FP8 Provider: vllm-local Inference (vllm backend): healthy (http://127.0.0.1:8000/v1/models) Host GPU: yes Sandbox GPU: enabled (auto) Phase: Ready $ openshell sandbox list NAME PHASE my-assistant Ready $ docker ps # both containers up openshell-my-assistant-... Up nemoclaw-vllm Up And the diagnostic file mentioned does not exist: docker exec ls /tmp/*.log → no such file So the verification ran before the gateway and dashboard port forward had finished coming up, raising false alarms. The follow-up hint ("Check /tmp/gateway.log") points at a file that doesn't get written at all.Logs ImpactBug Details
[NVB#6179568]