Skip to content

[Station][Brev][Onboard] misleading post-install gateway/token errors — auto-recovers on connect #3573

@hulynn

Description

@hulynn

Description

Description

At the end of `nemoclaw onboard`, step [8/8] applies policy presets, which
restarts the sandbox container. The post-install deployment verification then
runs IMMEDIATELY and finds:

  1. The OpenClaw gateway is no longer responding (because it died with the
     pre-restart container).
  2. The gateway auth token file cannot be read (same reason — the gateway
     has not regenerated it yet on the new container).

Onboard reports these as fatal-looking errors:

  ✗ gateway: HTTP 0 (gateway not responding)
    The gateway process may have crashed during startup. Check /tmp/gateway.log
  Could not read gateway token from the sandbox (download failed).
  ✗ dashboard: port forward not working (connection refused)

All three messages are misleading. The gateway did NOT crash — it was killed
cleanly by the policy-apply restart, and `nemoclaw  connect` correctly
auto-recovers it (re-launches gateway, regenerates token, re-establishes the
18789 forward). After `connect`, `nemoclaw  status` shows everything
healthy.

Users see "gateway crashed during startup" + "token download failed" at the
end of a 12-minute install and reasonably conclude the install is broken,
then waste time chasing a non-issue.
Environment
Reproduced on three independent platforms (2026-05-15):

  1. macOS 26.1 (Darwin 25.1.0, arm64) + Colima
  2. Brev / Shadeform Ubuntu 22.04, NVIDIA H100 PCIe
  3. DGX Station, Ubuntu (galaxy-sku2-018), RTX PRO 6000 Blackwell + GB300

Versions:
  NemoClaw:      v0.0.41 / latest from curl https://www.nvidia.com/nemoclaw.sh
  OpenShell CLI: 0.0.39
  OpenClaw:      v2026.4.24
  Node.js:       v22.22.3
  Docker:        27.4.0 / 29.4.3
Steps to Reproduce
1. Clean host (no prior NemoClaw state).
2. curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash
   (accept TPS prompt; let express install run to completion)
3. Watch the tail of the install log — step [8/8] applies policy presets,
   then verification fires and reports the three errors above.
4. nemoclaw  status        # everything is healthy
5. nemoclaw  connect        # auto-recovers (gateway restarted,
                                     #                token regenerated,
                                     #                18789 forward up)
Expected Result
Post-install verification waits for the post-policy-apply sandbox to stabilize,
then either reports a clean ✓ or omits the gateway/token/dashboard checks if
they will be created lazily by the first `connect`. No false "gateway crashed
during startup" / "token download failed" warnings.
Actual Result
⚠ Deployment verification found issues:

  ✗ gateway: HTTP 0 (gateway not responding)
    The gateway process may have crashed during startup. Check /tmp/gateway.log inside the sandbox.
  ✗ dashboard: port forward not working (connection refused)
    Port forward on 18789 is not working. Run: openshell forward start 18789

Could not read gateway token from the sandbox (download failed).

  The sandbox was created successfully but may not be fully functional.

Then on first `nemoclaw  connect`:

  OpenClaw gateway is not running inside the sandbox (sandbox likely restarted).
  Recovering...
  ✓ OpenClaw gateway restarted inside sandbox.
  ✓ Dashboard port forward re-established.
Gateway log evidence (sandbox restart is graceful, not a crash)
$ cat /tmp/gateway.log | tail
2026-05-15T08:23:20.577+00:00 [gateway] loading configuration…
2026-05-15T08:23:20.597+00:00 [gateway] resolving authentication…
2026-05-15T08:23:22.324+00:00 [gateway] auth token was missing. Generated a new token and saved it to config (gateway.auth.token).
2026-05-15T08:23:23.307+00:00 [gateway] starting HTTP server...
2026-05-15T08:23:23.599+00:00 [gateway] ready (4 plugins: browser, device-pair, phone-control, talk-voice; 3.0s)

(Timestamps are AFTER `nemoclaw connect` triggered recovery, confirming there
was no prior crash — the gateway simply had not been started on the post-policy
sandbox.)

Bug Details

Field Value
Priority Unprioritized
Action Dev - Open - To fix
Disposition Open issue
Module Machine Learning - NemoClaw
Keyword NemoClaw, NemoClaw_CLI&UX, NEMOCLAW_GH_SYNC_APPROVAL, NemoClaw_Onboard

[NVB#6179954]

Metadata

Metadata

Assignees

Labels

NV QABugs found by the NVIDIA QA Teamplatform: brevAffects Brev hosted development environments

Type

No fields configured for Bug.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions