
nemoclaw onboard is not resumable: partial failures require full cleanup and restart #446

@pixan-ai

Summary

The nemoclaw onboard wizard runs 7 sequential steps (API key → preflight → gateway → sandbox → inference → OpenClaw → policies). If any step fails mid-way (due to OOM, network timeout, terminal disconnect, etc.), the wizard cannot be resumed from the point of failure. Worse, leftover state from the partial run can block subsequent attempts.

Environment

| Component | Detail |
| --- | --- |
| Platform | NVIDIA Brev (GCP, Ubuntu 24.04) |
| RAM | Tested on 8 GiB and 16 GiB instances |
| NemoClaw | Installed via official installer (March 18–19, 2026) |

Observed Behavior

Problem 1: No checkpoint/resume

When the onboard wizard fails at step 4 (sandbox image push — common on low-memory instances), rerunning nemoclaw onboard starts from step 1 again. There is no --resume or --from-step flag. On low-bandwidth or resource-constrained environments, this means repeating 5–10 minutes of work that already completed successfully.

Problem 2: Stale gateway blocks re-onboarding

If the wizard fails after the gateway has been started (step 3), the OpenShell gateway remains running on port 8080. On the next attempt:

Error: Gateway already running on port 8080

The user must manually clean up before retrying:

openshell gateway destroy -g nemoclaw

This is not documented in the quickstart or troubleshooting guides.
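
Until the wizard handles this itself, a wrapper script could check for a listener before re-running onboard. The sketch below is a minimal illustration, assuming bash (it relies on bash's /dev/tcp redirection) and assuming the gateway listens on localhost port 8080, as the error message suggests. `port_in_use` is a hypothetical helper, not part of NemoClaw or OpenShell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) if something accepts TCP connections on host:port.
# Uses bash's built-in /dev/tcp pseudo-device, so it needs bash rather than plain sh.
port_in_use() {
  local host="$1" port="$2"
  # The subshell opens fd 3 to host:port; the fd is closed when the subshell exits.
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

if port_in_use 127.0.0.1 8080; then
  echo "Port 8080 is in use; a stale gateway may be running."
  echo "Clean up with: openshell gateway destroy -g nemoclaw"
fi
```

The check only shows that something is listening on the port, not that it is the NemoClaw gateway, so it is a hint to run the cleanup rather than proof.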

Problem 3: nemoclaw not in PATH after install

After a fresh install via the official script, nemoclaw is not available in the current shell session:

$ nemoclaw my-assistant connect
nemoclaw: command not found

The user must run source ~/.bashrc (or open a new terminal) to pick up the updated PATH. This is especially confusing because the install script's final output says to run nemoclaw my-assistant connect, so the suggested command immediately fails.

Expected Behavior

  1. Resumable onboard: The wizard should detect which steps have already completed successfully and skip them. At minimum, provide a --resume flag.
  2. Clean error on stale gateway: If a gateway is already running, the wizard should offer to restart it instead of failing with an opaque error.
  3. PATH updated in current session: The install script should export PATH or instruct the user to source ~/.bashrc in its final output.
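
A note on item 3: an installer launched as `curl ... | bash` runs in a child process, so an `export PATH` inside it cannot change the shell that started it. That is presumably why `source ~/.bashrc` or a new terminal is needed. The short demonstration below shows the general shell behavior; the directory `/opt/example/bin` is a made-up placeholder, not NemoClaw's real install location.

```shell
# Record the parent shell's PATH.
before="$PATH"

# A child shell (like an installer piped into bash) changes only its own PATH.
# /opt/example/bin is a placeholder directory for illustration only.
bash -c 'export PATH="/opt/example/bin:$PATH"'

# Back in the parent, PATH is untouched.
if [ "$PATH" = "$before" ]; then
  echo "parent PATH unchanged"
fi
```

In practice this means the realistic fix is the second half of item 3: a clear instruction in the installer's final output.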

Suggested Improvements

# Ideal experience after a failure:
$ nemoclaw onboard --resume
[1/7] API Key ............ ✓ (cached)
[2/7] Preflight .......... ✓ (cached)
[3/7] Gateway ............ ✓ (running)
[4/7] Sandbox ............ resuming image push...
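
One possible way to get this behavior, sketched under assumptions: record each completed step in a state file and skip recorded steps on the next run. Everything here is hypothetical. The state file location (`NEMOCLAW_STATE_FILE`), the `step_done` and `run_step` helpers, and the step names are illustrative and do not describe NemoClaw's actual internals.

```shell
#!/usr/bin/env bash
# Hypothetical checkpoint file; the path and variable name are assumptions for illustration.
STATE_FILE="${NEMOCLAW_STATE_FILE:-${TMPDIR:-/tmp}/nemoclaw-onboard.state}"

# Succeeds if the named step was recorded as completed in a previous run.
step_done() {
  [ -f "$STATE_FILE" ] && grep -qxF -- "$1" "$STATE_FILE"
}

# Runs a step's command unless it is already recorded; records it only on success.
run_step() {
  local name="$1"; shift
  if step_done "$name"; then
    echo "$name: skipped (cached)"
    return 0
  fi
  if "$@"; then
    echo "$name" >> "$STATE_FILE"
    echo "$name: done"
  else
    echo "$name: FAILED, re-run to resume from here" >&2
    return 1
  fi
}
```

A wizard built this way would call something like `run_step gateway start_gateway` for each of the seven steps (with `start_gateway` standing in for the real step function) and stop at the first failure. A cached entry only says the step succeeded once; steps with live state, such as the gateway, would still need a health check before being skipped.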

For the PATH issue, the install script could append:

echo ""
echo "[INFO] Run 'source ~/.bashrc' or open a new terminal to use 'nemoclaw'"
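
A slightly more targeted variant, sketched here as an assumption about how the installer could behave, prints the hint only when the command cannot be resolved. The installer inherits its PATH from the shell that launched it, so as long as it has not modified its own PATH, a failed `command -v` inside the installer means the user's current session cannot find the command either. The function name `print_path_hint` is hypothetical.

```shell
# Hypothetical helper for the end of an install script.
# Prints the hint only if the given command cannot be found on the inherited PATH.
print_path_hint() {
  local cmd="$1"
  if ! command -v "$cmd" >/dev/null 2>&1; then
    echo ""
    echo "[INFO] Run 'source ~/.bashrc' or open a new terminal to use '$cmd'"
  fi
}

print_path_hint nemoclaw
```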

Workaround

Full cleanup before retrying:

openshell gateway destroy -g nemoclaw 2>/dev/null
# If sandbox was partially created:
openshell sandbox delete <name> 2>/dev/null
# Then re-run:
curl -fsSL https://nvidia.com/nemoclaw.sh | bash
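
The same cleanup can be wrapped in a small function so it is repeatable. This is a sketch that uses only the two `openshell` commands shown above; the function name `cleanup_partial_onboard` and its optional sandbox-name argument are assumptions for illustration, and re-running the installer is left as a separate manual step.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the manual cleanup commands from this issue.
# Usage: cleanup_partial_onboard [sandbox-name]
cleanup_partial_onboard() {
  local sandbox_name="${1:-}"

  # Remove the stale gateway; ignore errors if none is running.
  openshell gateway destroy -g nemoclaw 2>/dev/null || true

  # Remove a partially created sandbox only when a name was supplied.
  if [ -n "$sandbox_name" ]; then
    openshell sandbox delete "$sandbox_name" 2>/dev/null || true
  fi
}

# Example call; replace the argument with your own sandbox name:
# cleanup_partial_onboard <name>
```

Because both commands tolerate failure, the function is safe to run when nothing was left behind.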

Labels

area: cli (Command line interface, flags, terminal UX, or output)
area: install (Install, setup, prerequisites, or uninstall flow)
area: onboarding (Onboarding FSM, provider setup, sandbox launch, or first-run flow)
platform: brev (Affects Brev hosted development environments)
