Description
docs/reference/troubleshooting.md (rendered at
https://docs.nvidia.com/nemoclaw/latest/reference/troubleshooting.html)
recommends three `openshell gateway` subcommands in its recovery flows that
do not exist on the shipped OpenShell 0.0.39 / NemoClaw v0.0.44 toolchain:
1. `openshell gateway start --name nemoclaw`
— in "Reconnect after a host reboot" (commands.md:518, troubleshooting
page section "Runtime" → "Reconnect after a host reboot")
2. `openshell gateway trust -g nemoclaw`
— in "Sandbox creation reports a TLS certificate mismatch"
(troubleshooting.md:606)
3. `openshell gateway destroy` + `openshell gateway start`
— in "k3s cannot find a freshly built image" under the DGX Spark
section (troubleshooting.md:1042-1043)
The valid `openshell gateway` subcommands on v0.0.39 are:
add, remove, login, logout, select, info, list
Each of `start`, `trust`, and `destroy` produces:
error: unrecognized subcommand '<subcommand>'
Impact is high because all three references sit inside symptom-fix flows
that a user runs during a real failure (host reboot, TLS reset, k3s image
cache issue on DGX Spark). The first command of each flow fails with
"unrecognized subcommand", and the docs offer no alternative.
The rest of the page checks out: tested 75 H2/H3 sections, 81 code blocks,
77 same-page anchors (all resolve), 48 internal links (all 200), 16 external
links (all 200), and verified the prose command references for nemoclaw
onboard / rebuild / list / status / policy-add / channels / inference /
debug / gc / uninstall / tunnel / openshell sandbox list/delete / openshell
forward start/list / openshell term against the live CLIs — those are
correct. Drift is isolated to the three `openshell gateway` recovery
commands.
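A docs-side check could catch this kind of drift before release. The following is a minimal sketch, not an existing NemoClaw tool: `cited_gateway_subcommands` is a hypothetical helper name, and it assumes the docs always write the subcommand as a lowercase word directly after `openshell gateway`.

```shell
# Sketch: list the distinct `openshell gateway <sub>` subcommands cited in
# Markdown files (or on stdin when no file is given).
# Flags such as `openshell gateway --help` are skipped because the pattern
# requires a lowercase letter right after "gateway ".
cited_gateway_subcommands() {
  grep -o 'openshell gateway [a-z][a-z-]*' "$@" \
    | awk '{ print $NF }' \
    | sort -u
}
```

Example use: `cited_gateway_subcommands docs/reference/troubleshooting.md`, then compare the result with the subcommands the installed CLI actually reports.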
Environment
Device: ipp2-1558 (10.176.178.100), x86_64 server, 32 vCPU / 125 GB RAM, NVIDIA A100 80GB PCIe
OS: Ubuntu 24.04.4 LTS (Linux 6.17.0-23-generic)
Architecture: x86_64
Node.js: v22.x (installed via nvm by NemoClaw installer)
npm: bundled
Docker: 29.5.0
OpenShell CLI: 0.0.39
NemoClaw: v0.0.44
OpenClaw: N/A (docs-only bug)
Steps to Reproduce
1. Open https://docs.nvidia.com/nemoclaw/latest/reference/troubleshooting.html
2. In "Runtime" → "Reconnect after a host reboot", read step 3:
$ openshell gateway start --name nemoclaw
3. In "Runtime" → "Sandbox creation reports a TLS certificate mismatch",
read the recovery snippet:
$ openshell gateway trust -g nemoclaw
$ nemoclaw onboard --resume
4. In "DGX Spark" → "k3s cannot find a freshly built image", read the
recovery snippet:
$ openshell gateway destroy
$ openshell gateway start
5. Run each of the three subcommands against OpenShell 0.0.39:
$ openshell gateway start --name nemoclaw
$ openshell gateway trust -g nemoclaw
$ openshell gateway destroy
6. List the real subcommands:
$ openshell gateway --help
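Step 5 can be scripted. The sketch below rests on two assumptions that are not from the docs: `OPENSHELL` is a hypothetical override for the binary name, and `--help` is appended so that a CLI version which does implement the subcommand only prints help instead of acting. It also assumes the CLI rejects an unknown subcommand before it handles `--help`.

```shell
# Sketch: report whether the CLI recognizes each given `gateway` subcommand.
# OPENSHELL (hypothetical) lets the function be pointed at a different
# binary or a stub; it defaults to `openshell`.
check_gateway_subcommands() {
  local cli="${OPENSHELL:-openshell}"
  local sub out
  for sub in "$@"; do
    # Capture stdout and stderr; ignore the exit code so `set -e` callers
    # do not abort on the expected failure.
    out="$("$cli" gateway "$sub" --help 2>&1)" || true
    if printf '%s\n' "$out" | grep -q "unrecognized subcommand '$sub'"; then
      echo "$sub: unrecognized"
    else
      echo "$sub: accepted"
    fi
  done
}
```

Example use: `check_gateway_subcommands start trust destroy`.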
Expected Result
Every command typed in a troubleshooting recovery flow resolves to a real
`openshell` subcommand. The recovery flows produce a working result on
v0.0.44.
Actual Result
Step 5 output:
$ openshell gateway start --name nemoclaw
error: unrecognized subcommand 'start'
Usage: openshell gateway [OPTIONS] [COMMAND]
$ openshell gateway trust -g nemoclaw
error: unrecognized subcommand 'trust'
Usage: openshell gateway [OPTIONS] [COMMAND]
$ openshell gateway destroy
error: unrecognized subcommand 'destroy'
Usage: openshell gateway [OPTIONS] [COMMAND]
Step 6 (real `openshell gateway --help` on 0.0.39):
COMMANDS
add Add an existing gateway
remove Remove a local gateway registration
login Authenticate with an edge-authenticated or OIDC gateway
logout Clear stored authentication credentials for a gateway
select Select the active gateway
info Show gateway registration details
list List registered gateways
→ start / trust / destroy are all absent.
Net effect: a user following the troubleshooting docs hits "unrecognized
subcommand" on the very first command of three different recovery flows.
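The help listing above can also be checked mechanically. This is a sketch under the assumption that the help text has the layout pasted in this report: a `COMMANDS` line followed by one `<name> <description>` line per subcommand, ending at the first blank line or end of input. Both function names are hypothetical.

```shell
# Sketch: print the subcommand names from `openshell gateway --help` text
# supplied on stdin.
list_gateway_subcommands() {
  awk '
    $1 == "COMMANDS" && NF == 1 { in_cmds = 1; next }  # section header
    in_cmds && NF == 0          { in_cmds = 0 }        # blank line ends it
    in_cmds                     { print $1 }           # first word = name
  '
}

# Sketch: succeed when subcommand $1 appears in the help text on stdin.
gateway_has_subcommand() {
  list_gateway_subcommands | grep -qxF -- "$1"
}
```

Example use: `openshell gateway --help | gateway_has_subcommand start || echo "start is missing"`.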
Logs
$ openshell --version
openshell 0.0.39
$ openshell gateway list
NAME ENDPOINT TYPE AUTH
* nemoclaw http://127.0.0.1:8080 local plaintext
$ openshell gateway start --name nemoclaw
error: unrecognized subcommand 'start'
tip: a similar subcommand exists: 'select'
Usage: openshell gateway [OPTIONS] [COMMAND]
For more information, try '--help'.
Suggested Fix
For each broken reference, pick the actual maintained recovery flow and
update troubleshooting.md to match. Likely correct replacements (subject to
confirmation from the OpenShell team):
(1) "Reconnect after a host reboot" step 3:
Replace
$ openshell gateway start --name nemoclaw
with whatever the supported "bring the local gateway container back up"
flow is. Candidates:
• `docker start <gateway-container>` if the gateway runs as a
long-lived docker container restored from disk state, OR
• re-run `nemoclaw onboard --resume` to walk through gateway
bring-up, OR
• a sequence using `openshell gateway remove nemoclaw` followed by
`openshell gateway add http://127.0.0.1:8080 --local --name nemoclaw`
to re-register a still-running container.
(2) "Sandbox creation reports a TLS certificate mismatch":
Replace
$ openshell gateway trust -g nemoclaw
with the actual cert-refresh path on 0.0.39. Either
• `openshell gateway login nemoclaw` for edge-authenticated gateways
(re-runs the login flow and re-establishes trust), OR
• `openshell gateway remove nemoclaw` then `openshell gateway add ...`
to re-derive and re-store the gateway's TLS material.
(3) "k3s cannot find a freshly built image" (DGX Spark):
Replace
$ openshell gateway destroy
$ openshell gateway start
with the supported "tear down and re-create the local gateway"
sequence. Likely:
• `openshell gateway remove nemoclaw`, then re-run `nemoclaw onboard`
(or a docker-level restart of the gateway container).
In all three cases, please also add a short sentence saying which OpenShell
CLI version is required, so a reader on a newer OpenShell can recognize
that they may have a different command set.
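If the version note is added, readers could compare their installed CLI with the documented version using something like the sketch below. The function names are hypothetical, the `openshell <version>` output format is taken from the Logs section of this report, and `sort -V` assumes GNU coreutils (present on the Ubuntu host used here).

```shell
# Sketch: extract the version from `openshell --version` output on stdin,
# e.g. "openshell 0.0.39" -> "0.0.39".
openshell_version() {
  awk '$1 == "openshell" { print $2; exit }'
}

# Sketch: succeed when version $1 is less than or equal to version $2.
# Relies on `sort -V` (version sort), a GNU coreutils extension.
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}
```

Example use: `version_le "$(openshell --version | openshell_version)" 0.0.39 || echo "newer OpenShell: the command set may differ"`.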
Bug Details
| Field | Value |
| --- | --- |
| Priority | Unprioritized |
| Action | Dev - Open - To fix |
| Disposition | Open issue |
| Module | Machine Learning - NemoClaw |
| Keyword | NemoClaw, NemoClaw_Docs, NEMOCLAW_GH_SYNC_APPROVAL |
[NVB#6186667]