refactor(cli): typed command registry to fix docs-drift and re-enable cloud-experimental-e2e

## Description

### Problem Statement

The `cloud-experimental-e2e` nightly job has been disabled since PR #2367 (via `vars.CLOUD_EXPERIMENTAL_E2E_ENABLED`) and has **never passed** in the nightly pipeline. Two issues have blocked it, one of which is now resolved:

1. **Landlock `/sandbox` writability regression**: **now resolved.** OpenShell #810 (two-phase Landlock fix) merged April 13, and NemoClaw issue #1739 was closed April 21. The blueprint already pins `min_openshell_version: "0.0.32"` and `install-openshell.sh` pins `MIN_VERSION="0.0.32"`, the release that contains the fix.

2. **CLI/docs command reference drift (Phase 5f)**: **still broken.** `check-docs.sh` compares `nemoclaw --help` output against `` ### `nemoclaw …` `` headings in `docs/reference/commands.md`. The check finds 36 commands in `--help` but only 31 headings in the docs. The mismatch comes from two sources:
   - **Parser brittleness:** the perl regex that normalizes `--help` lines fails to strip inline description text when commands use single-space separators (e.g., `nemoclaw <name> channels remove <channel> Clear credentials and rebuild` leaks the description)
   - **Real drift:** commands like `nemoclaw onboard --from`, `nemoclaw setup`, `nemoclaw start`, `nemoclaw stop` were added to `help()` without corresponding `###` headings in `commands.md`
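
To make the parser-brittleness failure mode concrete, here is a minimal sketch. It is not the perl regex from `check-docs.sh`; it assumes a hypothetical parser (`extractUsage`) that treats the first run of two or more spaces as the boundary between usage and description, which is a common convention for column-aligned help text. Under that assumption, a help line with a single-space separator has no boundary to find, so the description leaks into the command name:

```typescript
// Hypothetical illustration only: NOT the actual check-docs.sh parser.
// Assumes usage and description are separated by two or more spaces.
function extractUsage(helpLine: string): string {
  const trimmed = helpLine.trim();
  const columnGap = trimmed.search(/ {2,}/); // index of first run of 2+ spaces, or -1
  return columnGap === -1 ? trimmed : trimmed.slice(0, columnGap);
}

// Column-aligned line: the gap is found and the description is stripped.
const alignedLine =
  "nemoclaw <name> channels remove <channel>   Clear credentials and rebuild";

// Single-space separator: no gap exists, so the whole line is returned.
const singleSpaceLine =
  "nemoclaw <name> channels remove <channel> Clear credentials and rebuild";

console.log(extractUsage(alignedLine));
console.log(extractUsage(singleSpaceLine));
```

The first call yields only the usage string, while the second returns the full line including `Clear credentials and rebuild`, which then fails to match any docs heading.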

There is an existing issue (#2333) and open PR (#2332) that patches both the parser and the docs. But patching is not the right long-term fix — the same problem will recur every time someone adds or modifies a CLI command.

### Root Cause

Commands are defined in **three independent, untyped places** with no shared source of truth:

1. **`help()` function** in `src/nemoclaw.ts` (~line 2929) — hand-written `console.log()` strings with ANSI formatting
2. **Dispatch switch** in `src/nemoclaw.ts` (~line 3026) — the actual command routing logic
3. **`docs/reference/commands.md`** — hand-written Markdown headings

Every time someone adds a command, they must update all three by hand. `check-docs.sh` tries to catch drift after the fact by parsing the `--help` text output, but the parser itself is fragile and contributes false mismatches.

### Proposed Design

Create a **typed command registry** as the single source of truth, then derive all three outputs from it.

#### Phase 1: Typed command registry

Create `src/lib/command-registry.ts`:

```typescript
// Exported so that help() and the docs check can import the same types.
export interface CommandDef {
  usage: string;           // "nemoclaw <name> snapshot create"
  description: string;     // "Create a snapshot of sandbox state"
  flags?: string;          // "[--name <label>]"
  group: CommandGroup;     // "Sandbox Management"
  deprecated?: boolean;
}

export type CommandGroup =
  | "Getting Started"
  | "Sandbox Management"
  | "Skills"
  | "Policy Presets"
  | "Messaging Channels"
  | "Compatibility Commands"
  | "Services"
  | "Troubleshooting"
  | "Credentials"
  | "Backup"
  | "Upgrade"
  | "Cleanup";

export const COMMANDS: readonly CommandDef[] = [
  { usage: "nemoclaw onboard", description: "Configure inference endpoint and credentials", group: "Getting Started" },
  // ...all commands
] as const;
```

TypeScript enforces that every entry has the required fields and valid group. Adding a command means adding one entry to this array.

#### Phase 2: Generate `help()` from registry

Rewrite the `help()` function to iterate `COMMANDS` grouped by `group`, applying ANSI formatting. No more hand-written command lines in the help output.
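
A rough sketch of what a registry-driven renderer could look like follows. The function name `renderHelp`, the `useColor` switch, the specific ANSI codes, and the column layout are all assumptions for illustration, not the existing `help()` implementation. `group` is typed as a plain `string` here only to keep the block self-contained, and the two sample entries reuse the examples from the Phase 1 snippet:

```typescript
interface CommandDef {
  usage: string;
  description: string;
  flags?: string;
  group: string; // simplified; the real registry would use CommandGroup
  deprecated?: boolean;
}

const SAMPLE_COMMANDS: readonly CommandDef[] = [
  { usage: "nemoclaw onboard", description: "Configure inference endpoint and credentials", group: "Getting Started" },
  { usage: "nemoclaw <name> snapshot create", flags: "[--name <label>]", description: "Create a snapshot of sandbox state", group: "Sandbox Management" },
];

// Standard ANSI SGR sequences (bold, dim, reset).
const BOLD = "\x1b[1m";
const DIM = "\x1b[2m";
const RESET = "\x1b[0m";

function renderHelp(commands: readonly CommandDef[], useColor: boolean): string {
  const paint = (code: string, text: string): string =>
    useColor ? `${code}${text}${RESET}` : text;
  const signature = (c: CommandDef): string =>
    c.flags ? `${c.usage} ${c.flags}` : c.usage;

  // Bucket commands by group, keeping groups in first-seen order.
  const groups = new Map<string, CommandDef[]>();
  for (const cmd of commands) {
    const bucket = groups.get(cmd.group);
    if (bucket) bucket.push(cmd);
    else groups.set(cmd.group, [cmd]);
  }

  // Pad every signature to the widest one so descriptions line up.
  const width = Math.max(0, ...commands.map((c) => signature(c).length));
  const lines: string[] = [];
  groups.forEach((cmds, group) => {
    lines.push(paint(BOLD, `${group}:`));
    for (const c of cmds) {
      const suffix = c.deprecated ? " (deprecated)" : "";
      lines.push(`  ${signature(c).padEnd(width)}  ${paint(DIM, c.description + suffix)}`);
    }
    lines.push("");
  });
  return lines.join("\n");
}

console.log(renderHelp(SAMPLE_COMMANDS, false));
```

Returning a string instead of calling `console.log()` per line keeps the renderer testable, and disabling color gives a stable plain-text form if anything still needs to read the help output.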

#### Phase 3: Validate docs from registry

Either:
- **Option A:** Generate `docs/reference/commands.md` headings from the registry (like the existing `docs-to-skills.py` pattern), or
- **Option B:** Simplify `check-docs.sh` to import the registry directly and compare against Markdown headings — no perl parsing of `--help` output

Option A is stronger (eliminates drift entirely), but Option B is simpler and may be more practical given the Sphinx/MyST doc pipeline.
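
For a sense of how small Option B could be, here is a sketch of a registry-versus-headings comparison. The names `docHeadings` and `findDrift` are hypothetical, and the sketch assumes each docs heading has the form ``### `nemoclaw …` `` with backtick contents that match a registry `usage` string exactly; if headings also carry flags, a normalization step would be needed. The `nemoclaw legacy` heading is invented purely to show the reverse direction of drift:

```typescript
// Collect the backtick contents of every "### `nemoclaw ...`" heading.
function docHeadings(markdown: string): string[] {
  const pattern = /^### `(nemoclaw[^`]*)`/gm;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    const heading = match[1];
    if (heading) found.push(heading);
  }
  return found;
}

interface DriftReport {
  missingFromDocs: string[];     // in the registry, no heading in the docs
  missingFromRegistry: string[]; // heading in the docs, not in the registry
}

function findDrift(registryUsages: readonly string[], markdown: string): DriftReport {
  const headings = docHeadings(markdown);
  const documented = new Set(headings);
  const registered = new Set(registryUsages);
  return {
    missingFromDocs: registryUsages.filter((u) => !documented.has(u)),
    missingFromRegistry: headings.filter((h) => !registered.has(h)),
  };
}

// Usage with inline sample data; a real check would read commands.md from disk
// and map the registry to its usage strings.
const sampleUsages = ["nemoclaw onboard", "nemoclaw setup", "nemoclaw start"];
const sampleDocs = [
  "# Commands",
  "",
  "### `nemoclaw onboard`",
  "",
  "Some prose.",
  "",
  "### `nemoclaw start`",
  "",
  "### `nemoclaw legacy`", // hypothetical stale heading
].join("\n");

console.log(findDrift(sampleUsages, sampleDocs));
```

Because the comparison reads structured data on one side and a single heading pattern on the other, it no longer depends on how `--help` happens to space its columns.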

#### Phase 4: Re-enable cloud-experimental-e2e

Once Phase 5f passes reliably, flip `vars.CLOUD_EXPERIMENTAL_E2E_ENABLED` to `true`. The Landlock blocker is already resolved.

### Alternatives Considered

- **Patch check-docs.sh and commands.md (PR #2332):** Fixes the immediate mismatch but doesn't prevent recurrence. Every future command addition risks the same drift.
- **Remove cloud-experimental-e2e entirely:** Rejected — it's the only test that exercises the public installer flow, interactive onboard, skill injection + agent verification, OpenClaw TUI, Landlock enforcement, and doc validation. No other E2E test covers these.

### What cloud-experimental-e2e uniquely covers

This is the most comprehensive single E2E test in the suite (2,450 lines, 12 scripts). Capabilities not tested elsewhere:

| Phase | Unique Coverage |
|-------|----------------|
| Phase 3 | Full public installer (`curl nvidia.com/nemoclaw.sh \| bash`) with interactive expect-driven onboard |
| Phase 5 checks | Landlock read-only enforcement (04-landlock-readonly.sh) |
| Phase 5b | Live cloud chat via inference.local → gateway → NVIDIA Cloud |
| Phase 5c/5d | Agent skills validation + skill injection + agent turn verification |
| Phase 5e | OpenClaw TUI smoke (expect-driven connect → tui → message → Ctrl+C quit) |
| Phase 5f | Documentation validation — the focus of this issue |

### Related Issues / PRs

- #2333 — existing issue for the docs-drift mismatch
- #2332 — open PR patching check-docs.sh (band-aid fix)
- #1739 — Landlock regression (closed, fixed in OpenShell 0.0.32)
- #2104 — nightly E2E triage plan (references Landlock and cloud-experimental)
- #2367 — PR that disabled cloud-experimental-e2e
- #924 — TypeScript migration / oclif refactor (overlapping scope)

