fix(onboard,uninstall): clearer diagnostics when no sandbox is registered#3464
fix(onboard,uninstall): clearer diagnostics when no sandbox is registered#3464laitingsheng wants to merge 4 commits into
Conversation
The "Existing gateway was started without GPU passthrough" suggestion contained the literal `<name>` placeholder and assumed a sandbox was registered; when none was, the suggested `nemoclaw <name> destroy --yes` had nothing to act on and looped the user through `nemoclaw uninstall` without saying so. Also fix the uninstall log line: when an `openshell` cleanup step returned non-zero (typically because the resource was already absent), the warning read `Destroyed gateway 'nemoclaw' skipped`, which contradicts itself. Use a state-form wording so the message reads truthfully. Refs #3456 Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (3)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds a GPU passthrough recovery helper integrated into onboarding and standardizes uninstall "skipped" warnings by normalizing action prefixes; tests cover recovery guidance variants and uninstall skip reporting edge cases. ChangesGPU Passthrough Recovery Guidance
Uninstall Cleanup Message Standardization
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
E2E Advisor RecommendationRequired E2E: None Dispatch hint: Full advisor summaryPi Semantic E2E AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/onboard.test.ts`:
- Around line 11-16: Update the tests to forbid any use of the literal
placeholder "<name>" rather than allowing one exact variant; replace the
specific expectation that contains "No sandboxes... `nemoclaw <name> destroy`"
in the expect(lines).toEqual(...) array with a version that omits the `<name>`
placeholder, and add negative assertions that assert the rendered output does
not contain "<name>" (e.g., expect(lines.join('\n')).not.toMatch(/<name>/) or
similar). Apply the same change to the other occurrence around the 40-43 block
so both the positive expectation arrays and the negative checks uniformly reject
the "<name>" placeholder.
In `@src/lib/onboard.ts`:
- Around line 10417-10426: Extract the inline sandbox-name lookup into a new
helper getRegisteredSandboxNamesForGpuRecovery() in
src/lib/onboard/gpu-recovery.ts that performs the same try/catch behavior around
registry.listSandboxes().sandboxes.map(s => s.name).filter(Boolean) and returns
[] on error; then replace the block in onboard.ts (the current IIFE that assigns
registeredNames) with a simple call to
getRegisteredSandboxNamesForGpuRecovery(), and keep the subsequent loop that
calls gpuPassthroughRecoveryLines(registeredNames) unchanged so only the lookup
logic moves into the new function.
In `@src/lib/onboard/gpu-recovery.ts`:
- Line 8: Update the user-facing message string in
src/lib/onboard/gpu-recovery.ts that currently contains the literal backtick
token `nemoclaw <name> destroy`; remove the placeholder so it reads `nemoclaw
destroy` (i.e., change the string "No sandboxes are registered, so there is
nothing for `nemoclaw <name> destroy` to act on." to "No sandboxes are
registered, so there is nothing for `nemoclaw destroy` to act on.").
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e4b21aec-b0da-470c-89aa-ca45b67d0dda
📒 Files selected for processing (5)
src/lib/actions/uninstall/run-plan.test.tssrc/lib/actions/uninstall/run-plan.tssrc/lib/onboard.test.tssrc/lib/onboard.tssrc/lib/onboard/gpu-recovery.ts
…s neutral CodeRabbit and Codex feedback on #3464: - gpu-recovery still emitted "nemoclaw <name> destroy" in the no-sandbox branch. The whole point of the bug is that `<name>` is not actionable when no sandbox is registered; rewrite the line so the placeholder is gone entirely and broaden the test to forbid any "<name>" occurrence in the joined output. - Hoist the registry lookup + print loop into reportGpuPassthroughRecovery inside gpu-recovery.ts so the entrypoint diff in src/lib/onboard.ts is net-neutral against the budget check. Inject the lookup as a callback for testability and add a fallback case that hides registry-throw paths. Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
Codex review on #3464 caught three issues: 1) `nemoclaw <name> destroy --yes` explicitly preserves the shared gateway (destroy.ts L207). The previous recovery snippet therefore destroyed the sandbox and then re-onboarded against the same gateway, which still lacks GPU passthrough — same wall, one round later. Append --cleanup-gateway to the last destroy command so the gateway actually goes away. 2) `nemoclaw uninstall` removes the global CLI via `npm uninstall -g nemoclaw` (run-plan.ts L451), so `nemoclaw uninstall && nemoclaw onboard --gpu` is not even executable — the second half runs against a CLI that no longer exists. Use the bare `openshell gateway destroy -g nemoclaw` command (the same one destroy.ts already prints as gatewayRemovalHint) so the CLI survives. 3) The registry catch collapsed read errors to `[]`, which then printed "No sandboxes are registered" — a false diagnosis. Switch the return type to `readonly string[] | null` and add a dedicated branch with a "Could not read the NemoClaw sandbox registry" message and a direct gateway-removal recipe. Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Resolve the two output threads in #3456 left after the core dead-loop fix landed via #3459 + #3434: Sub-bug #3 — `src/lib/onboard.ts` printed `nemoclaw <name> destroy --yes && nemoclaw onboard --gpu` with a literal `<name>` placeholder, and assumed at least one sandbox was registered. When the GPU-passthrough mismatch hit on the State B re-run path with an empty registry (the dead-loop case), the hint was not actionable. Replace with a registry-aware helper at `src/lib/onboard/gpu-recovery.ts` that renders the right shape: - empty registry → suggest `nemoclaw uninstall && nemoclaw onboard --gpu` - one sandbox → suggest destroy --yes --cleanup-gateway for that name - multiple sandboxes → list each, only the last gets --cleanup-gateway Sub-bug #4 — `src/lib/actions/uninstall/run-plan.ts` printed `Destroyed gateway 'nemoclaw' skipped` when the openshell destroy no-op'd (gateway already gone) — the "Destroyed … skipped" wording was self-contradictory. Extend `runOptional` with an `onSkip` option; route the gateway destroy to emit `Gateway 'nemoclaw' already removed or unreachable` on no-op. Tests: - `src/lib/onboard/gpu-recovery.test.ts` (6 tests): forbid literal `<name>` placeholder anywhere in the output; cover empty / single / multi-sandbox cases; defensive filter on whitespace names so a `nemoclaw destroy` rendering can never happen. - `src/lib/actions/uninstall/run-plan.test.ts`: assert the new "already removed or unreachable" wording and the absence of the "Destroyed gateway 'nemoclaw' skipped" string. The core dead loop itself (sub-bugs #1, #2 and State B GPU mismatch) is already addressed by #3459 + #3434 + #3483; #3456 will close once this lands. See the #3456 status comment for the full mapping. Refs #3456. Mirrors (and tightens) the approach in the closed PR #3464, which left the literal `<name>` placeholder in tests per CodeRabbit feedback that was never addressed. Signed-off-by: Charan Jagwani <charjags100@gmail.com>
…3520) > **Draft for visibility.** Issue-autopilot Stages 4-5 of #3456. Will mark ready once batch self-review + CI complete. ## Summary Closes the two remaining output threads in #3456 after the core dead-loop fix already landed on `main` (via #3459, #3434, #3483). Full sub-bug mapping in the [#3456 status comment](#3456 (comment)). - **Sub-bug #3** — `nemoclaw <name> destroy --yes` recovery hint replaced with a registry-aware helper. - **Sub-bug #4** — `Destroyed gateway 'nemoclaw' skipped` self-contradictory wording replaced with `Gateway 'nemoclaw' already removed or unreachable`. ## Acceptance criteria mapping | Sub-bug | Resolution | Evidence | |---|---|---| | #1 dead loop | Already fixed on main (#3459) | out of scope | | #2 firewall diagnostic | Already fixed on main (#3459) | out of scope | | **#3** literal `<name>` placeholder | **This PR** | `src/lib/onboard/gpu-recovery.ts` + `onboard.ts:10387-10405` | | **#4** misleading "skipped" wording | **This PR** | `src/lib/actions/uninstall/run-plan.ts:210-228, 407-414` | | #5 uninstall residuals | Already fixed on main (#3483) | out of scope | ## Behavior matrix `gpuPassthroughRecoveryLines(names)`: | Input | Suggestion | |---|---| | `null` / `[]` | `nemoclaw uninstall && nemoclaw onboard --gpu` | | one sandbox | `nemoclaw <name> destroy --yes --cleanup-gateway && nemoclaw onboard --gpu` | | many sandboxes | each `destroy --yes`, only the last gets `--cleanup-gateway` | ## Test plan ``` npm run typecheck:cli npx vitest run src/lib/onboard/gpu-recovery.test.ts src/lib/actions/uninstall/run-plan.test.ts ``` 22 tests pass (6 new + 16 existing). ## Notes for reviewers - This is the work [#3464 attempted](#3464); that PR was closed without merging after CodeRabbit asked for the `<name>` placeholder to be forbidden in tests via negative assertion. This PR adopts that refinement. - `runOptional` extension is backwards-compatible — existing callers without `onSkip` get the original wording. Closes #3456 once merged. --------- Signed-off-by: Charan Jagwani <charjags100@gmail.com> Co-authored-by: Charan Jagwani <charjags100@gmail.com> Co-authored-by: Carlos Villela <cvillela@nvidia.com>
Summary
Resolve the two misleading-output threads in #3456: the GPU-passthrough recovery suggestion that prints a literal
<name>placeholder and assumes a sandbox is registered, and thenemoclaw uninstalllog line that readsDestroyed gateway 'nemoclaw' skipped(a self-contradiction).Related Issue
Refs #3456 — partially addresses. This PR fixes only the State B "destroy --yes" suggestion and the uninstall "skipped" wording. The State A bridge-reachability / "host firewall is blocking" hint and the underlying GLIBC mismatch are out of scope; the bridge-route work in #3459 covers the install loop, and the GLIBC binary baseline sits on the OpenShell side. Do not auto-close #3456 with this PR alone.
Changes
src/lib/onboard/gpu-recovery.ts: new helper module exposinggpuPassthroughRecoveryLines(names: readonly string[] | null)(pure line-builder) andreportGpuPassthroughRecovery(emit, listRegisteredSandboxes)(registry-lookup + print wrapper with injectable deps for tests). Three branches:nemoclaw <name> destroy --yeswith--cleanup-gatewayappended to the last one (sincedestroy --yesotherwise preserves the gateway and would re-loop on the next onboard).openshell gateway destroy -g nemoclawfollowed bynemoclaw onboard --gpu. Uses the bare gateway-removal command rather thannemoclaw uninstallbecause the latternpm uninstall -g nemoclaws the CLI, after whichnemoclaw onboard --gpuwould no longer exist.Could not read the NemoClaw sandbox registryand the same direct gateway-removal recipe, so a registry failure is not silently rebranded as "no sandboxes registered".src/lib/onboard.ts: when the reusable gateway is missing GPU passthrough, replace the literalnemoclaw <name> destroy --yes && nemoclaw onboard --gpuline with a call toreportGpuPassthroughRecovery(). Diff is net-neutral against the entrypoint-budget check.src/lib/actions/uninstall/run-plan.ts: when anopenshellcleanup step returns non-zero, drop the past-tense verb so the warning readsSkipped <target> (already absent or unavailable)instead ofDestroyed gateway 'nemoclaw' skipped.src/lib/onboard.test.tscovers the four input branches (null/[]/ one / many), asserts--cleanup-gatewaylands only on the last destroy, asserts the joined output never contains<name>or the brokennemoclaw uninstall && nemoclaw onboardchain, and exercisesreportGpuPassthroughRecoveryfor both the happy and registry-throws paths.src/lib/actions/uninstall/run-plan.test.tsadds a case that forces everyopenshellcall non-zero and asserts the warnings use the new wording (and no warning matches/^Destroyed .+ skipped$/).Type of Change
Verification
npx prek run --all-filespassesnpm testpassesmake docsbuilds without warnings (doc changes only)Signed-off-by: Tinson Lai tinsonl@nvidia.com
Summary by CodeRabbit
New Features
Bug Fixes
Tests