feat(cli): add sandbox host alias commands (Fixes #2040)#2406
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds three sandbox-scoped CLI actions— Changes
Sequence Diagram(s)sequenceDiagram
participant User as "User (CLI)"
participant Nemoclaw as "nemoclaw CLI\n(src/nemoclaw.ts)"
participant Docker as "Docker (container exec)"
participant KubeAPI as "kube-apiserver\n(Sandbox CR)"
rect rgba(200,230,201,0.5)
User->>Nemoclaw: run hosts-add / hosts-list / hosts-remove
end
Nemoclaw->>Docker: docker exec openshell-cluster-nemoclaw kubectl get sandbox/<name> -o json
Docker->>KubeAPI: kubectl GET request (Sandbox CR)
KubeAPI-->>Docker: Sandbox CR JSON (includes spec.podTemplate.spec.hostAliases)
Docker-->>Nemoclaw: Sandbox JSON
Nemoclaw->>Nemoclaw: validate hostname/IP, compute new hostAliases list or removal
alt dry-run
Nemoclaw-->>User: print computed JSON patch / preview (no kubectl patch)
else apply
Nemoclaw->>Docker: docker exec ... kubectl patch sandbox/<name> --type=json -p '<patch>'
Docker->>KubeAPI: kubectl PATCH request
KubeAPI-->>Docker: patch response (may indicate conflict)
Docker-->>Nemoclaw: patch result
Nemoclaw-->>User: success or retry on conflict (up to 3 attempts)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
test/cli.test.ts (1)
445-538: Add one regression test for removal and one for duplicate rejection.The new command family adds
hosts-remove, duplicate-hostname validation, and--dry-run, but this file only exercises happy-path add/list. A bug in removal or duplicate handling would still ship green.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/cli.test.ts` around lines 445 - 538, Add regression tests covering hosts-remove and duplicate-hostname rejection (and exercise --dry-run) because current tests only cover add/list; add one test that uses runWithEnv to call "alpha hosts-remove searxng.local" and assert exit code 0, expected docker patch includes removal of the hostname (check log contains "patch" and the JSON patch from log shows the hostname removed from spec.podTemplate.spec.hostAliases), and another test that calls "alpha hosts-add searxng.local 192.168.1.105" twice (or calls hosts-add with an existing hostname) and asserts a non-zero exit code and an error message about duplicate hostnames (and verify no docker patch applied); also add a short test asserting that using "--dry-run" with hosts-add/hosts-remove returns success and prints the intended patch without invoking docker side effects (inspect docker log or absence thereof) — use runWithEnv, the CLI commands "hosts-remove" and "hosts-add", the "--dry-run" flag, and the existing log inspection pattern to locate JSON patches in the docker log.docs/reference/commands.md (1)
375-399: FormathostAliasesas code and keep “sandbox” lowercase in prose.Line 375 uses
Sandboxin prose, and Lines 379 and 399 mentionhostAliaseswithout inline code formatting. As per coding guidelines, "sandbox(unless starting a sentence)" and "CLI commands, file paths, flags, parameter names, and values must use inlinecodeformatting."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/reference/commands.md` around lines 375 - 399, Update the prose to use lowercase "sandbox" (unless it begins a sentence) and wrap all occurrences of hostAliases in inline code formatting; specifically change the standalone "Sandbox" mention to "sandbox" and replace bare hostAliases references with `hostAliases`, and ensure CLI usages and flags like `nemoclaw my-assistant hosts-list`, `nemoclaw my-assistant hosts-remove searxng.local`, and `--dry-run` are written with inline code formatting for consistency.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/nemoclaw.ts`:
- Around line 2210-2227: getSandboxResource currently calls execFileSync
directly which can hang or throw; wrap the call to execFileSync in a try/catch,
pass a timeout option (e.g. { timeout: <ms>, encoding: 'utf-8', stdio: [...] })
to avoid indefinite hangs, and handle errors from execFileSync by converting
them into a controlled CLI error path (return null or throw a custom Error with
a descriptive message and include the original error.stdout/stderr when
available). Apply the same pattern wherever execFileSync is used in this file so
callers receive a predictable error instead of an uncaught exception or a hung
process.
- Around line 2280-2318: The add/remove paths are comparing raw host strings
which allows mixed-case duplicates; normalize hostnames to lowercase before
validation, comparison, and storage. In sandboxHostsAdd: immediately set
hostname = hostname.toLowerCase() (or rebuild aliases with alias.hostnames =
alias.hostnames.map(h => h.toLowerCase())) before calling
validateHostAliasHostname and before the aliases.some / existing checks, and
push the lowercased hostname when adding to an alias; ensure patchHostAliases
receives the normalized hostnames. Apply the same lowercase normalization logic
in the hosts-remove implementation (the code around the corresponding remove
function) so comparisons and removals use the same normalized form.
---
Nitpick comments:
In `@docs/reference/commands.md`:
- Around line 375-399: Update the prose to use lowercase "sandbox" (unless it
begins a sentence) and wrap all occurrences of hostAliases in inline code
formatting; specifically change the standalone "Sandbox" mention to "sandbox"
and replace bare hostAliases references with `hostAliases`, and ensure CLI
usages and flags like `nemoclaw my-assistant hosts-list`, `nemoclaw my-assistant
hosts-remove searxng.local`, and `--dry-run` are written with inline code
formatting for consistency.
In `@test/cli.test.ts`:
- Around line 445-538: Add regression tests covering hosts-remove and
duplicate-hostname rejection (and exercise --dry-run) because current tests only
cover add/list; add one test that uses runWithEnv to call "alpha hosts-remove
searxng.local" and assert exit code 0, expected docker patch includes removal of
the hostname (check log contains "patch" and the JSON patch from log shows the
hostname removed from spec.podTemplate.spec.hostAliases), and another test that
calls "alpha hosts-add searxng.local 192.168.1.105" twice (or calls hosts-add
with an existing hostname) and asserts a non-zero exit code and an error message
about duplicate hostnames (and verify no docker patch applied); also add a short
test asserting that using "--dry-run" with hosts-add/hosts-remove returns
success and prints the intended patch without invoking docker side effects
(inspect docker log or absence thereof) — use runWithEnv, the CLI commands
"hosts-remove" and "hosts-add", the "--dry-run" flag, and the existing log
inspection pattern to locate JSON patches in the docker log.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 98b468fa-caf4-4daa-861e-3e526ddbe997
📒 Files selected for processing (3)
docs/reference/commands.mdsrc/nemoclaw.tstest/cli.test.ts
d7cb1cc to
ddb6de5
Compare
|
Hardened the host-alias commands with kubectl timeouts/errors, lowercase hostname normalization, remove/duplicate/dry-run coverage, and the docs formatting cleanup. Reran npm run build:cli, npm run typecheck:cli, and npm test -- test/cli.test.ts. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/nemoclaw.ts`:
- Around line 2234-2272: The current read-modify-write in
getSandboxResource/getHostAliases/patchHostAliases can clobber concurrent
updates because patchHostAliases sends the full hostAliases array; change to an
optimistic-concurrency retry or targeted patch: when fetching the sandbox via
getSandboxResource also capture metadata.resourceVersion, then when updating
call kubectl patch with either a JSON Patch that tests the captured
resourceVersion first (or a strategic merge patch that modifies only the
specific hostAliases entry) and retry the read-modify-patch cycle on conflict
(loop a few times) until success; update references in getSandboxResource,
getHostAliases and patchHostAliases (and the other occurrences noted) to use
resource.metadata.resourceVersion and a test+replace JSON Patch or per-entry
merge so you don't overwrite unrelated concurrent changes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: c9b6d999-e5a3-428b-a5f0-5d167e851761
📒 Files selected for processing (3)
docs/reference/commands.mdsrc/nemoclaw.tstest/cli.test.ts
✅ Files skipped from review due to trivial changes (1)
- docs/reference/commands.md
🚧 Files skipped from review as they are similar to previous changes (1)
- test/cli.test.ts
ddb6de5 to
1afd7ca
Compare
|
Added resource-version guarded JSON patching for host aliases, made dry-run print the same patch shape, and covered the retry path. Reran npm run build:cli, npm run typecheck:cli, and npm test -- test/cli.test.ts. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/nemoclaw.ts (1)
2215-2252: Consolidate duplicated kubectl execution wrappers.
runKubectlInClusterandrunKubectlInClusterForPatchduplicate command construction/options. This is easy to drift over time (timeouts, stdio, namespace, container).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/nemoclaw.ts` around lines 2215 - 2252, The two functions runKubectlInCluster and runKubectlInClusterForPatch duplicate execFileSync invocation and options; extract a single helper (e.g., runKubectlInClusterBase or runKubectlInClusterWithBehavior) that constructs and calls execFileSync using K3S_CONTAINER, namespace "openshell", and HOST_ALIAS_KUBECTL_TIMEOUT_MS with the shared stdio/encoding/timeout object, and accept a behavior flag or callback to control error handling; then make runKubectlInCluster call the new helper and on catch call throwKubectlError(action, error), and make runKubectlInClusterForPatch call the new helper but rethrow the error (or return the raw exec result) so existing semantics remain unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/nemoclaw.ts`:
- Around line 2359-2363: The hosts-add/hosts-remove flag parsing currently
strips all args starting with "-" which silently ignores typos; update the
parsing around args/dryRun/values (the block that sets const dryRun =
args.includes("--dry-run"); const values = args.filter((arg) =>
!arg.startsWith("-")); const [rawHostname, ip] = values) to explicitly validate
flags: compute unknownFlags = args.filter(a => a.startsWith("-") && a !==
"--dry-run"), and if unknownFlags.length > 0 print a clear usage/error message
and exit non‑zero; keep the existing dryRun behavior and the same hostname/ip
validation, and apply the identical validation change to the analogous
hosts-remove block around lines 2403-2407 so unknown flags are rejected for both
commands.
---
Nitpick comments:
In `@src/nemoclaw.ts`:
- Around line 2215-2252: The two functions runKubectlInCluster and
runKubectlInClusterForPatch duplicate execFileSync invocation and options;
extract a single helper (e.g., runKubectlInClusterBase or
runKubectlInClusterWithBehavior) that constructs and calls execFileSync using
K3S_CONTAINER, namespace "openshell", and HOST_ALIAS_KUBECTL_TIMEOUT_MS with the
shared stdio/encoding/timeout object, and accept a behavior flag or callback to
control error handling; then make runKubectlInCluster call the new helper and on
catch call throwKubectlError(action, error), and make
runKubectlInClusterForPatch call the new helper but rethrow the error (or return
the raw exec result) so existing semantics remain unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 97a9f190-d76d-425b-8e17-e0ad537f387b
📒 Files selected for processing (3)
docs/reference/commands.mdsrc/nemoclaw.tstest/cli.test.ts
✅ Files skipped from review due to trivial changes (1)
- docs/reference/commands.md
|
✨ Thanks for submitting this pull request that proposes a way to add sandbox host alias commands to the NemoClaw CLI. This identifies an enhancement to the CLI and proposes a change to manage host aliases, which will improve the usability of the sandbox resource. Related open issues: |
1afd7ca to
78fc006
Compare
|
Rebased on latest main and resolved the command-registry conflict. Host alias help now flows through the registry, kubectl execution uses one wrapper, and unknown host flags fail fast. npm run build:cli and npm test -- test/cli.test.ts pass. |
78fc006 to
1525887
Compare
|
Rebased on latest main and resolved the CLI/test conflicts. npm run build:cli and npm test -- test/cli.test.ts pass. |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
src/nemoclaw.ts (1)
2946-2948: Optional: inline the patch raw-call helper to reduce indirection.
runKubectlInClusterForPatch()currently forwards directly torunKubectlInClusterRaw()without adding behavior. You can inline this call inpatchHostAliases()for simpler flow.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/nemoclaw.ts` around lines 2946 - 2948, runKubectlInClusterForPatch is a trivial forwarder to runKubectlInClusterRaw and should be inlined to reduce indirection: replace calls to runKubectlInClusterForPatch(...) inside patchHostAliases with direct calls to runKubectlInClusterRaw(...), then remove the runKubectlInClusterForPatch function declaration; ensure any callers (patchHostAliases) keep the same argument shape and behavior when switching to runKubectlInClusterRaw.test/cli.test.ts (1)
803-905: Reduce duplicated registry fixture setup in host-alias tests
adds host aliases...andlists host aliases...still inline the same sandbox-registry JSON setup instead of reusingwriteHostAliasRegistry, which increases maintenance cost.♻️ Suggested simplification
- const registryDir = path.join(home, ".nemoclaw"); fs.mkdirSync(localBin, { recursive: true }); - fs.mkdirSync(registryDir, { recursive: true }); - fs.writeFileSync( - path.join(registryDir, "sandboxes.json"), - JSON.stringify({ - sandboxes: { - alpha: { - name: "alpha", - model: "test-model", - provider: "nvidia-prod", - gpuEnabled: false, - policies: [], - }, - }, - defaultSandbox: "alpha", - }), - { mode: 0o600 }, - ); + writeHostAliasRegistry(home);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/cli.test.ts` around lines 803 - 905, Both tests duplicate the same sandbox registry JSON creation; replace the inline fs.writeFileSync registry setup in the "adds host aliases..." and "lists host aliases..." tests with a call to the helper writeHostAliasRegistry to reuse the fixture. Concretely, remove the block that writes sandboxes.json (the fs.writeFileSync targeting path.join(registryDir, "sandboxes.json")) and instead call writeHostAliasRegistry(home, "alpha") (or the existing writeHostAliasRegistry signature used elsewhere in the test suite) before running runWithEnv, keeping the rest of each test (local docker stub, runWithEnv, and assertions) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/nemoclaw.ts`:
- Around line 2946-2948: runKubectlInClusterForPatch is a trivial forwarder to
runKubectlInClusterRaw and should be inlined to reduce indirection: replace
calls to runKubectlInClusterForPatch(...) inside patchHostAliases with direct
calls to runKubectlInClusterRaw(...), then remove the
runKubectlInClusterForPatch function declaration; ensure any callers
(patchHostAliases) keep the same argument shape and behavior when switching to
runKubectlInClusterRaw.
In `@test/cli.test.ts`:
- Around line 803-905: Both tests duplicate the same sandbox registry JSON
creation; replace the inline fs.writeFileSync registry setup in the "adds host
aliases..." and "lists host aliases..." tests with a call to the helper
writeHostAliasRegistry to reuse the fixture. Concretely, remove the block that
writes sandboxes.json (the fs.writeFileSync targeting path.join(registryDir,
"sandboxes.json")) and instead call writeHostAliasRegistry(home, "alpha") (or
the existing writeHostAliasRegistry signature used elsewhere in the test suite)
before running runWithEnv, keeping the rest of each test (local docker stub,
runWithEnv, and assertions) unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 8b1474f7-7ac6-4ccd-84fd-97ba49ce5e90
📒 Files selected for processing (4)
docs/reference/commands.mdsrc/lib/command-registry.tssrc/nemoclaw.tstest/cli.test.ts
✅ Files skipped from review due to trivial changes (1)
- docs/reference/commands.md
🚧 Files skipped from review as they are similar to previous changes (1)
- src/lib/command-registry.ts
1525887 to
d4ed1d3
Compare
|
Cleaned up the last CodeRabbit nits on the host-alias branch. npm run build:cli and npm test -- test/cli.test.ts pass. |
d4ed1d3 to
7d3c6f2
Compare
|
Rebased on latest main and tightened the host-alias retry test for strict typecheck. npm run build:cli, npm run typecheck:cli, and npm test -- test/cli.test.ts pass. |
Fixes NVIDIA#2040 Signed-off-by: Deepak Jain <deepujain@gmail.com>
7d3c6f2 to
87e4fc9
Compare
|
Rebased onto current main and resolved the TS/CLI conflicts in the host-alias branch. Build, CLI typecheck, and the focused CLI test file pass locally. |
## Summary Refreshes the release-prep docs for v0.0.39 based on changes merged since the Friday 4pm doc refresh. Updates the source docs, bumps the docs version metadata, and regenerates the NemoClaw user skills from the refreshed docs. ## Changes - #3314 -> `docs/get-started/prerequisites.md`, `docs/get-started/quickstart.md`, `docs/reference/troubleshooting.md`: Documents installer Docker setup, Docker group activation, and retry guidance. - #3317 -> `docs/get-started/quickstart.md`, `docs/reference/commands.md`: Documents the DGX Spark and DGX Station express install prompt and `NEMOCLAW_NO_EXPRESS`. - #3328 and #3329 -> `docs/security/best-practices.md`, `docs/deployment/sandbox-hardening.md`: Updates sandbox capability hardening docs for the stricter bounding-set and `setpriv` step-down behavior. - #3330, #3335, and #3346 -> `docs/inference/use-local-inference.md`: Documents Windows-host Ollama relaunch behavior, NIM key passthrough, early health-fail diagnostics, and mixed-GPU preflight detail. - #2406, #2883, #3001, #3244, #3267, #3318, #3320, and #3354 -> `docs/about/release-notes.md`: Adds the v0.0.39 release-prep section while keeping the v0.0.38 release notes intact. - Advances the release-prep docs metadata from v0.0.38 to v0.0.39. - Regenerates `.agents/skills/nemoclaw-user-*` from the updated source docs. ## Type of Change - [ ] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [x] Doc only (includes code sample changes) ## Verification - [x] `npx prek run --all-files` passes - [ ] `npm test` passes - [ ] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [x] Docs updated for user-facing behavior changes - [x] `make docs` builds without warnings (doc changes only) - [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) --- Signed-off-by: Miyoung Choi <miyoungc@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes v0.0.39 * **New Features** * Host alias management commands for easier configuration * Sandbox GPU control options during onboarding * Update command with check and confirmation modes * **Documentation** * Enhanced Linux installer guidance with Docker and group membership handling * Expanded troubleshooting for permission and connectivity issues * Improved capability-dropping security documentation * Updated inference model switching commands * Brev environment-specific troubleshooting * **Improvements** * DGX Spark/Station express install flow * Windows Ollama relay and health-check enhancements * NVIDIA NIM preflight GPU reporting [](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/3375) <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Summary
Operators currently have to drop to
docker execandkubectl patchto manage sandboxhostAliases. This adds a supported NemoClaw CLI surface for that workflow.The new commands let users add, list, and remove host aliases on the Sandbox resource, with validation for hostnames, IP addresses, duplicates, and dry-run output.
Changes
src/nemoclaw.ts: addhosts-add,hosts-list, andhosts-removesandbox actions backed bykubectl patchthrough the NemoClaw k3s container.test/cli.test.ts: cover host alias patch construction and list output.docs/reference/commands.md: document the new commands and--dry-runbehavior.Testing
npm run build:clipassednpm run typecheck:clipassednpm test -- test/cli.test.tspassednpm testwas also attempted. The full suite is not green on current main in this environment; failures are in existing installer/onboard/legacy-guard tests outside this CLI addition.Evidence it works
The CLI test stubs the same
docker exec openshell-cluster-nemoclaw kubectl ...path used by existing host-side sandbox helpers, then verifies thathosts-addwrites the expected merge patch tospec.podTemplate.spec.hostAliasesandhosts-listprints configured aliases.Fixes #2040
Signed-off-by: Deepak Jain deepujain@gmail.com
Summary by CodeRabbit
New Features
Documentation
Tests