fix(runtime): repair stale inference DNS routes#3267
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR prefers kube-dns service IP for DNS upstream with endpoint fallback, adds inference.local health probing and conditional DNS-proxy repair in sandbox connect, and makes onboarding policy preset selection agent-aware with exclusions and tests. ChangesDNS Upstream Discovery and Inference Route Repair
Agent-Aware Policy Preset Filtering
Uninstall Command Branding Fix
Documentation: Inference Route Repair Flag
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI
participant Sandbox
participant Openshell
participant Docker
participant DNSProxy
User->>CLI: run sandbox connect
CLI->>Sandbox: probe gateway/status
CLI->>Openshell: sandbox exec -> GET inference.local/v1/models
Openshell-->>CLI: probe response (200/503)
alt probe unhealthy
CLI->>Docker: kubectl get service kube-dns
Docker-->>CLI: service IP / empty
alt service empty
CLI->>Docker: kubectl get endpoints kube-dns
Docker-->>CLI: endpoint IP
end
CLI->>DNSProxy: runSetupDnsProxy -> repair
DNSProxy-->>CLI: repaired
end
CLI->>User: connect result (non-fatal on repair failures)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/onboard.ts`:
- Around line 8603-8605: The code computes policyPresetAgentName, allPresets
(via filterPolicyPresetsForAgent(policies.listPresets(), policyPresetAgentName))
and applied (policies.getAppliedPresets(sandboxName)) but later logic can
reintroduce excluded presets into chosen via previously applied or
resume-selected state; update the selection flow so that any source of presets
(applied, resume-selected, or other previously persisted lists) is intersected
with allPresets before building chosen — specifically clamp
policies.getAppliedPresets(sandboxName) and any resume/restore logic to only
include items present in the allPresets set (use resolvePolicyPresetAgentName,
filterPolicyPresetsForAgent, policies.listPresets(), and the chosen assignment
points as anchors).
In `@test/onboard.test.ts`:
- Around line 55-58: The type-guard isOnboardTestInternals currently doesn't
verify the newly added member filterPolicyPresetsForAgent, so objects can be
narrowed while that property is undefined; update isOnboardTestInternals to
check that the candidate has a truthy typeof === "function" for
filterPolicyPresetsForAgent (in addition to existing checks) so that
OnboardTestInternals-typed values truly provide filterPolicyPresetsForAgent
before any code calls it.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: a748ba7a-d217-4570-bb36-29163776932a
📒 Files selected for processing (10)
src/commands/uninstall.tssrc/lib/actions/dns/index.test.tssrc/lib/actions/dns/index.tssrc/lib/actions/sandbox/connect.tssrc/lib/onboard.tstest/dns-proxy.test.tstest/nemohermes-alias.test.tstest/onboard.test.tstest/policy-tiers-onboard.test.tstest/sandbox-connect-inference.test.ts
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/onboard.ts`:
- Around line 8588-8607: clampPolicyPresetNames currently filters out presets
solely by name using isPolicyPresetExcludedForAgent, which wrongly drops sandbox
custom presets that share excluded names; update clampPolicyPresetNames to
accept the sandboxName (or otherwise detect custom origin) and build a set of
custom preset names (via policies.listCustomPresets(sandboxName)) so that if a
preset name is a sandbox custom preset it is preserved even if
isPolicyPresetExcludedForAgent(name, agentName) would exclude it; apply the same
change to the other occurrences noted (around the blocks at 8638-8645 and
10319-10326) and ensure callers are updated to pass sandboxName so
syncPresetSelection and any resume/non-interactive code keep user-selected
custom presets.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 40fd1d92-d6b4-44a5-9287-72de0ae91754
📒 Files selected for processing (1)
src/lib/onboard.ts
PR Review NotesTriaged this PR in a fresh worktree. Overall the DNS repair work is solid, but there are a few blockers to resolve before merge — chiefly a supersession question against your own #3223, a rebase against current 🔴 Blockers (must fix before merge)
🟡 Warnings (should fix)
🔵 Suggestions
✅ What's Good
Test depth recommendation: 🔴 E2E requiredThe DNS repair path runs a real-world sequence — Concrete suggested E2E scenario to add before merge:
Trigger via |
🤖 E2E Advisor RecommendationRan the e2e-advisor on this PR from my fork (workflow not yet available upstream). Full run: jyaunches/NemoClaw#25580284587 Recommended E2E jobs to run before mergeHigh-priority (required):
Medium-priority:
Suggested dispatchCoverage gaps flagged (worth adding)
Optional (lower priority)
Generated by the E2E advisor prototype (deterministic + Pi semantic analysis). Static diff analysis only — no PR code executed. |
Preserve custom policy presets when clamping agent exclusions.
Move setup policy preset filtering into the policy domain and drive Brave omission from the web-search support signal instead of an agent-name exclusion table. Preserve sandbox custom presets whose names collide with unsupported built-ins across resume and non-interactive flows. Document the intentional sandbox shell probe shape and the global nemohermes uninstall routing.
|
@jyaunches I went through both of your comments and addressed them point by point. PR Review NotesBlockers
Warnings
Suggestions
E2E Advisor RecommendationI dispatched the exact recommended targeted nightly run on this PR branch:
Run: https://github.com/NVIDIA/NemoClaw/actions/runs/25592421054 Local validation after the reviewer-response patch
|
Selective E2E Results — ✅ All requested jobs passedRun: 25592421054
|
## Summary Refreshes the release-prep docs for v0.0.39 based on changes merged since the Friday 4pm doc refresh. Updates the source docs, bumps the docs version metadata, and regenerates the NemoClaw user skills from the refreshed docs. ## Changes - #3314 -> `docs/get-started/prerequisites.md`, `docs/get-started/quickstart.md`, `docs/reference/troubleshooting.md`: Documents installer Docker setup, Docker group activation, and retry guidance. - #3317 -> `docs/get-started/quickstart.md`, `docs/reference/commands.md`: Documents the DGX Spark and DGX Station express install prompt and `NEMOCLAW_NO_EXPRESS`. - #3328 and #3329 -> `docs/security/best-practices.md`, `docs/deployment/sandbox-hardening.md`: Updates sandbox capability hardening docs for the stricter bounding-set and `setpriv` step-down behavior. - #3330, #3335, and #3346 -> `docs/inference/use-local-inference.md`: Documents Windows-host Ollama relaunch behavior, NIM key passthrough, early health-fail diagnostics, and mixed-GPU preflight detail. - #2406, #2883, #3001, #3244, #3267, #3318, #3320, and #3354 -> `docs/about/release-notes.md`: Adds the v0.0.39 release-prep section while keeping the v0.0.38 release notes intact. - Advances the release-prep docs metadata from v0.0.38 to v0.0.39. - Regenerates `.agents/skills/nemoclaw-user-*` from the updated source docs. ## Type of Change - [ ] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [x] Doc only (includes code sample changes) ## Verification - [x] `npx prek run --all-files` passes - [ ] `npm test` passes - [ ] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [x] Docs updated for user-facing behavior changes - [x] `make docs` builds without warnings (doc changes only) - [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) --- Signed-off-by: Miyoung Choi <miyoungc@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes v0.0.39 * **New Features** * Host alias management commands for easier configuration * Sandbox GPU control options during onboarding * Update command with check and confirmation modes * **Documentation** * Enhanced Linux installer guidance with Docker and group membership handling * Expanded troubleshooting for permission and connectivity issues * Improved capability-dropping security documentation * Updated inference model switching commands * Brev environment-specific troubleshooting * **Improvements** * DGX Spark/Station express install flow * Windows Ollama relay and health-check enhancements * NVIDIA NIM preflight GPU reporting [](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/3375) <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Summary
inference.localDNS proxy routes during connect/probe when the sandbox has a persisted inference provider/modelnemohermes uninstallas a global uninstall commandNEMOCLAW_DISABLE_INFERENCE_ROUTE_REPAIRas the troubleshooting escape hatch for automatic DNS-proxy repairValidation
npm run build:clinpm run typecheck:clinpx vitest run test/policy-tiers-onboard.test.ts test/onboard-preset-diff.test.tsnpx vitest run test/onboard.test.ts -t "computeSetupPresetSuggestions|agentSupportsWebSearch|configureWebSearch|prints numbered step headers"npx vitest run test/sandbox-connect-inference.test.ts src/lib/actions/dns/index.test.ts test/nemohermes-alias.test.ts test/uninstall.test.ts src/lib/actions/sandbox/oclif-command-adapters.test.ts src/lib/commands/simple-global-oclif-adapters.test.tsgit diff --check