fix(runtime): repair stale inference DNS routes by ericksoa · Pull Request #3267 · NVIDIA/NemoClaw

ericksoa · 2026-05-08T14:07:09Z

Summary

repair stale sandbox inference.local DNS proxy routes during connect/probe when the sandbox has a persisted inference provider/model
prefer the stable kube-dns service IP for the sandbox DNS proxy, with CoreDNS endpoint fallback
route nemohermes uninstall as a global uninstall command
omit the Brave policy preset when the selected sandbox image does not support NemoClaw's Brave web-search path, while preserving OpenClaw behavior and sandbox custom presets with colliding names
document NEMOCLAW_DISABLE_INFERENCE_ROUTE_REPAIR as the troubleshooting escape hatch for automatic DNS-proxy repair

Validation

npm run build:cli
npm run typecheck:cli
npx vitest run test/policy-tiers-onboard.test.ts test/onboard-preset-diff.test.ts
npx vitest run test/onboard.test.ts -t "computeSetupPresetSuggestions|agentSupportsWebSearch|configureWebSearch|prints numbered step headers"
npx vitest run test/sandbox-connect-inference.test.ts src/lib/actions/dns/index.test.ts test/nemohermes-alias.test.ts test/uninstall.test.ts src/lib/actions/sandbox/oclif-command-adapters.test.ts src/lib/commands/simple-global-oclif-adapters.test.ts
git diff --check

coderabbitai · 2026-05-08T14:07:28Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Walkthrough

This PR prefers kube-dns service IP for DNS upstream with endpoint fallback, adds inference.local health probing and conditional DNS-proxy repair in sandbox connect, and makes onboarding policy preset selection agent-aware with exclusions and tests.

Changes

DNS Upstream Discovery and Inference Route Repair

| Layer / File(s) | Summary |
| --- | --- |
| DNS Upstream Discovery Logic `src/lib/actions/dns/index.ts` | `runSetupDnsProxy` now queries kube-dns service `clusterIP` first and falls back to the first endpoint IP if the service lookup is empty/`none`. |
| DNS Upstream Unit Tests `src/lib/actions/dns/index.test.ts` | Tests updated to validate service-IP preference, endpoint fallback when service is empty, and unsafe upstream rejection sourced from the service lookup. |
| DNS Shell Integration Tests `test/dns-proxy.test.ts` | E2E shell test now asserts `kubectl get service kube-dns` is called and `kubectl get endpoints kube-dns` is not called. |
| Inference Route Constants `src/lib/actions/sandbox/connect.ts` | Adds `NEMOCLAW_GATEWAY_NAME` and `INFERENCE_ROUTE_PROBE_TIMEOUT_MS` constants. |
| Inference Route Verification and Repair Helpers `src/lib/actions/sandbox/connect.ts` | New helpers probe `inference.local/v1/models`, optionally run DNS-proxy repair (env-disableable), reconcile live vs persisted inference config, and attempt `inference set --no-verify`; failures are non-fatal. |
| Inference Route Connect Integration `src/lib/actions/sandbox/connect.ts` | Sandbox connect probe now calls `ensureSandboxInferenceRoute(sandboxName)` in both success and failure paths and replaces prior inline swapping logic. |
| Inference Route Integration Tests `test/sandbox-connect-inference.test.ts` | Test harness extended with fake openshell/docker stubs and state tracking; new test verifies DNS-proxy repair triggers when inference-local returns 503 during connect. |
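
The upstream preference summarized in the table can be sketched as a small pure function. This is only an illustration under stated assumptions, not the code in `src/lib/actions/dns/index.ts`: the name `pickDnsUpstream`, the `UpstreamChoice` type, and the choice to treat "anything that is not a literal IP address" as unsafe are all hypothetical.

```typescript
import { isIP } from "node:net";

/** Hypothetical result type: where the chosen upstream came from. */
type UpstreamChoice =
  | { source: "service" | "endpoint"; ip: string }
  | { source: "none"; ip: null };

/** Treat empty output and the headless-service marker "None" as "no service IP". */
function normalizeClusterIp(raw: string): string | null {
  const value = raw.trim();
  if (value === "" || value.toLowerCase() === "none") return null;
  return value;
}

/**
 * Prefer the Service clusterIP and fall back to the first endpoint IP.
 * Assumption: a value is "safe" only if it parses as a literal IPv4/IPv6 address.
 */
export function pickDnsUpstream(
  serviceClusterIp: string,
  endpointIps: readonly string[],
): UpstreamChoice {
  const serviceIp = normalizeClusterIp(serviceClusterIp);
  if (serviceIp !== null) {
    // A non-empty but malformed service value is rejected outright,
    // not silently replaced by the endpoint fallback.
    if (isIP(serviceIp) !== 0) return { source: "service", ip: serviceIp };
    return { source: "none", ip: null };
  }
  const firstEndpoint = endpointIps.map((ip) => ip.trim()).find((ip) => ip !== "");
  if (firstEndpoint !== undefined && isIP(firstEndpoint) !== 0) {
    return { source: "endpoint", ip: firstEndpoint };
  }
  return { source: "none", ip: null };
}
```

Keeping the selection separate from the `kubectl` calls makes the three cases (service present, service empty, unsafe value) testable without a cluster.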

Agent-Aware Policy Preset Filtering

| Layer / File(s) | Summary |
| --- | --- |
| Preset Filtering Logic and Exclusion Map `src/lib/onboard.ts` | Introduces `AGENT_POLICY_PRESET_EXCLUSIONS`, `filterPolicyPresetsForAgent`, and `resolvePolicyPresetAgentName` to normalize agent name and filter presets. |
| Preset Suggestion Refinement `src/lib/onboard.ts` | `computeSetupPresetSuggestions` accepts optional `knownPresetNames` allowlist and filters tier defaults accordingly. |
| Setup Flow Agent Integration `src/lib/onboard.ts` | `setupPoliciesWithSelection` gains `agentName` option and uses an agent-filtered preset universe; onboarding resume passes `agentName` into setup. |
| Agent Filter Unit Tests `test/onboard.test.ts` | Tests add `filterPolicyPresetsForAgent` helper and verify knownPresetNames and agent-specific exclusions (e.g., Hermes excludes `brave`). |
| Agent Filter Integration Tests `test/policy-tiers-onboard.test.ts` | Integration tests validate Hermes excludes `brave`, removes it if present, and clamps resume selections to allowed presets; also test preserving custom `brave`. |
| Public API Export `src/lib/onboard.ts` | `filterPolicyPresetsForAgent` exported from module. |

Uninstall Command Branding Fix

| Layer / File(s) | Summary |
| --- | --- |
| Command Usage Metadata `src/commands/uninstall.ts` | Uninstall command `usage` string hardcoded to `"nemoclaw uninstall"` (removed `CLI_NAME` usage). |
| Uninstall Help Text Routing Test `test/nemohermes-alias.test.ts` | New test ensures `nemohermes uninstall --help` routes to Hermes uninstaller and shows expected help text. |

Documentation: Inference Route Repair Flag

| Layer / File(s) | Summary |
| --- | --- |
| Commands Reference Docs `.agents/skills/nemoclaw-user-reference/references/commands.md`, `docs/reference/commands.md` | Documents `NEMOCLAW_DISABLE_INFERENCE_ROUTE_REPAIR` to skip automatic DNS-proxy repair for `inference.local` during connect flows. |

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant CLI
  participant Sandbox
  participant Openshell
  participant Docker
  participant DNSProxy
  User->>CLI: run sandbox connect
  CLI->>Sandbox: probe gateway/status
  CLI->>Openshell: sandbox exec -> GET inference.local/v1/models
  Openshell-->>CLI: probe response (200/503)
  alt probe unhealthy
    CLI->>Docker: kubectl get service kube-dns
    Docker-->>CLI: service IP / empty
    alt service empty
      CLI->>Docker: kubectl get endpoints kube-dns
      Docker-->>CLI: endpoint IP
    end
    CLI->>DNSProxy: runSetupDnsProxy -> repair
    DNSProxy-->>CLI: repaired
  end
  CLI->>User: connect result (non-fatal on repair failures)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 Soft paws tap on config keys,

DNS first, then endpoints please,
When inference sighs and answers 503,
I stitch its route and let it be,
Presets hop home — brave skips Hermes' tree.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 5.56% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: repairing stale inference DNS routes during sandbox connect/probe operations, which is the primary technical objective across multiple file changes. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 8603-8605: The code computes policyPresetAgentName, allPresets
(via filterPolicyPresetsForAgent(policies.listPresets(), policyPresetAgentName))
and applied (policies.getAppliedPresets(sandboxName)) but later logic can
reintroduce excluded presets into chosen via previously applied or
resume-selected state; update the selection flow so that any source of presets
(applied, resume-selected, or other previously persisted lists) is intersected
with allPresets before building chosen — specifically clamp
policies.getAppliedPresets(sandboxName) and any resume/restore logic to only
include items present in the allPresets set (use resolvePolicyPresetAgentName,
filterPolicyPresetsForAgent, policies.listPresets(), and the chosen assignment
points as anchors).

In `@test/onboard.test.ts`:
- Around line 55-58: The type-guard isOnboardTestInternals currently doesn't
verify the newly added member filterPolicyPresetsForAgent, so objects can be
narrowed while that property is undefined; update isOnboardTestInternals to
check that the candidate has a truthy typeof === "function" for
filterPolicyPresetsForAgent (in addition to existing checks) so that
OnboardTestInternals-typed values truly provide filterPolicyPresetsForAgent
before any code calls it.
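
The type-guard finding above can be illustrated with a minimal sketch. The member list here is hypothetical (only `filterPolicyPresetsForAgent` and `computeSetupPresetSuggestions` are named in this thread, and the real `OnboardTestInternals` type has more members); the point is that every member the type promises is also checked at runtime.

```typescript
/** Hypothetical, trimmed-down shape of the internals object used by the tests. */
interface OnboardTestInternals {
  computeSetupPresetSuggestions: (...args: unknown[]) => unknown;
  filterPolicyPresetsForAgent: (...args: unknown[]) => unknown;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

/**
 * Narrowing guard: returns true only when every member declared on the
 * interface is actually a function on the candidate object.
 */
export function isOnboardTestInternals(value: unknown): value is OnboardTestInternals {
  return (
    isRecord(value) &&
    typeof value.computeSetupPresetSuggestions === "function" &&
    typeof value.filterPolicyPresetsForAgent === "function"
  );
}
```

With the extra `typeof` check, a module that stops exporting the helper fails the guard up front instead of crashing later at the call site.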

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a748ba7a-d217-4570-bb36-29163776932a

📥 Commits

Reviewing files that changed from the base of the PR and between b1320d5 and 9610b04.

📒 Files selected for processing (10)

src/commands/uninstall.ts
src/lib/actions/dns/index.test.ts
src/lib/actions/dns/index.ts
src/lib/actions/sandbox/connect.ts
src/lib/onboard.ts
test/dns-proxy.test.ts
test/nemohermes-alias.test.ts
test/onboard.test.ts
test/policy-tiers-onboard.test.ts
test/sandbox-connect-inference.test.ts

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 8588-8607: clampPolicyPresetNames currently filters out presets
solely by name using isPolicyPresetExcludedForAgent, which wrongly drops sandbox
custom presets that share excluded names; update clampPolicyPresetNames to
accept the sandboxName (or otherwise detect custom origin) and build a set of
custom preset names (via policies.listCustomPresets(sandboxName)) so that if a
preset name is a sandbox custom preset it is preserved even if
isPolicyPresetExcludedForAgent(name, agentName) would exclude it; apply the same
change to the other occurrences noted (around the blocks at 8638-8645 and
10319-10326) and ensure callers are updated to pass sandboxName so
syncPresetSelection and any resume/non-interactive code keep user-selected
custom presets.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 40fd1d92-d6b4-44a5-9287-72de0ae91754

📥 Commits

Reviewing files that changed from the base of the PR and between 5c91b8d and 3df09bd.

📒 Files selected for processing (1)

src/lib/onboard.ts

jyaunches · 2026-05-08T21:32:49Z

PR Review Notes

Triaged this PR in a fresh worktree. Overall the DNS repair work is solid, but there are a few blockers to resolve before merge — chiefly a supersession question against your own #3223, a rebase against current main, and one custom-preset correctness bug.

🔴 Blockers (must fix before merge)

Merge conflicts with main. mergeStateStatus: DIRTY, branch is 4 commits behind origin/main. Rebase is needed, and in particular please verify that hasChatCompletionsToolCall / requireChatCompletionsToolCalling: true on the Ollama validation path (added by #2737, "fix: complete #2731, #2697, and #2727 reliability fixes") survives the rebase; the two-dot diff currently shows them being "removed" as an artifact of the stale base.
Supersession vs. #3223 ("fix: skip Brave policy preset for unsupported agents"). #3223 targets the same Hermes-excludes-Brave problem with a different, more extensible, capability-based approach (webSearchSupported: boolean threaded through computeSetupPresetSuggestions / setupPoliciesWithSelection). This PR uses a hardcoded AGENT_POLICY_PRESET_EXCLUSIONS: { hermes: ["brave"] } table. Both cannot land. My recommendation: take #3223's capability-based API and drop the exclusion table from this PR, so adding a new agent that doesn't support Brave is "set agentSupportsWebSearch to false" rather than "edit a CLI-side agent-name table."
clampPolicyPresetNames silently drops custom presets whose name collides with an exclusion. listPolicyPresetsForAgent() intentionally keeps policies.listCustomPresets(sandboxName), but clampPolicyPresetNames() filters purely by name — so a user's custom brave preset in a Hermes sandbox is interactively selectable, then wiped by resume / non-interactive paths. clampPolicyPresetNames needs to be source-aware (accept sandboxName and skip exclusion for names that are registered custom presets). CodeRabbit flagged the same issue.

🟡 Warnings (should fix)

src/lib/onboard.ts monolith growth: +69 net lines (10,488 → 10,557). This is over the +20-line push-back threshold we're holding refactoring PRs to. The new helpers (AGENT_POLICY_PRESET_EXCLUSIONS, isPolicyPresetExcludedForAgent, filterPolicyPresetsForAgent, resolvePolicyPresetAgentName, listPolicyPresetsForAgent, clampPolicyPresetNames) are a coherent policy-domain concept and belong in src/lib/policies.ts or a new src/lib/policies/agent-exclusions.ts with co-located tests. They're extraction-ready. Ref #2306 ("Architectural improvement: Extract rebuild recreate path from onboard monolith and canonicalize credential resolution").
Exclusions can leak back in reuse / resume flows. allPresets is filtered at the top, but chosen can be re-populated from previously-applied state or resume-selected presets before the final syncPresetSelection call, so Hermes may end up re-applying brave across a resume. Either clamp at every mutation site of chosen, or clamp once at the bottom just before the sync call. (Also noted by CodeRabbit.)
test/onboard.test.ts:58 — isOnboardTestInternals runtime guard not updated for the new filterPolicyPresetsForAgent member. The type and destructuring were updated but the typeof value.filterPolicyPresetsForAgent === "function" check is missing from the narrowing guard. If the module ever stops exporting it, the test crashes at use-site instead of failing the guard clearly.
src/lib/actions/sandbox/connect.ts: new sh -c "..." shell-string probe. Not a security regression (all args are literal), but it's a new shell-string callsite for #1889 ("fix(security): migrate remaining shell-string callsites to argv arrays"). Please either factor isSandboxInferenceRouteHealthy into an argv form or leave a comment explaining why the case / output capture makes sh -c unavoidable here.
Scope creep. The PR title is DNS-only but the branch contains three concerns: DNS repair, Hermes/Brave preset clamping, and nemohermes uninstall routing. At minimum the uninstall commit is unrelated and would merge faster as its own PR. The DNS repair commit (9610b04a2) is the cleanest piece here and could land on its own tomorrow if separated from the policy-preset work.
src/commands/uninstall.ts: CLI_NAME replaced with literal "nemoclaw uninstall". If the intent is to force NemoClaw branding even under the nemohermes alias, that should be justified in a comment. As written it reads like a regression against the branding work in #3220 ("fix: brand NemoHermes uninstall goodbye").

🔵 Suggestions

DNS upstream discovery — consider also accepting kube-dns / coredns in namespaces other than kube-system for future flexibility. Not blocking; matches existing endpoints-fallback scope.
Document NEMOCLAW_DISABLE_INFERENCE_ROUTE_REPAIR. Good escape hatch; please mention it in the PR body and a docs file so operators can discover it.
Colocate INFERENCE_ROUTE_PROBE_TIMEOUT_MS = 10_000 with OPENSHELL_PROBE_TIMEOUT_MS in openshell/timeouts.ts. Makes future timeout tuning one-file.

✅ What's Good

The DNS preference (kube-dns service ClusterIP → CoreDNS endpoint IP → 8.8.8.8) is a real, defensible improvement — service IPs survive pod rotation, endpoint IPs don't.
The behavioral test in test/sandbox-connect-inference.test.ts is excellent: it stubs docker + openshell, uses a real execFileSync of the connect path, and asserts on both docker-call sequencing (new path preferred, fallback not taken) and user-visible log strings. Right level of test for this logic short of full E2E.
ensureSandboxInferenceRoute extraction is clean — one function called at all three probe outcomes (running / recovered / failed), with consistent quiet handling. Good internal refactor even without the agent-exclusion work.
The repair → re-probe → succeed-or-warn flow fails open with a clear warning instead of blocking connect. Correct choice for a DNS-recovery fix.
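
The probe, repair, re-probe flow praised above can be sketched as follows. This is a hedged illustration rather than the real `ensureSandboxInferenceRoute`: the function name `ensureInferenceRoute`, the `RouteDeps` shape, the outcome labels, and the assumption that only the value `"1"` disables repair are all hypothetical.

```typescript
type RouteOutcome = "healthy" | "repaired" | "skipped" | "unrepaired";

/** Hypothetical injected dependencies so the flow can run without a sandbox. */
interface RouteDeps {
  probe: () => boolean; // true when inference.local answers successfully
  repair: () => void; // re-runs DNS proxy setup; may throw
  warn: (message: string) => void;
  env: Record<string, string | undefined>; // e.g. process.env
}

/**
 * Fail-open route check: never throws, so a broken DNS proxy produces a
 * warning instead of blocking the connect command.
 */
export function ensureInferenceRoute(deps: RouteDeps): RouteOutcome {
  if (deps.probe()) return "healthy";

  // Assumption: the escape hatch is honored when the variable is set to "1".
  if (deps.env.NEMOCLAW_DISABLE_INFERENCE_ROUTE_REPAIR === "1") {
    deps.warn("inference.local is unavailable; automatic repair is disabled.");
    return "skipped";
  }

  try {
    deps.repair();
  } catch (error) {
    deps.warn(`DNS proxy repair failed: ${String(error)}`);
    return "unrepaired";
  }

  // Re-probe so success is reported only when the route actually recovered.
  if (deps.probe()) return "repaired";
  deps.warn("inference.local is still unavailable after DNS proxy repair.");
  return "unrepaired";
}
```

Returning an outcome instead of throwing keeps every failure mode non-fatal while still letting the caller decide what to log.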

Test depth recommendation: 🔴 E2E required

The DNS repair path runs a real-world sequence — openshell sandbox exec → sh -c → curl → inference.local → DNS proxy repair → re-probe. The stubbed behavioral test proves the CLI makes the right calls in the right order, but it cannot prove the repair actually fixes the broken DNS route on a real cluster — which is the whole point of the change.

Concrete suggested E2E scenario to add before merge:

Create a sandbox, run nemoclaw sandbox connect once to confirm inference.local works.
Artificially break the DNS proxy (e.g., rewrite /etc/resolv.conf inside the sandbox, or restart DNS proxy with a bogus upstream).
Run nemoclaw sandbox connect again.
Assert stdout contains inference.local is unavailable inside '<name>'. Repairing sandbox DNS proxy... followed by inference.local route repaired., and that a subsequent openshell sandbox exec -- curl -sk https://inference.local/v1/models returns HTTP 200.
Negative case: set NEMOCLAW_DISABLE_INFERENCE_ROUTE_REPAIR=1 and verify repair is skipped.

Trigger via nightly-dispatch.yml with a test-filter targeting the new scenario, or run locally on sparky.

jyaunches · 2026-05-08T21:32:57Z

🤖 E2E Advisor Recommendation

Ran the e2e-advisor on this PR from my fork (workflow not yet available upstream). Full run: jyaunches/NemoClaw#25580284587

Recommended E2E jobs to run before merge

High-priority (required):

sandbox-operations-e2e — exercises the new inference.local probe + DNS-proxy auto-repair on connect / --probe-only (this is the core change)
hermes-e2e — covers the new AGENT_POLICY_PRESET_EXCLUSIONS map (hermes drops brave)
cloud-onboard-e2e — policy preset validation + resume reapply path changed

Medium-priority:

inference-routing-e2e — DNS proxy upstream selection + inference route swap both affect routing/credential isolation
cloud-inference-e2e — the kube-dns service → endpoints change alters the forwarder's upstream; could silently break inference.local resolution

Suggested dispatch

Workflow: .github/workflows/nightly-e2e.yaml
jobs: sandbox-operations-e2e,inference-routing-e2e,cloud-inference-e2e,hermes-e2e,cloud-onboard-e2e

Coverage gaps flagged (worth adding)

sandbox-connect DNS-repair path — no existing E2E exercises it. Suggested scenario: start a sandbox, disrupt the in-sandbox DNS proxy (kill /tmp/dns-proxy.py or clobber /etc/resolv.conf), run nemoclaw <name> connect --probe-only, and assert curl https://inference.local/v1/models recovers.
hermes-onboard preset exclusion — AGENT_POLICY_PRESET_EXCLUSIONS has no dedicated test. A future agent added to the map or a regression removing the filter would silently re-allow disallowed presets. Suggested: non-interactive Hermes onboard with NEMOCLAW_POLICY_PRESETS=brave,pypi,npm and assert brave is dropped from the Hermes sandbox but still allowed on an openclaw onboard.

Optional (lower priority)

onboard-resume-e2e, sandbox-survival-e2e, issue-2478-crash-loop-recovery-e2e, hermes-discord-e2e, docs-validation-e2e

Generated by the E2E advisor prototype (deterministic + Pi semantic analysis). Static diff analysis only — no PR code executed.

Preserve custom policy presets when clamping agent exclusions.

Move setup policy preset filtering into the policy domain and drive Brave omission from the web-search support signal instead of an agent-name exclusion table. Preserve sandbox custom presets whose names collide with unsupported built-ins across resume and non-interactive flows. Document the intentional sandbox shell probe shape and the global nemohermes uninstall routing.

ericksoa · 2026-05-09T05:04:53Z

@jyaunches I went through both of your comments and addressed them point by point.

PR Review Notes

Blockers

Merge conflicts with current main
Resolved. I merged current origin/main into the PR branch in 9a3739336. The only conflict surface was the test/onboard.test.ts internals guard, and the resolved guard keeps all three relevant checks: hasChatCompletionsToolCall, hasChatCompletionsToolCallLeak, and filterSetupPolicyPresets. The PR is mergeable against current main after the merge.
Supersession vs. #3223 ("fix: skip Brave policy preset for unsupported agents")
Addressed in a0426e414. I took the #3223 direction and removed the hardcoded AGENT_POLICY_PRESET_EXCLUSIONS / agent-name filtering table from this PR. Brave is now filtered from setup policy presets through the existing agentSupportsWebSearch(...) capability signal, threaded as webSearchSupported into computeSetupPresetSuggestions(...) and setupPoliciesWithSelection(...). This means the decision is capability-based rather than tied to a CLI-side agent-name table.

I also moved the setup preset support helpers into the policy domain in src/lib/policies.ts: setupPolicyPresetSupported, filterSetupPolicyPresets, listSetupPolicyPresets, and clampSetupPolicyPresetNames. That keeps the policy filtering/clamping logic out of the onboard monolith.
Custom preset collision in clampPolicyPresetNames
Addressed. The clamp path is now source-aware: clampSetupPolicyPresetNames(...) receives the sandbox's custom preset names and preserves those names even when the same name would be unsupported as a built-in. I added coverage for both resume and non-interactive flows where a sandbox custom preset named brave must survive while the built-in brave preset is removed when web search is unsupported.
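
A source-aware, capability-based clamp of the kind described in these responses might look like the sketch below. It is not the code in `src/lib/policies.ts`: the name `clampPresetNames`, the `ClampOptions` shape, and the `WEB_SEARCH_PRESETS` set are hypothetical stand-ins for `clampSetupPolicyPresetNames(...)` and its real inputs.

```typescript
/** Hypothetical options: the capability signal plus the sandbox's custom preset names. */
interface ClampOptions {
  webSearchSupported: boolean;
  customPresetNames: Iterable<string>;
}

// Assumption for illustration: built-in presets that require web-search support.
const WEB_SEARCH_PRESETS: ReadonlySet<string> = new Set(["brave"]);

/**
 * Drop built-in presets the agent cannot support, but keep any name that is
 * registered as a sandbox custom preset, even if it collides with a built-in.
 */
export function clampPresetNames(
  names: readonly string[],
  options: ClampOptions,
): string[] {
  const custom = new Set(options.customPresetNames);
  return names.filter((name) => {
    if (custom.has(name)) return true; // custom presets always survive
    if (!options.webSearchSupported && WEB_SEARCH_PRESETS.has(name)) return false;
    return true;
  });
}
```

Running every source of preset names (applied, recorded resume, explicit selection) through one function like this just before the sync step is what keeps excluded presets from leaking back in.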

Warnings

src/lib/onboard.ts growth / policy-domain extraction
Addressed in a0426e414. The new setup preset helpers are now in src/lib/policies.ts, and src/lib/onboard.ts now calls into the policy module instead of carrying an agent-exclusion helper cluster inline.
Exclusions leaking back through reuse / resume
Addressed. Current applied presets, recorded resume presets, and explicit selected resume presets are clamped through the same support-aware helper before syncPresetSelection(...). I added a specific unsupported-only resume test so selectedPresets: ["brave"] with webSearchSupported: false removes Brave instead of falling back to tier defaults.
isOnboardTestInternals guard
Addressed. The guard now checks the exported preset-filter helper (filterSetupPolicyPresets) before tests destructure/use it.
New sh -c inference probe
Addressed. I left it as sh -c because the curl write-out, response body capture, and status classification need to execute inside the sandbox as one bounded probe; I added a comment explaining that and noting that sandboxName remains an argv value, so user input is not interpolated into the shell script.
Scope creep
I did not split the PR at this point. The branch still contains DNS route repair, policy preset capability filtering, and nemohermes uninstall routing, but each path now has focused tests and the policy work has been aligned with #3223's capability-based shape. The uninstall change is intentionally small and directly covered by test/nemohermes-alias.test.ts / test/uninstall.test.ts.
src/commands/uninstall.ts literal usage string
Addressed. I added an inline comment explaining that the usage is intentionally global under the nemohermes alias because nemohermes uninstall is the package uninstaller, not a sandbox-scoped action.
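
The reasoning behind keeping `sh -c` (a literal script, with caller-supplied values passed as separate argv entries) can be demonstrated in isolation. This sketch does not reproduce the real probe or any `openshell` invocation; `runLiteralScript` and the script text are hypothetical, and it assumes a POSIX `sh` is available on the host.

```typescript
import { execFileSync } from "node:child_process";

// Literal script: nothing from the caller is ever spliced into this string.
// "$1" is expanded by sh from argv, so its content is treated as data, not code.
const ECHO_FIRST_ARG_SCRIPT = 'printf "%s" "$1"';

/**
 * Runs a fixed shell script and passes the untrusted value as a positional
 * argument. argv layout: sh -c <script> <$0 placeholder> <$1>.
 */
export function runLiteralScript(untrusted: string): string {
  return execFileSync("sh", ["-c", ECHO_FIRST_ARG_SCRIPT, "sh", untrusted], {
    encoding: "utf8",
  });
}
```

Because the value only ever arrives as `$1`, shell metacharacters such as `;` or `$(...)` in it are echoed back verbatim rather than executed.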

Suggestions

Alternate CoreDNS namespaces
Not changed in this patch. The current PR keeps the existing kube-system/kube-dns contract and only changes the preference order from endpoint IP first to service ClusterIP first with endpoint fallback.
Document NEMOCLAW_DISABLE_INFERENCE_ROUTE_REPAIR
Addressed in 5c25526a and reflected in the PR body. The env var is now documented in docs/reference/commands.md and the generated user-reference skill mirror.
Move inference route timeout constant
Addressed in a0426e414. OPENSHELL_INFERENCE_ROUTE_PROBE_TIMEOUT_MS now lives next to the other OpenShell timeout constants in src/lib/adapters/openshell/timeouts.ts.

E2E Advisor Recommendation

I dispatched the exact recommended targeted nightly run on this PR branch:

nightly-e2e jobs: sandbox-operations-e2e,inference-routing-e2e,cloud-inference-e2e,hermes-e2e,cloud-onboard-e2e

Run: https://github.com/NVIDIA/NemoClaw/actions/runs/25592421054

Local validation after the reviewer-response patch

npm run build:cli
npm run typecheck:cli
npx vitest run test/policy-tiers-onboard.test.ts test/onboard-preset-diff.test.ts
npx vitest run test/onboard.test.ts -t "computeSetupPresetSuggestions|agentSupportsWebSearch|configureWebSearch|prints numbered step headers"
npx vitest run test/sandbox-connect-inference.test.ts src/lib/actions/dns/index.test.ts test/nemohermes-alias.test.ts test/uninstall.test.ts src/lib/actions/sandbox/oclif-command-adapters.test.ts src/lib/commands/simple-global-oclif-adapters.test.ts
git diff --check

github-actions · 2026-05-09T05:31:03Z

Selective E2E Results — ✅ All requested jobs passed

Run: 25592421054
Branch: fix/inference-local-dns-recovery
Requested jobs: sandbox-operations-e2e,inference-routing-e2e,cloud-inference-e2e,hermes-e2e,cloud-onboard-e2e
Summary: 5 passed, 0 failed, 0 skipped

| Job | Result |
| --- | --- |
| cloud-inference-e2e | ✅ success |
| cloud-onboard-e2e | ✅ success |
| hermes-e2e | ✅ success |
| inference-routing-e2e | ✅ success |
| sandbox-operations-e2e | ✅ success |

## Summary

Refreshes the release-prep docs for v0.0.39 based on changes merged since the Friday 4pm doc refresh. Updates the source docs, bumps the docs version metadata, and regenerates the NemoClaw user skills from the refreshed docs.

## Changes

- #3314 -> `docs/get-started/prerequisites.md`, `docs/get-started/quickstart.md`, `docs/reference/troubleshooting.md`: Documents installer Docker setup, Docker group activation, and retry guidance.
- #3317 -> `docs/get-started/quickstart.md`, `docs/reference/commands.md`: Documents the DGX Spark and DGX Station express install prompt and `NEMOCLAW_NO_EXPRESS`.
- #3328 and #3329 -> `docs/security/best-practices.md`, `docs/deployment/sandbox-hardening.md`: Updates sandbox capability hardening docs for the stricter bounding-set and `setpriv` step-down behavior.
- #3330, #3335, and #3346 -> `docs/inference/use-local-inference.md`: Documents Windows-host Ollama relaunch behavior, NIM key passthrough, early health-fail diagnostics, and mixed-GPU preflight detail.
- #2406, #2883, #3001, #3244, #3267, #3318, #3320, and #3354 -> `docs/about/release-notes.md`: Adds the v0.0.39 release-prep section while keeping the v0.0.38 release notes intact.
- Advances the release-prep docs metadata from v0.0.38 to v0.0.39.
- Regenerates `.agents/skills/nemoclaw-user-*` from the updated source docs.

## Type of Change

- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [x] Doc only (includes code sample changes)

## Verification

- [x] `npx prek run --all-files` passes
- [ ] `npm test` passes
- [ ] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [x] `make docs` builds without warnings (doc changes only)
- [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

---

Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

## Summary by CodeRabbit

## Release Notes v0.0.39

* **New Features**
  * Host alias management commands for easier configuration
  * Sandbox GPU control options during onboarding
  * Update command with check and confirmation modes
* **Documentation**
  * Enhanced Linux installer guidance with Docker and group membership handling
  * Expanded troubleshooting for permission and connectivity issues
  * Improved capability-dropping security documentation
  * Updated inference model switching commands
  * Brev environment-specific troubleshooting
* **Improvements**
  * DGX Spark/Station express install flow
  * Windows Ollama relay and health-check enhancements
  * NVIDIA NIM preflight GPU reporting

[![Review Change Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/3375)

fix(runtime): repair stale inference DNS routes

9610b04

coderabbitai Bot reviewed May 8, 2026

Comment thread src/lib/onboard.ts Outdated

Comment thread test/onboard.test.ts Outdated

wscurran added fix labels May 8, 2026

fix(hermes): clamp policy presets during onboarding

5c91b8d

ericksoa added the v0.0.38 label May 8, 2026

fix(onboard): preserve custom policy presets

3df09bd

coderabbitai Bot reviewed May 8, 2026

Comment thread src/lib/onboard.ts Outdated

ericksoa requested a review from jyaunches May 8, 2026 17:43

jyaunches self-assigned this May 8, 2026

ericksoa added 3 commits May 8, 2026 21:26

Merge current main into inference DNS recovery

9a37393

Preserve custom policy presets when clamping agent exclusions.

docs: document inference route repair escape hatch

5c25526

ericksoa merged commit c4aaec3 into main May 9, 2026
64 checks passed

miyoungc mentioned this pull request May 12, 2026

docs: refresh 0.0.39 release prep #3375

Merged

12 tasks

wscurran added bug-fix (PR fixes a bug or regression), feature (PR adds or expands user-visible functionality), and area: performance (Latency, throughput, resource use, benchmarks, or scaling) labels and removed fix and feature (PR adds or expands user-visible functionality) labels Jun 3, 2026

fix(runtime): repair stale inference DNS routes #3267
ericksoa merged 6 commits into main from fix/inference-local-dns-recovery