fix(onboard): offer running vLLM in provider menu by ericksoa · Pull Request #3417 · NVIDIA/NemoClaw

ericksoa · 2026-05-12T17:23:31Z

Summary

Fixes the onboarding UX when vLLM is already running locally. The provider menu now offers detected local vLLM directly instead of telling users to restart onboarding with extra environment variables; managed vLLM install/start remains behind the experimental opt-in because it pulls images and starts containers.

Changes

Offer `Local vLLM [experimental] (localhost:8000) — running (suggested)` whenever the local `/v1/models` probe succeeds.
Keep managed `Install vLLM` / `Start vLLM` gated behind `NEMOCLAW_EXPERIMENTAL=1` or an explicit provider env override.
Add onboarding-selection regression coverage for selecting already-running vLLM without the experimental env var.
Update local inference docs to distinguish already-running vLLM from managed install/start.
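
As a rough illustration of the detection step described above, the following sketch probes `/v1/models` and decides whether a local server looks usable. The helper names (`looksLikeModelList`, `probeLocalVllm`), the injected `FetchLike` type, and the rule that at least one model must be listed are all assumptions made for this example; the actual probe in `src/lib/onboard.ts` may differ.

```typescript
// Minimal fetch-like signature so the sketch does not depend on any runtime global.
type FetchLike = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

// Returns true when the payload resembles an OpenAI-compatible model list:
// an object whose `data` array holds at least one entry with a string `id`.
// (Requiring a non-empty list is an assumption of this sketch.)
function looksLikeModelList(body: unknown): boolean {
  if (typeof body !== "object" || body === null) return false;
  const data = (body as { data?: unknown }).data;
  if (!Array.isArray(data) || data.length === 0) return false;
  return data.every(
    (m) => typeof m === "object" && m !== null && typeof (m as { id?: unknown }).id === "string",
  );
}

// Hypothetical probe: resolves to true only when GET <baseUrl>/v1/models succeeds
// and returns a plausible model list. Any network or parse error counts as "not running".
async function probeLocalVllm(
  fetchImpl: FetchLike,
  baseUrl: string = "http://localhost:8000",
): Promise<boolean> {
  try {
    const res = await fetchImpl(`${baseUrl}/v1/models`);
    return res.ok && looksLikeModelList(await res.json());
  } catch {
    return false;
  }
}
```

Passing the fetch implementation in as a parameter keeps the probe easy to exercise in a hermetic test without a real server on `localhost:8000`.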

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
make docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Additional validation run:

npm run build:cli
npx vitest run test/onboard-selection.test.ts --testNamePattern "running vLLM|forces openai-completions for vLLM"
npx vitest run test/onboard-selection.test.ts --testTimeout 60000
npm run typecheck:cli
npm run docs:strict
npm run source-shape:check
git diff --check

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

Summary by CodeRabbit

Documentation
- Clarified vLLM availability rules, experimental gating, and onboarding wording for detected local vLLM servers vs. managed install/start flows.
Improvements
- Onboarding now surfaces a "Local vLLM [experimental] — running (suggested)" option when a local vLLM server is detected and refines opt-in requirements for managed vLLM install/start.
Tests
- Added tests for interactive UX when vLLM is running and for non-interactive behavior when a requested vLLM provider is unavailable.

coderabbitai · 2026-05-12T17:23:44Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8bdcf8e3-7b44-4e6b-9198-e5359403922d

📥 Commits

Reviewing files that changed from the base of the PR and between bb2e136 and 9c4a2b4.

📒 Files selected for processing (1)

test/onboard-selection.test.ts

🚧 Files skipped from review as they are similar to previous changes (1)

test/onboard-selection.test.ts

📝 Walkthrough

This PR refines vLLM inference option availability: onboarding now always shows a detected running local vLLM, while managed vLLM install/start remains gated by NEMOCLAW_EXPERIMENTAL=1 and supported host profiles. Docs, onboarding logic, and tests were updated to reflect this two-path flow.

Changes

vLLM Detection and Experimental Flag Refinement

| Layer / File(s) | Summary |
|---|---|
| vLLM availability rules documentation `docs/inference/inference-options.md`, `docs/inference/use-local-inference.md` | Documentation updated to distinguish running vLLM detection (appears when a vLLM server is running on `localhost:8000`) from managed install/start (gated by `NEMOCLAW_EXPERIMENTAL=1` and supported host profile). Non-interactive example adjusted. |
| Onboarding menu refactor `src/lib/onboard.ts` | Menu logic now adds the running `vllm` option whenever `vllmRunning` is true; `install-vllm` is added only when `vllmProfile` exists and the user explicitly opts into install via `NEMOCLAW_PROVIDER=install-vllm` or `NEMOCLAW_EXPERIMENTAL=1`. Non-interactive fallback mapping updated. |
| Onboarding UX validation tests `test/onboard-selection.test.ts` | New tests validate interactive detection of running local vLLM (UI text, provider/model selection) and that `NEMOCLAW_PROVIDER=vllm` in non-interactive mode does not trigger a managed install and does not call `installVllm`. |
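
The two-path rule summarized in the table can be sketched as a small pure function. This is only an illustration under stated assumptions: `vllmMenuKeys`, the `VllmMenuInputs` shape, and the string-or-null profile type are hypothetical, and the sketch treats the two conditions independently, which the PR text does not confirm.

```typescript
// Hypothetical input shape; the real onboarding code may carry more state.
interface VllmMenuInputs {
  vllmRunning: boolean; // result of the local /v1/models probe
  vllmProfile: string | null; // supported host profile, or null when the host is unsupported
  env: Record<string, string | undefined>;
}

// Returns the vLLM-related provider keys that should appear in the menu.
function vllmMenuKeys(inputs: VllmMenuInputs): string[] {
  const keys: string[] = [];

  // An already-running server is always offered, with no experimental flag required.
  if (inputs.vllmRunning) keys.push("vllm");

  // Managed install/start needs a supported profile AND an explicit opt-in.
  const optedIn =
    inputs.env.NEMOCLAW_EXPERIMENTAL === "1" || inputs.env.NEMOCLAW_PROVIDER === "install-vllm";
  if (inputs.vllmProfile !== null && optedIn) keys.push("install-vllm");

  return keys;
}
```

For example, a host with a running server and no environment overrides would yield only the `vllm` key, while a supported host with `NEMOCLAW_EXPERIMENTAL=1` and no running server would yield only `install-vllm`.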

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A vLLM server awaits,
I sniff the localhost gates,
If it's running, I shout "found!" with cheer,
Installs stay gated until experiment flags appear,
Tests hop in to make the logic clear.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(onboard): offer running vLLM in provider menu' accurately describes the main change: enabling users to select an already-running vLLM instance directly in the onboarding provider menu, which is the core objective of this PR. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

github-actions · 2026-05-12T17:25:35Z

E2E Advisor Recommendation

Required E2E: None
Optional E2E: onboard-resume-e2e, onboard-repair-e2e, double-onboard-e2e, cloud-onboard-e2e, inference-routing-e2e, gpu-double-onboard-e2e, docs-cli-parity, docs-links-pr

Dispatch hint: onboard-resume-e2e,onboard-repair-e2e,double-onboard-e2e,cloud-onboard-e2e,inference-routing-e2e

Full advisor summary

Pi Semantic E2E Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

None. The code change is a menu-surfacing and non-interactive dispatch refinement for the Local vLLM provider inside setupNim(). It does not alter credential storage, sandbox lifecycle, network policy, installer, inference.local routing, or the default cloud provider path, and the hermetic test/onboard-selection.test.ts additions directly assert the new contract. The PR-time workflows (pr.yaml, pr-self-hosted.yaml, docs-cli-parity-pr.yaml, docs-links-pr.yaml) already auto-run the relevant checks (unit tests, CLI/docs parity, markdown links), and the richer nightly onboard/inference E2Es are useful confidence but not merge-blocking for this scoped wizard change. No CI-wired E2E exercises the vLLM branch today, so extra PR-time E2E would be inventing coverage the repo does not yet have.

Optional E2E

onboard-resume-e2e (medium): Exercises the non-interactive onboard path where NEMOCLAW_PROVIDER resolution now fails loud for vllm instead of silently falling through to install-vllm. Nightly job covers re-entry/resume of the wizard.
onboard-repair-e2e (medium): Verifies the recorded-provider/recovery branch in setupNim() still reaches correct menu keys after the provider list reshuffle (vllm key is now conditional on vllmRunning; install-vllm on profile + flag).
double-onboard-e2e (medium): Covers re-onboarding a sandbox; confirms the reshaped provider menu still round-trips a prior selection without accidental provider-switch.
cloud-onboard-e2e (medium): Smoke for the default (non-vLLM) onboard path to ensure the reordering of option insertion in setupNim() did not disturb NVIDIA Endpoints / cloud provider selection.
inference-routing-e2e (medium): Guards against regressions in how the wizard classifies provider keys and records the provider name; inference routing tests verify credential isolation and error classification after onboard.
gpu-double-onboard-e2e (high): Local-inference onboarding on a GPU runner. Not vLLM-specific, but the nearest CI job that repeatedly walks the local-inference branch of setupNim() (Ollama side, parallel to the vLLM branch that changed).
docs-cli-parity (low): docs-cli-parity-pr.yaml already auto-triggers on src/** and checks install.sh provider parity against docs. The inference-options.md / use-local-inference.md rewrites touch the same provider taxonomy referenced by check-docs.sh.
docs-links-pr (low): docs-links-pr.yaml auto-triggers on **/*.md; the two edited inference docs cross-link to switch-inference-providers.md and use-local-inference.md and should be link-validated.

New E2E recommendations

onboarding-wizard (medium): No existing CI workflow covers the Local vLLM branch of the onboard wizard end-to-end. test/e2e/test-spark-install.sh exists in the tree but is not wired to any workflow under .github/workflows, so the new 'running vLLM surfaces without NEMOCLAW_EXPERIMENTAL' contract is only covered by the hermetic unit test in test/onboard-selection.test.ts.
- Suggested test: Add a hermetic E2E (e.g. test/e2e/test-vllm-onboard-selection.sh) that stubs an OpenAI-compatible /v1/models server on localhost:8000, runs nemoclaw onboard --non-interactive with NEMOCLAW_PROVIDER=vllm (no NEMOCLAW_EXPERIMENTAL), asserts the sandbox records provider=vllm-local routed through inference.local, and separately asserts that NEMOCLAW_PROVIDER=vllm with no server running fails with 'Requested provider vllm is not available' instead of silently falling back to install-vllm. Wire it into nightly-e2e.yaml.
onboarding-wizard (low): The managed install/start vLLM entry is still gated by EXPERIMENTAL plus a supported vllmProfile; no CI job asserts that the gate still holds (no regression letting install-vllm appear without opt-in on unsupported hosts).
- Suggested test: Add a static/unit or hermetic test that mocks getVllmProfile()/vllmRunning and asserts install-vllm only appears when (userChoseManagedVllm || EXPERIMENTAL) and vllmProfile is truthy, and that non-interactive NEMOCLAW_PROVIDER=install-vllm without EXPERIMENTAL still requires a vllmProfile match.
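
The fail-loud behavior the advisor describes for non-interactive mode could look roughly like the sketch below. The function name `resolveRequestedProvider` and the idea of passing the available menu keys as an array are assumptions for illustration; only the error wording is taken from the advisor text above.

```typescript
// Hypothetical resolver: in non-interactive mode the requested provider must already
// be present in the menu. There is no silent fallback from "vllm" to "install-vllm".
function resolveRequestedProvider(requested: string, availableKeys: readonly string[]): string {
  if (availableKeys.indexOf(requested) === -1) {
    throw new Error(`Requested provider ${requested} is not available`);
  }
  return requested;
}
```

With this shape, `NEMOCLAW_PROVIDER=vllm` on a host with no running server raises an error instead of starting a managed install the user never asked for.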

Dispatch hint

Workflow: nightly-e2e.yaml
jobs input: onboard-resume-e2e,onboard-repair-e2e,double-onboard-e2e,cloud-onboard-e2e,inference-routing-e2e

coderabbitai

🧹 Nitpick comments (1)

docs/inference/use-local-inference.md (1)

238-245: ⚡ Quick win

Format CLI command mentions as inline code in prose.

Line 238 and Line 245 reference the onboarding command in plain text (onboard / rerun onboard). Please format command references as inline code for consistency.

Suggested edit

-For an already-running vLLM server, run onboard and select **Local vLLM [experimental]** from the provider list.
+For an already-running vLLM server, run `nemoclaw onboard` and select **Local vLLM [experimental]** from the provider list.
...
-If vLLM is not running and your host matches a managed profile, set `NEMOCLAW_EXPERIMENTAL=1`, rerun onboard, and select the **Install vLLM** or **Start vLLM** entry.
+If vLLM is not running and your host matches a managed profile, set `NEMOCLAW_EXPERIMENTAL=1`, rerun `nemoclaw onboard`, and select the **Install vLLM** or **Start vLLM** entry.

As per coding guidelines: "CLI commands, file paths, flags, parameter names, and values must use inline code formatting."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/inference/use-local-inference.md` around lines 238 - 245, The prose
references CLI commands and an env var without inline code formatting; update
the sentences that mention the onboarding command and the rerun instruction so
`nemoclaw onboard` and `rerun onboard` are formatted as inline code, and format
the environment variable `NEMOCLAW_EXPERIMENTAL=1` as inline code as well;
locate the occurrences around the vLLM onboarding text (mentions of "Local vLLM
[experimental]", "nemoclaw onboard", "rerun onboard", and
"NEMOCLAW_EXPERIMENTAL=1") and replace the plain text command/env references
with inline code snippets to match the project's CLI/flag formatting guidelines.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@docs/inference/use-local-inference.md`:
- Around line 238-245: The prose references CLI commands and an env var without
inline code formatting; update the sentences that mention the onboarding command
and the rerun instruction so `nemoclaw onboard` and `rerun onboard` are
formatted as inline code, and format the environment variable
`NEMOCLAW_EXPERIMENTAL=1` as inline code as well; locate the occurrences around
the vLLM onboarding text (mentions of "Local vLLM [experimental]", "nemoclaw
onboard", "rerun onboard", and "NEMOCLAW_EXPERIMENTAL=1") and replace the plain
text command/env references with inline code snippets to match the project's
CLI/flag formatting guidelines.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5f1a8f21-1fdb-4c9c-8690-aff17ea4c3b6

📥 Commits

Reviewing files that changed from the base of the PR and between 1eaa16e and cc47fb8.

📒 Files selected for processing (4)

docs/inference/inference-options.md
docs/inference/use-local-inference.md
src/lib/onboard.ts
test/onboard-selection.test.ts

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

test/onboard-selection.test.ts (1)

236-242: ⚡ Quick win

Avoid hard-coding provider menu index in this new vLLM-running test.

Using "7" makes this test fragile if provider ordering changes. You’re already asserting menu output with a dynamic ^\s*\d+\) regex, so the selection should be dynamic too.
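
One way to derive the selection from the rendered menu, as suggested here, is sketched below. `findMenuIndex` is a hypothetical helper, not code from the PR, and it assumes each menu entry is printed on its own line as `<number>) <label>`.

```typescript
// Scans captured menu output for a numbered entry containing `label`
// and returns the displayed index as a string, ready to feed back into a prompt mock.
function findMenuIndex(menuText: string, label: string): string {
  for (const line of menuText.split("\n")) {
    const match = /^\s*(\d+)\)\s*(.*)$/.exec(line);
    if (match && match[2].indexOf(label) !== -1) {
      return match[1];
    }
  }
  throw new Error(`Menu entry not found: ${label}`);
}
```

A test could then push `findMenuIndex(messages.join("\n"), "Local vLLM [experimental]")` into its answers instead of a literal index, so reordering providers no longer breaks the selection.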

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/onboard-selection.test.ts` around lines 236 - 242, The test hard-codes
the provider selection as "7" via answers and credentials.prompt, which is
fragile; instead read the provider menu text already captured in messages (from
credentials.prompt), find the line matching /^\s*\d+\)/ to extract the shown
index for the desired provider, and push that extracted index (as a string) into
answers so the prompt selection is derived from the test output; update
references to answers/credentials.prompt in the test (e.g., the existing answers
array and credentials.prompt mock) to use the dynamically extracted index before
returning from the mock.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/onboard-selection.test.ts`:
- Around line 314-339: The test script must explicitly stub the interactive
credential helpers so the non-interactive vLLM test fails fast if a prompt is
attempted: require the credentials module used by setupNim (e.g. const
credentials = require(${credentialsPath});) and set credentials.prompt = () => {
throw new Error("Unexpected prompt in non-interactive test"); } and
credentials.ensureApiKey = async () => { throw new Error("Unexpected
ensureApiKey call in non-interactive test"); }; this ensures any call to
credentials.prompt or credentials.ensureApiKey during setupNim will immediately
throw instead of hanging.
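
Laid out as standalone code, the stubbing described in this comment might look like the following sketch. The `CredentialsLike` interface and the `forbidInteractiveCredentials` wrapper are assumptions for illustration; the real credentials module and the way the test script loads it may differ.

```typescript
// Hypothetical shape of the two interactive helpers named in the review comment.
interface CredentialsLike {
  prompt: (question: string) => Promise<string>;
  ensureApiKey: () => Promise<void>;
}

// Replaces both helpers with stubs that fail immediately, so a non-interactive
// test surfaces an unexpected prompt as an error rather than a hang.
function forbidInteractiveCredentials(credentials: CredentialsLike): void {
  credentials.prompt = () => {
    throw new Error("Unexpected prompt in non-interactive test");
  };
  credentials.ensureApiKey = async () => {
    throw new Error("Unexpected ensureApiKey call in non-interactive test");
  };
}
```

Note that the `prompt` stub throws synchronously while the `ensureApiKey` stub returns a rejected promise, mirroring the arrow functions quoted in the comment.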

---

Nitpick comments:
In `@test/onboard-selection.test.ts`:
- Around line 236-242: The test hard-codes the provider selection as "7" via
answers and credentials.prompt, which is fragile; instead read the provider menu
text already captured in messages (from credentials.prompt), find the line
matching /^\s*\d+\)/ to extract the shown index for the desired provider, and
push that extracted index (as a string) into answers so the prompt selection is
derived from the test output; update references to answers/credentials.prompt in
the test (e.g., the existing answers array and credentials.prompt mock) to use
the dynamically extracted index before returning from the mock.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1c72d588-b783-43c6-b8fb-a174418ba551

📥 Commits

Reviewing files that changed from the base of the PR and between 7dcebda and bb2e136.

📒 Files selected for processing (2)

src/lib/onboard.ts
test/onboard-selection.test.ts

## Summary

- Add v0.0.40 release notes and update docs version metadata.
- Document release-prep behavior changes around onboarding, local inference, policy preset filtering, and config recovery.
- Refresh generated `nemoclaw-user-*` skills from the source docs.

## Source summary

- #3383 -> `docs/about/release-notes.md`, `docs/reference/commands.md`, `docs/manage-sandboxes/lifecycle.md`: Reflect macOS Docker-driver OpenShell gateway onboarding and upgrade behavior.
- #3378 -> `docs/about/release-notes.md`: Capture the Docker-driver gateway TCP readiness fix and clearer startup failures.
- #3338 -> `docs/about/release-notes.md`, `docs/inference/use-local-inference.md`: Reflect the Ollama auth proxy token requirement on native API routes.
- #3420 -> `docs/about/release-notes.md`, `docs/get-started/prerequisites.md`, `docs/inference/use-local-inference.md`: Document the Linux Ollama `zstd` preflight and sudo messaging.
- #3417 -> `docs/about/release-notes.md`, `docs/inference/inference-options.md`, `docs/inference/use-local-inference.md`: Reflect detected running vLLM provider selection.
- #3223 -> `docs/about/release-notes.md`, `docs/reference/commands.md`, `docs/reference/network-policies.md`, `docs/get-started/quickstart.md`: Document agent-aware policy preset filtering.
- #3385 -> `docs/about/release-notes.md`: Capture the dashboard forward TCP reachability check.
- #3160 -> `docs/about/release-notes.md`, `docs/reference/troubleshooting.md`: Document empty `openclaw.json` baseline recovery.
- #3367 -> `docs/about/release-notes.md`: Capture OpenClaw plugin compatibility metadata.

## Test plan

- [x] `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user`
- [x] `make docs`
- [x] `git diff --check`
- [x] Skip-term scan for `docs/.docs-skip` terms

## Summary by CodeRabbit

# Release Notes v0.0.40

* **New Features**
  * Sandbox configuration recovery when inference changes cause data loss
  * Policy presets now intelligently filter based on agent capabilities
  * Enhanced gateway health checks and upgrade reliability
* **Documentation**
  * Improved local inference setup instructions with clearer dependency requirements
  * Clarified vLLM experimental feature availability and prerequisites
  * Reorganized architecture documentation for enhanced clarity
  * Refined security and hardening guidance

fix(onboard): offer running vLLM in provider menu

cc47fb8

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

coderabbitai Bot reviewed May 12, 2026

docs(inference): format vLLM onboard command references

7dcebda

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

ericksoa added fix labels May 12, 2026

ericksoa self-assigned this May 12, 2026

ericksoa added Local Models labels May 12, 2026

fix(onboard): keep vLLM install opt-in explicit

bb2e136

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

coderabbitai Bot reviewed May 12, 2026

Comment thread test/onboard-selection.test.ts

test(onboard): harden vLLM selection checks

9c4a2b4

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

ericksoa requested a review from cv May 12, 2026 21:20

cv approved these changes May 12, 2026

cv merged commit bf73bf0 into main May 12, 2026
31 checks passed

miyoungc mentioned this pull request May 12, 2026

docs: prepare v0.0.40 release docs #3427

Merged

4 tasks

This was referenced May 16, 2026

feat(onboard): wizard honours NEMOCLAW_PROVIDER outside non-interactive mode #3643

Merged

fix(onboard): surface install-vllm when explicitly opted in via env #3790

Merged

feat(onboard): show managed vLLM by default on DGX Spark and Station #3921

Merged

wscurran added the feature label (PR adds or expands user-visible functionality) and removed the Getting Started and feature labels Jun 3, 2026

cv merged 4 commits into main from fix/vllm-running-onboard-menu

3 participants
