fix(onboard): offer running vLLM in provider menu#3417

Merged: cv merged 4 commits into main from fix/vllm-running-onboard-menu on May 12, 2026

Conversation

@ericksoa

@ericksoa ericksoa commented May 12, 2026

Contributor

Summary

Fixes the onboarding UX when vLLM is already running locally. The provider menu now offers detected local vLLM directly instead of telling users to restart onboarding with extra environment variables; managed vLLM install/start remains behind the experimental opt-in because it pulls images and starts containers.

Changes

  • Offer Local vLLM [experimental] (localhost:8000) — running (suggested) whenever the local /v1/models probe succeeds.
  • Keep managed Install vLLM / Start vLLM gated behind NEMOCLAW_EXPERIMENTAL=1 or an explicit provider env override.
  • Add onboarding-selection regression coverage for selecting already-running vLLM without the experimental env var.
  • Update local inference docs to distinguish already-running vLLM from managed install/start.
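The detection step in the first bullet can be sketched as follows. This is a minimal illustration, not the code in `src/lib/onboard.ts`: the function name `probeLocalVllm`, the `FetchLike` type, and the assumption that the server answers with the OpenAI-compatible `{ data: [{ id }] }` shape are all hypothetical. The fetch implementation is injected so the sketch can run without a live server.

```typescript
// Minimal fetch-like shape so the sketch does not depend on DOM or Node typings.
type FetchLike = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

// Assumed endpoint, based on the "localhost:8000" menu label and the /v1/models probe.
const VLLM_MODELS_URL = "http://localhost:8000/v1/models";

// Returns the model ids reported by the server, or null when the probe fails.
async function probeLocalVllm(
  fetchImpl: FetchLike,
  url: string = VLLM_MODELS_URL,
): Promise<string[] | null> {
  try {
    const res = await fetchImpl(url);
    if (!res.ok) return null;
    // Assumption: OpenAI-compatible list response, { data: [{ id: "..." }, ...] }.
    const body = (await res.json()) as { data?: Array<{ id?: unknown }> };
    if (!Array.isArray(body.data)) return null;
    return body.data
      .map((model) => model.id)
      .filter((id): id is string => typeof id === "string");
  } catch {
    // Connection refused, invalid JSON, or an unexpected body all count as "not running".
    return null;
  }
}
```

A caller would treat a non-null result as "vLLM is running". A real probe would likely also want a short timeout so onboarding does not stall on an unresponsive port.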

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • make docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Additional validation run:

  • npm run build:cli
  • npx vitest run test/onboard-selection.test.ts --testNamePattern "running vLLM|forces openai-completions for vLLM"
  • npx vitest run test/onboard-selection.test.ts --testTimeout 60000
  • npm run typecheck:cli
  • npm run docs:strict
  • npm run source-shape:check
  • git diff --check

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

Summary by CodeRabbit

  • Documentation
    • Clarified vLLM availability rules, experimental gating, and onboarding wording for detected local vLLM servers vs. managed install/start flows.
  • Improvements
    • Onboarding now surfaces a "Local vLLM [experimental] — running (suggested)" option when a local vLLM server is detected and refines opt-in requirements for managed vLLM install/start.
  • Tests
    • Added tests for interactive UX when vLLM is running and for non-interactive behavior when a requested vLLM provider is unavailable.

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
@coderabbitai

coderabbitai Bot commented May 12, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8bdcf8e3-7b44-4e6b-9198-e5359403922d

📥 Commits

Reviewing files that changed from the base of the PR and between bb2e136 and 9c4a2b4.

📒 Files selected for processing (1)
  • test/onboard-selection.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/onboard-selection.test.ts

📝 Walkthrough

Walkthrough

This PR refines vLLM inference option availability: onboarding now always shows a detected running local vLLM, while managed vLLM install/start remains gated by NEMOCLAW_EXPERIMENTAL=1 and supported host profiles. Docs, onboarding logic, and tests were updated to reflect this two-path flow.

Changes

vLLM Detection and Experimental Flag Refinement

| Layer / File(s) | Summary |
| --- | --- |
| vLLM availability rules documentation: docs/inference/inference-options.md, docs/inference/use-local-inference.md | Documentation updated to distinguish running vLLM detection (appears when a vLLM server is running on localhost:8000) from managed install/start (gated by NEMOCLAW_EXPERIMENTAL=1 and supported host profile). Non-interactive example adjusted. |
| Onboarding menu refactor: src/lib/onboard.ts | Menu logic now adds the running vllm option whenever vllmRunning is true; install-vllm is added only when vllmProfile exists and the user explicitly opts into install via NEMOCLAW_PROVIDER=install-vllm or NEMOCLAW_EXPERIMENTAL=1. Non-interactive fallback mapping updated. |
| Onboarding UX validation tests: test/onboard-selection.test.ts | New tests validate interactive detection of running local vLLM (UI text, provider/model selection) and that NEMOCLAW_PROVIDER=vllm in non-interactive mode does not trigger a managed install and does not call installVllm. |
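
The gating rule summarized for `src/lib/onboard.ts` can be illustrated with a small sketch. The function `vllmMenuKeys` and its input type are hypothetical; only the names `vllmRunning`, `vllmProfile`, `NEMOCLAW_EXPERIMENTAL`, `NEMOCLAW_PROVIDER`, and the keys `vllm` and `install-vllm` come from the summary above. Whether the real menu hides the managed entry while a server is already running is not stated here, so the sketch treats the two checks independently.

```typescript
interface VllmMenuInput {
  vllmRunning: boolean; // the local /v1/models probe succeeded
  vllmProfile: Record<string, unknown> | null; // supported host profile, or null
  env: Record<string, string | undefined>; // environment passed in explicitly for testability
}

// Returns the vLLM-related menu keys that should be offered.
function vllmMenuKeys({ vllmRunning, vllmProfile, env }: VllmMenuInput): string[] {
  const keys: string[] = [];

  // A detected running server is always offered, with no experimental flag required.
  if (vllmRunning) keys.push("vllm");

  // Managed install/start needs a supported profile AND an explicit opt-in.
  const optedIn =
    env.NEMOCLAW_EXPERIMENTAL === "1" || env.NEMOCLAW_PROVIDER === "install-vllm";
  if (vllmProfile !== null && optedIn) keys.push("install-vllm");

  return keys;
}
```

Passing the environment as a parameter, rather than reading it globally, keeps the rule easy to exercise in a hermetic unit test.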

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A vLLM server awaits,
I sniff the localhost gates,
If it's running, I shout "found!" with cheer,
Installs stay gated until experiment flags appear,
Tests hop in to make the logic clear.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(onboard): offer running vLLM in provider menu' accurately describes the main change: enabling users to select an already-running vLLM instance directly in the onboarding provider menu, which is the core objective of this PR. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@github-actions

github-actions Bot commented May 12, 2026

Contributor

E2E Advisor Recommendation

Required E2E: None
Optional E2E: onboard-resume-e2e, onboard-repair-e2e, double-onboard-e2e, cloud-onboard-e2e, inference-routing-e2e, gpu-double-onboard-e2e, docs-cli-parity, docs-links-pr

Dispatch hint: onboard-resume-e2e,onboard-repair-e2e,double-onboard-e2e,cloud-onboard-e2e,inference-routing-e2e

Workflow run

Full advisor summary

Pi Semantic E2E Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • None. The code change is a menu-surfacing and non-interactive dispatch refinement for the Local vLLM provider inside setupNim(). It does not alter credential storage, sandbox lifecycle, network policy, installer, inference.local routing, or the default cloud provider path, and the hermetic test/onboard-selection.test.ts additions directly assert the new contract. The PR-time workflows (pr.yaml, pr-self-hosted.yaml, docs-cli-parity-pr.yaml, docs-links-pr.yaml) already auto-run the relevant checks (unit tests, CLI/docs parity, markdown links), and the richer nightly onboard/inference E2Es are useful confidence but not merge-blocking for this scoped wizard change. No CI-wired E2E exercises the vLLM branch today, so extra PR-time E2E would be inventing coverage the repo does not yet have.

Optional E2E

  • onboard-resume-e2e (medium): Exercises the non-interactive onboard path where NEMOCLAW_PROVIDER resolution now fails loud for vllm instead of silently falling through to install-vllm. Nightly job covers re-entry/resume of the wizard.
  • onboard-repair-e2e (medium): Verifies the recorded-provider/recovery branch in setupNim() still reaches correct menu keys after the provider list reshuffle (vllm key is now conditional on vllmRunning; install-vllm on profile + flag).
  • double-onboard-e2e (medium): Covers re-onboarding a sandbox; confirms the reshaped provider menu still round-trips a prior selection without accidental provider-switch.
  • cloud-onboard-e2e (medium): Smoke for the default (non-vLLM) onboard path to ensure the reordering of option insertion in setupNim() did not disturb NVIDIA Endpoints / cloud provider selection.
  • inference-routing-e2e (medium): Guards against regressions in how the wizard classifies provider keys and records the provider name; inference routing tests verify credential isolation and error classification after onboard.
  • gpu-double-onboard-e2e (high): Local-inference onboarding on a GPU runner. Not vLLM-specific, but the nearest CI job that repeatedly walks the local-inference branch of setupNim() (Ollama side, parallel to the vLLM branch that changed).
  • docs-cli-parity (low): docs-cli-parity-pr.yaml already auto-triggers on src/** and checks install.sh provider parity against docs. The inference-options.md / use-local-inference.md rewrites touch the same provider taxonomy referenced by check-docs.sh.
  • docs-links-pr (low): docs-links-pr.yaml auto-triggers on **/*.md; the two edited inference docs cross-link to switch-inference-providers.md and use-local-inference.md and should be link-validated.

New E2E recommendations

  • onboarding-wizard (medium): No existing CI workflow covers the Local vLLM branch of the onboard wizard end-to-end. test/e2e/test-spark-install.sh exists in the tree but is not wired to any workflow under .github/workflows, so the new 'running vLLM surfaces without NEMOCLAW_EXPERIMENTAL' contract is only covered by the hermetic unit test in test/onboard-selection.test.ts.
    • Suggested test: Add a hermetic E2E (e.g. test/e2e/test-vllm-onboard-selection.sh) that stubs an OpenAI-compatible /v1/models server on localhost:8000, runs nemoclaw onboard --non-interactive with NEMOCLAW_PROVIDER=vllm (no NEMOCLAW_EXPERIMENTAL), asserts the sandbox records provider=vllm-local routed through inference.local, and separately asserts that NEMOCLAW_PROVIDER=vllm with no server running fails with 'Requested provider vllm is not available' instead of silently falling back to install-vllm. Wire it into nightly-e2e.yaml.
  • onboarding-wizard (low): The managed install/start vLLM entry is still gated by EXPERIMENTAL plus a supported vllmProfile; no CI job asserts that the gate still holds (no regression letting install-vllm appear without opt-in on unsupported hosts).
    • Suggested test: Add a static/unit or hermetic test that mocks getVllmProfile()/vllmRunning and asserts install-vllm only appears when (userChoseManagedVllm || EXPERIMENTAL) and vllmProfile is truthy, and that non-interactive NEMOCLAW_PROVIDER=install-vllm without EXPERIMENTAL still requires a vllmProfile match.
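
The "fail loud" contract described in these recommendations can be sketched in a few lines. The helper `resolveNonInteractiveProvider` is hypothetical and stands in for whatever `setupNim()` does internally; only the error text `Requested provider vllm is not available` is taken from the advisor summary above.

```typescript
// Resolves a provider requested via NEMOCLAW_PROVIDER against the menu keys
// that are actually available, and refuses to substitute a different provider.
function resolveNonInteractiveProvider(
  requested: string,
  availableKeys: readonly string[],
): string {
  if (availableKeys.indexOf(requested) === -1) {
    // No silent fallback (for example from "vllm" to "install-vllm").
    throw new Error(`Requested provider ${requested} is not available`);
  }
  return requested;
}
```

With `vllm` absent from the available keys, for instance because no server answered on localhost:8000, the call throws instead of quietly selecting a managed install.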

Dispatch hint

  • Workflow: nightly-e2e.yaml
  • jobs input: onboard-resume-e2e,onboard-repair-e2e,double-onboard-e2e,cloud-onboard-e2e,inference-routing-e2e

@coderabbitai coderabbitai Bot left a comment

Contributor

🧹 Nitpick comments (1)
docs/inference/use-local-inference.md (1)

238-245: ⚡ Quick win

Format CLI command mentions as inline code in prose.

Line 238 and Line 245 reference the onboarding command in plain text (onboard / rerun onboard). Please format command references as inline code for consistency.

Suggested edit
-For an already-running vLLM server, run onboard and select **Local vLLM [experimental]** from the provider list.
+For an already-running vLLM server, run `nemoclaw onboard` and select **Local vLLM [experimental]** from the provider list.
...
-If vLLM is not running and your host matches a managed profile, set `NEMOCLAW_EXPERIMENTAL=1`, rerun onboard, and select the **Install vLLM** or **Start vLLM** entry.
+If vLLM is not running and your host matches a managed profile, set `NEMOCLAW_EXPERIMENTAL=1`, rerun `nemoclaw onboard`, and select the **Install vLLM** or **Start vLLM** entry.

As per coding guidelines: "CLI commands, file paths, flags, parameter names, and values must use inline code formatting."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/inference/use-local-inference.md` around lines 238 - 245, The prose
references CLI commands and an env var without inline code formatting; update
the sentences that mention the onboarding command and the rerun instruction so
`nemoclaw onboard` and `rerun onboard` are formatted as inline code, and format
the environment variable `NEMOCLAW_EXPERIMENTAL=1` as inline code as well;
locate the occurrences around the vLLM onboarding text (mentions of "Local vLLM
[experimental]", "nemoclaw onboard", "rerun onboard", and
"NEMOCLAW_EXPERIMENTAL=1") and replace the plain text command/env references
with inline code snippets to match the project's CLI/flag formatting guidelines.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5f1a8f21-1fdb-4c9c-8690-aff17ea4c3b6

📥 Commits

Reviewing files that changed from the base of the PR and between 1eaa16e and cc47fb8.

📒 Files selected for processing (4)
  • docs/inference/inference-options.md
  • docs/inference/use-local-inference.md
  • src/lib/onboard.ts
  • test/onboard-selection.test.ts

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🧹 Nitpick comments (1)
test/onboard-selection.test.ts (1)

236-242: ⚡ Quick win

Avoid hard-coding provider menu index in this new vLLM-running test.

Using "7" makes this test fragile if provider ordering changes. You’re already asserting menu output with a dynamic ^\s*\d+\) regex, so the selection should be dynamic too.
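
One way to derive the selection from the captured menu text, as the comment suggests, is sketched below. The helper name `findMenuIndex` and the sample menu in the usage note are hypothetical; the real test may capture and structure its prompt output differently.

```typescript
// Finds the number shown next to a menu entry such as "  7) Local vLLM [experimental] ...".
// Returns it as a string, since prompt answers are typed as text.
function findMenuIndex(menuText: string, label: string): string {
  for (const line of menuText.split("\n")) {
    // Same leading pattern as the /^\s*\d+\)/ assertion, with the index captured.
    const match = /^\s*(\d+)\)\s*(.*)$/.exec(line);
    if (match && match[2].indexOf(label) !== -1) return match[1];
  }
  throw new Error(`Menu entry not found: ${label}`);
}
```

A test could then push `findMenuIndex(capturedMenu, "Local vLLM [experimental]")` into its answers array instead of the literal `"7"`, so reordering the provider list no longer breaks the selection.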

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/onboard-selection.test.ts` around lines 236 - 242, The test hard-codes
the provider selection as "7" via answers and credentials.prompt, which is
fragile; instead read the provider menu text already captured in messages (from
credentials.prompt), find the line matching /^\s*\d+\)/ to extract the shown
index for the desired provider, and push that extracted index (as a string) into
answers so the prompt selection is derived from the test output; update
references to answers/credentials.prompt in the test (e.g., the existing answers
array and credentials.prompt mock) to use the dynamically extracted index before
returning from the mock.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/onboard-selection.test.ts`:
- Around line 314-339: The test script must explicitly stub the interactive
credential helpers so the non-interactive vLLM test fails fast if a prompt is
attempted: require the credentials module used by setupNim (e.g. const
credentials = require(${credentialsPath});) and set credentials.prompt = () => {
throw new Error("Unexpected prompt in non-interactive test"); } and
credentials.ensureApiKey = async () => { throw new Error("Unexpected
ensureApiKey call in non-interactive test"); }; this ensures any call to
credentials.prompt or credentials.ensureApiKey during setupNim will immediately
throw instead of hanging.
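
The stubbing described in this inline comment might look like the following sketch. The `CredentialsLike` interface and the helper `makeFailFast` are hypothetical; the real credentials module's signatures are not shown in this conversation, and only the two method names and error messages come from the comment.

```typescript
// Assumed minimal shape of the credentials module used by setupNim().
interface CredentialsLike {
  prompt: (question: string) => Promise<string>;
  ensureApiKey: () => Promise<void>;
}

// Replaces the interactive helpers with stubs that fail immediately,
// so a non-interactive test cannot hang waiting for input.
function makeFailFast(credentials: CredentialsLike): CredentialsLike {
  credentials.prompt = () => {
    throw new Error("Unexpected prompt in non-interactive test");
  };
  credentials.ensureApiKey = async () => {
    throw new Error("Unexpected ensureApiKey call in non-interactive test");
  };
  return credentials;
}
```

Note that `prompt` throws synchronously while `ensureApiKey` rejects its promise, which mirrors the two stubs as written in the comment.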


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1c72d588-b783-43c6-b8fb-a174418ba551

📥 Commits

Reviewing files that changed from the base of the PR and between 7dcebda and bb2e136.

📒 Files selected for processing (2)
  • src/lib/onboard.ts
  • test/onboard-selection.test.ts

Comment thread test/onboard-selection.test.ts
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
@ericksoa ericksoa requested a review from cv May 12, 2026 21:20
@cv cv merged commit bf73bf0 into main May 12, 2026
31 checks passed
@miyoungc miyoungc mentioned this pull request May 12, 2026
4 tasks
ericksoa pushed a commit that referenced this pull request May 13, 2026
## Summary
- Add v0.0.40 release notes and update docs version metadata.
- Document release-prep behavior changes around onboarding, local
inference, policy preset filtering, and config recovery.
- Refresh generated `nemoclaw-user-*` skills from the source docs.

## Source summary
- #3383 -> `docs/about/release-notes.md`, `docs/reference/commands.md`,
`docs/manage-sandboxes/lifecycle.md`: Reflect macOS Docker-driver
OpenShell gateway onboarding and upgrade behavior.
- #3378 -> `docs/about/release-notes.md`: Capture the Docker-driver
gateway TCP readiness fix and clearer startup failures.
- #3338 -> `docs/about/release-notes.md`,
`docs/inference/use-local-inference.md`: Reflect the Ollama auth proxy
token requirement on native API routes.
- #3420 -> `docs/about/release-notes.md`,
`docs/get-started/prerequisites.md`,
`docs/inference/use-local-inference.md`: Document the Linux Ollama
`zstd` preflight and sudo messaging.
- #3417 -> `docs/about/release-notes.md`,
`docs/inference/inference-options.md`,
`docs/inference/use-local-inference.md`: Reflect detected running vLLM
provider selection.
- #3223 -> `docs/about/release-notes.md`, `docs/reference/commands.md`,
`docs/reference/network-policies.md`, `docs/get-started/quickstart.md`:
Document agent-aware policy preset filtering.
- #3385 -> `docs/about/release-notes.md`: Capture the dashboard forward
TCP reachability check.
- #3160 -> `docs/about/release-notes.md`,
`docs/reference/troubleshooting.md`: Document empty `openclaw.json`
baseline recovery.
- #3367 -> `docs/about/release-notes.md`: Capture OpenClaw plugin
compatibility metadata.

## Test plan
- [x] `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix
nemoclaw-user`
- [x] `make docs`
- [x] `git diff --check`
- [x] Skip-term scan for `docs/.docs-skip` terms

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

# Release Notes v0.0.40

* **New Features**
* Sandbox configuration recovery when inference changes cause data loss
  * Policy presets now intelligently filter based on agent capabilities
  * Enhanced gateway health checks and upgrade reliability

* **Documentation**
* Improved local inference setup instructions with clearer dependency
requirements
  * Clarified vLLM experimental feature availability and prerequisites
  * Reorganized architecture documentation for enhanced clarity
  * Refined security and hardening guidance

[![Review Change
Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/3427)

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
@wscurran added the following labels on Jun 3, 2026:
  • area: cli (Command line interface, flags, terminal UX, or output)
  • area: inference (Inference routing, serving, model selection, or outputs)
  • area: install (Install, setup, prerequisites, or uninstall flow)
  • area: local-models (Local model providers, downloads, launch, or connectivity)
  • area: onboarding (Onboarding FSM, provider setup, sandbox launch, or first-run flow)
  • area: providers (Inference provider integrations and provider behavior)
  • bug-fix (PR fixes a bug or regression)
@wscurran added the feature label (PR adds or expands user-visible functionality) and removed the Getting Started and feature labels on Jun 3, 2026

3 participants