fix(onboard): restore Qwen3.6 27B FP8 as DGX Station vLLM default by zyang-dev · Pull Request #4888 · NVIDIA/NemoClaw

zyang-dev · 2026-06-06T03:40:46Z

Summary

Restores Qwen 3.6 27B FP8 as the default managed-vLLM model for DGX Station because the DeepSeek V4 Flash recipe needs more accuracy validation.

Changes

Switched the DGX Station vLLM profile default back to Qwen/Qwen3.6-27B-FP8.
Kept deepseek-v4-flash registered as a supported managed-vLLM override.
Updated profile tests and docs to describe Qwen 27B FP8 as the DGX Station default.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
npm run docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Summary by CodeRabbit

Documentation
- Updated the default model for DGX Station's managed vLLM profile from DeepSeek V4 Flash to Qwen3.6-27B-FP8.
- Reordered documentation tables listing available model overrides.
- Updated environment variable documentation for model configuration.
Tests
- Updated test cases to verify the new default model behavior for DGX Station vLLM profiles.

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>

coderabbitai · 2026-06-06T03:40:56Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: cbe899ec-e049-4a30-a220-a879eb7e45ad

📥 Commits

Reviewing files that changed from the base of the PR and between 1f0b4c5 and a11e7f4.

📒 Files selected for processing (8)

docs/inference/inference-options.mdx
docs/reference/commands-nemohermes.mdx
docs/reference/commands.mdx
src/lib/inference/vllm-models.test.ts
src/lib/inference/vllm-models.ts
src/lib/inference/vllm.test.ts
src/lib/inference/vllm.ts
test/detect-vllm-profile.test.ts

📝 Walkthrough

Walkthrough

This PR updates the default vLLM model for DGX Station from DeepSeek V4 Flash to Qwen3.6-27B-FP8. The change spans platform profile configuration, test assertions, documentation tables, and environment variable references to ensure consistent behavior across the codebase.

Changes

DGX Station vLLM Default Model Switch

| Layer / File(s) | Summary |
| --- | --- |
| Core model selection logic `src/lib/inference/vllm.ts` | Introduces `qwen27bFP8Model()` helper and updates `STATION_PROFILE.defaultModel` to use the new model. The `installVllm` comment documenting per-platform defaults is adjusted to reflect Station: Qwen3.6-27B and Spark: Qwen3.6-35B-A3B NVFP4. |
| Model selector documentation `src/lib/inference/vllm-models.ts` | `selectVllmModelFromEnv` documentation is updated to specify per-platform defaults for unset `NEMOCLAW_VLLM_MODEL` (Station: Qwen3.6-27B; Spark: Qwen3.6-35B-A3B NVFP4). |
| Test expectations for new model default `src/lib/inference/vllm.test.ts`, `src/lib/inference/vllm-models.test.ts`, `test/detect-vllm-profile.test.ts` | vLLM profile detection tests assert `Qwen/Qwen3.6-27B-FP8` with env value `qwen3.6-27b` for DGX Station. Test descriptions are clarified to distinguish DeepSeek as a managed-vLLM override rather than a Station-specific default. |
| User-facing documentation `docs/inference/inference-options.mdx`, `docs/reference/commands-nemohermes.mdx`, `docs/reference/commands.mdx` | The inference options table promotes `qwen3.6-27b` to "default on the DGX Station profile" and demotes `deepseek-v4-flash` to "supported override." Environment variable reference docs reorder recognized slugs to reflect the new ordering. |
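The default-versus-override behavior summarized in the table can be sketched roughly as follows. This is an illustration only, not the code in `src/lib/inference/vllm.ts`: the slugs, the `NEMOCLAW_VLLM_MODEL` variable, and the `Qwen/Qwen3.6-27B-FP8` id come from this PR, while `VllmModel`, `MODEL_REGISTRY`, `STATION_DEFAULT_SLUG`, `resolveStationModel`, the DeepSeek model id placeholder, and the rejection of unknown slugs are all assumptions made for the example.

```typescript
// Hypothetical sketch: none of these types or helpers are copied from NemoClaw.
interface VllmModel {
  slug: string; // value accepted in NEMOCLAW_VLLM_MODEL
  hfId: string; // model identifier handed to vLLM
}

// Slugs and the Qwen id are taken from this PR.
// The DeepSeek id is a placeholder because the PR does not state it.
const MODEL_REGISTRY = new Map<string, VllmModel>([
  ["qwen3.6-27b", { slug: "qwen3.6-27b", hfId: "Qwen/Qwen3.6-27B-FP8" }],
  ["deepseek-v4-flash", { slug: "deepseek-v4-flash", hfId: "<deepseek-v4-flash model id>" }],
]);

// Assumed profile shape: the Station profile names the slug used when no override is set.
const STATION_DEFAULT_SLUG = "qwen3.6-27b";

// The environment is passed in explicitly so the function is easy to test.
type Env = Record<string, string | undefined>;

function resolveStationModel(env: Env): VllmModel {
  const requested = env["NEMOCLAW_VLLM_MODEL"]?.trim();
  // An unset or empty variable falls back to the profile default.
  const slug = requested ? requested : STATION_DEFAULT_SLUG;
  const model = MODEL_REGISTRY.get(slug);
  if (!model) {
    // Assumption: unknown slugs are rejected; the real behavior may differ.
    throw new Error(`Unrecognized NEMOCLAW_VLLM_MODEL: ${slug}`);
  }
  return model;
}
```

With this shape, `resolveStationModel({})` yields the Qwen entry, and `resolveStationModel({ NEMOCLAW_VLLM_MODEL: "deepseek-v4-flash" })` yields the DeepSeek entry, which mirrors the "default" and "supported override" roles described above.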

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

NVIDIA/NemoClaw#4867: Prior vLLM model default changes for DGX Station and related platform profile wiring updates.

Suggested labels

Platform: Station, Provider: vLLM, area: inference, bug-fix

Suggested reviewers

cv

Poem

🐰 A Station that once held a DeepSeek so bright,
Now shines with a Qwen's efficient light,
Tests and docs aligned in perfect accord,
Models swapped swift by reviewer's word! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately describes the main change: reverting DGX Station's default vLLM model from DeepSeek V4 Flash back to Qwen3.6 27B FP8, which is reflected consistently across documentation and test updates. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

github-actions · 2026-06-06T03:41:17Z

E2E Advisor Recommendation

Required E2E: None
Optional E2E: None

Workflow run

Full advisor summary

E2E Recommendation Advisor

Failed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-advisor-raw-output.txt

github-actions · 2026-06-06T03:41:18Z

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Failed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-scenario-advisor-raw-output.txt

github-actions · 2026-06-06T03:41:27Z

🌿 Preview your docs: https://nvidia-preview-pr-4888.docs.buildwithfern.com/nemoclaw

github-actions · 2026-06-06T03:43:17Z

PR Review Advisor

Findings: 0 needs attention, 0 worth checking, 0 nice ideas
Top item: No actionable findings

Consider writing more tests for

**Runtime validation**: A DGX Station managed-vLLM install with `NEMOCLAW_VLLM_MODEL` unset pre-downloads and serves `Qwen/Qwen3.6-27B-FP8` and reports that model in the install summary. Unit tests cover the static profile/default selection and DeepSeek override registry behavior. Because the touched path starts containers and serves a model in managed-vLLM onboarding, runtime validation would improve confidence without being required to understand the code-level change.
**Runtime validation**: A DGX Station managed-vLLM install with `NEMOCLAW_VLLM_MODEL=deepseek-v4-flash` still selects DeepSeek as an override and keeps the DeepSeek-specific serve flags. Unit tests cover the static profile/default selection and DeepSeek override registry behavior. Because the touched path starts containers and serves a model in managed-vLLM onboarding, runtime validation would improve confidence without being required to understand the code-level change.
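
One possible shape for the "serves the expected model" part of such a runtime check is sketched below. It assumes the managed vLLM server exposes the OpenAI-compatible `/v1/models` listing, whose response carries a `data` array of entries with an `id` field, and that the served model name equals the model id (vLLM allows a different served name to be configured, so this may not hold for NemoClaw). The helper names `isModelList` and `servesModel` are hypothetical, and the sketch takes the already-parsed response body as input rather than guessing at the host or port.

```typescript
// Minimal subset of the OpenAI-compatible model listing response.
interface ModelList {
  data: Array<{ id: string }>;
}

// Narrow an unknown JSON payload to the subset this check relies on.
function isModelList(value: unknown): value is ModelList {
  if (typeof value !== "object" || value === null) return false;
  const data = (value as { data?: unknown }).data;
  return (
    Array.isArray(data) &&
    data.every(
      (m) => typeof m === "object" && m !== null && typeof (m as { id?: unknown }).id === "string",
    )
  );
}

// Returns true only when the payload is well formed and lists the expected model id.
function servesModel(payload: unknown, expectedId: string): boolean {
  if (!isModelList(payload)) return false;
  return payload.data.some((m) => m.id === expectedId);
}
```

In a real validation run, the payload would be the parsed JSON body fetched from the running server's `/v1/models` endpoint, and the expected id would be `Qwen/Qwen3.6-27B-FP8` when `NEMOCLAW_VLLM_MODEL` is unset.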

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

## Summary

- Add the v0.0.61 release notes from the GitHub dev announcement.
- Document managed vLLM recovery after host reboot and Slack denied-mention feedback.
- Refresh generated `nemoclaw-user-*` skills from the source docs.

## Source summary

- #4983 -> `docs/about/release-notes.mdx`: Added the v0.0.61 release summary from the dev announcement and linked behavior groups to deeper docs.
- #4904 -> `docs/inference/use-local-inference.mdx`: Documented that managed vLLM restarts the `nemoclaw-vllm` container after host reboot during recovery.
- #4933 -> `docs/manage-sandboxes/messaging-channels.mdx`: Documented Slack sender feedback for denied channel `@mention` events.
- #4879, #4915, #4935, #4759, #4164, #4888, #4897, #4944, #4959 -> `.agents/skills/`: Refreshed generated user skills from the current source docs for release prep.

## Verification

- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx`
- `npm run docs` (passed outside the tool sandbox after `tsx` IPC pipe creation was blocked in the sandbox)
- `npm run build:cli` (refreshed local `dist/` for the pre-push TypeScript hook)
- Commit and pre-push hooks passed, including docs-to-skills verification, markdownlint, gitleaks, skills YAML tests, and CLI TypeScript.

## Summary by CodeRabbit

* **Documentation**
  * Updated sandbox security documentation with file descriptor limits.
  * Changed default inference model for DGX Station profile.
  * Enhanced agent policy and backup/restore documentation.
  * Improved command reference examples with clearer formatting.
  * Clarified Slack messaging denial notice behavior.
  * Added automatic vLLM container recovery during host reboot.
  * Updated release notes for v0.0.61.

fix(onboard): restore Qwen3.6 27B FP8 as DGX Station vLLM default

a11e7f4

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>

zyang-dev added the v0.0.61 Release target label Jun 6, 2026

cv approved these changes Jun 6, 2026

cv merged commit af39e2a into main Jun 6, 2026
36 checks passed

cv deleted the fix/revert-dgx-station-vllm-to-qwen27b branch June 6, 2026 06:39

miyoungc mentioned this pull request Jun 8, 2026

docs: refresh v0.0.61 release docs #4992

Merged

wscurran added the bug-fix PR fixes a bug or regression label Jun 8, 2026

coderabbitai Bot mentioned this pull request Jun 9, 2026

feat(inference): add interactive managed-vLLM model picker #5038

Merged

12 tasks

3 participants
