feat(inference): switches the DGX Spark managed-vLLM profile to Qwen3.6-35B-A3B-NVFP4 by zyang-dev · Pull Request #4619 · NVIDIA/NemoClaw

zyang-dev · 2026-06-01T20:43:51Z

Summary

Switches the DGX Spark managed-vLLM profile to NVIDIA's Qwen3.6-35B-A3B-NVFP4 checkpoint on the upstream vllm/vllm-openai nightly image, with the FlashInfer/MoE env vars and serve flags that checkpoint needs.

Changes

Add nvidia/Qwen3.6-35B-A3B-NVFP4 (qwen3.6-35b-a3b-nvfp4) to the vLLM model registry with its modelopt/NVFP4/MoE/speculative + tool-call/reasoning flags and --load-format fastsafetensors.
Extend VllmModelDef with serveEnv; buildVllmServeCommand now prepends export K=V && … for those vars (FlashInfer MoE/FP8 backend, version-check skip, CUTE_DSL_ARCH=sm_121a).
Point the DGX Spark profile at UPSTREAM_VLLM_IMAGE (vllm/vllm-openai:nightly-1fc2cee50…, vllm 0.21.1rc1.dev323) + the new model; pin DGX Station and generic-Linux to NGC_VLLM_IMAGE (nvcr.io/nvidia/vllm:26.03.post1-py3) so their behavior is unchanged.
Update tests (vllm-models.test.ts, detect-vllm-profile.test.ts) and docs (use-local-inference.mdx, commands.mdx) for the new Spark default and recognised slug.
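
To illustrate the `serveEnv` change described above, here is a minimal sketch of how per-model environment exports might be prepended to a serve command. The `ModelDefSketch` interface, its `hfId` and `serveArgs` field names, and `buildServeCommandSketch` are hypothetical simplifications, not the actual `VllmModelDef` or `buildVllmServeCommand`. The real command also includes a pip install step that is omitted here.

```typescript
// Hypothetical, simplified model definition; the real VllmModelDef has more fields.
interface ModelDefSketch {
  hfId: string;
  serveArgs: string[];
  serveEnv?: Record<string, string>;
}

// Renders `export K=V` fragments first, then the serve invocation, joined by `&&`.
function buildServeCommandSketch(model: ModelDefSketch): string {
  const exports = Object.entries(model.serveEnv ?? {}).map(
    ([key, value]) => `export ${key}=${value}`,
  );
  const serve = ["vllm", "serve", model.hfId, ...model.serveArgs].join(" ");
  return [...exports, serve].join(" && ");
}

// Only values quoted in this PR description are used; the full flag set is not shown.
const example: ModelDefSketch = {
  hfId: "nvidia/Qwen3.6-35B-A3B-NVFP4",
  serveArgs: ["--load-format", "fastsafetensors"],
  serveEnv: { CUTE_DSL_ARCH: "sm_121a" },
};

console.log(buildServeCommandSketch(example));
```

A model without `serveEnv` yields the bare serve command, so existing registry entries would be unaffected under this sketch.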

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
npm run docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>

Summary by CodeRabbit

New Features
- Added support for Qwen3.6 35B-A3B NVFP4 as a managed vLLM model and made it the default for the Spark profile.
Documentation
- Updated model defaults, recognized model slugs, and environment-variable mapping guidance.
Tests
- Added/updated tests to cover the new model entry, default selections, and serving behavior.

coderabbitai · 2026-06-01T20:44:02Z

📝 Walkthrough

This PR adds the Qwen3.6 35B-A3B NVFP4 model to the vLLM model registry, introduces per-model serve environment exports, updates platform-specific vLLM container images and defaults, adjusts Docker download/start invocations, and updates tests and docs to reflect the new Spark default.

Changes

Qwen3.6 35B-A3B NVFP4 Model Integration

| Layer / File(s) | Summary |
|---|---|
| Model definition contract and registry entry `src/lib/inference/vllm-models.ts` | `VllmModelDef` interface adds optional `serveEnv` field for per-model environment exports; `VLLM_MODELS` registry gains Qwen3.6-35B-A3B-NVFP4 entry with model-specific serving flags and serveEnv configuration for FlashInfer/MoE backends; platform default comment updated to reference new Spark model. |
| Serve command generation with environment exports `src/lib/inference/vllm-models.ts` | `buildVllmServeCommand` now prepends `export key=value && ...` prefix derived from `model.serveEnv` before the pip install and vllm serve invocation, enabling per-model environment setup at serve time. |
| Platform-specific images and model defaults `src/lib/inference/vllm.ts` | Explicit vLLM image constants (upstream nightly and NGC) introduced; Spark profile configured with upstream image and new Qwen3.6 35B NVFP4 resolver; Station and generic Linux profiles switched to NGC image; platform-mapping comment updated. |
| Docker container invocation adjustments `src/lib/inference/vllm.ts` | `downloadModel()` now uses explicit `--entrypoint hf` for one-shot model download; `startContainer()` changed to invoke serve command via `/bin/bash -lc` with explicit entrypoint override instead of prior `bash -c` form. |
| Model registry and command generation tests `src/lib/inference/vllm-models.test.ts` | Test verifies `qwen3.6-35b-a3b-nvfp4` registry entry exists and is not gated; separate test asserts `buildVllmServeCommand` includes required serveEnv exports, fastsafetensors dependency, NVFP4-specific flags, and shared GPU defaults without utilization override. |
| Profile detection integration tests `test/detect-vllm-profile.test.ts` | Spark profile test assertions updated for new `defaultModel.id` and vLLM image tag; Station and generic Linux tests add assertions for NGC image usage. |
| Documentation updates for new model and slug `docs/inference/use-local-inference.mdx`, `docs/reference/commands.mdx` | Managed vLLM profile table updated to reflect Spark's new default model; recognised-slugs table and reference docs expanded to include `qwen3.6-35b-a3b-nvfp4` slug mapping; per-platform model defaults clarified. |
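
The Docker invocation adjustments in the table can be sketched as argv arrays. This is an illustration under assumptions, not the code in `src/lib/inference/vllm.ts`: the function names are hypothetical, `--rm` is assumed for the one-shot download, and the Hugging Face cache mount is omitted because its path is not given in this thread.

```typescript
// One-shot download: override the image entrypoint with the `hf` CLI and
// pass `download <repo>` as its arguments.
function downloadArgvSketch(image: string, modelId: string): string[] {
  return ["docker", "run", "--rm", "--entrypoint", "hf", image, "download", modelId];
}

// Serve: override the entrypoint with bash and pass the composed command to `-lc`.
// `--gpus all` and `--ipc=host` are the flags the review notes say the profile retains.
function startArgvSketch(image: string, serveCommand: string): string[] {
  return [
    "docker", "run", "--gpus", "all", "--ipc=host",
    "--entrypoint", "/bin/bash",
    image,
    "-lc", serveCommand,
  ];
}

const image = "nvcr.io/nvidia/vllm:26.03.post1-py3";
console.log(downloadArgvSketch(image, "nvidia/Qwen3.6-35B-A3B-NVFP4").join(" "));
console.log(startArgvSketch(image, "vllm serve nvidia/Qwen3.6-35B-A3B-NVFP4"));
```

Keeping the pieces as an array means the serve command stays a single argument to `-lc`, so it does not need an extra layer of quoting when handed to a process spawner that accepts argv directly.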

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers:

cv

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: switching DGX Spark's managed-vLLM profile to use Qwen3.6-35B-A3B-NVFP4, which is the primary objective of this PR. |

github-actions · 2026-06-01T21:09:24Z

🌿 Preview your docs: https://nvidia-preview-pr-4619.docs.buildwithfern.com/nemoclaw

github-actions · 2026-06-01T21:11:28Z

E2E Advisor Recommendation

Required E2E: onboard-inference-smoke-e2e
Optional E2E: gpu-e2e, inference-routing-e2e, docs-validation-e2e

Dispatch hint: onboard-inference-smoke-e2e

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: medium

Required E2E

onboard-inference-smoke-e2e (low): This PR changes local inference onboarding behavior. This regression E2E verifies onboarding does not report success until the configured inference route can serve a real chat completion and surfaces actionable diagnostics on failure. It is the closest existing merge-blocking inference-onboard guard, though it does not exercise the managed vLLM container itself.

Optional E2E

gpu-e2e (high): Useful live confidence for local GPU inference onboarding, sandbox reachability, and in-sandbox inference on a GPU runner. It uses Ollama rather than vLLM, so it is adjacent coverage only for this PR's managed-vLLM changes.
inference-routing-e2e (medium): Validates gateway inference routing, credential isolation, and error classification around configured providers. It is useful because managed vLLM ultimately configures an inference route, but it does not validate the vLLM install/start command.
docs-validation-e2e (low): Checks documentation and CLI/reference drift after updates to local inference and command environment-variable docs.

New E2E recommendations

managed-vLLM DGX Spark install (high): No existing E2E appears to run NEMOCLAW_PROVIDER=install-vllm on a Spark-class host and verify the new nvidia/Qwen3.6-35B-A3B-NVFP4 default, upstream nightly image, hf download entrypoint, serve environment exports, vLLM readiness, /v1/models detection, and sandbox chat/completions through the configured local route.
- Suggested test: Add a DGX Spark managed-vLLM E2E job that runs non-interactive onboarding with NEMOCLAW_PROVIDER=install-vllm, asserts the selected default model/image, waits for vLLM readiness, and sends a real OpenClaw chat completion through the sandbox.
managed-vLLM profile matrix (medium): This PR intentionally diverges Spark from Station and generic Linux defaults. Existing unit tests cover profile selection, but there is no E2E that validates the profile-specific Docker image/model defaults on real or emulated hosts.
- Suggested test: Add an E2E or scenario job that exercises managed-vLLM profile selection for Spark, Station, and generic NVIDIA Linux, at least in dry-run/plan mode with assertions for image, default model, Docker flags, and command line.
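
The profile-matrix recommendation could start from a table-driven check along these lines. This is a sketch under assumptions: the platform keys, `checkImageMatrix`, and the stand-in planner are hypothetical, and only the image references quoted in this thread are asserted. Default-model assertions are left out because the Station and generic-Linux defaults are not stated here.

```typescript
// Platform keys are hypothetical labels; image strings are the ones quoted in this PR.
type Platform = "spark" | "station" | "generic-linux";

const EXPECTED_IMAGE: Record<Platform, string> = {
  spark: "vllm/vllm-openai:nightly-1fc2cee50a09a094b9f2bbdfcb0ab0cadb536712",
  station: "nvcr.io/nvidia/vllm:26.03.post1-py3",
  "generic-linux": "nvcr.io/nvidia/vllm:26.03.post1-py3",
};

// Returns one message per platform whose planned image differs from the expectation.
function checkImageMatrix(planImage: (platform: Platform) => string): string[] {
  const platforms = Object.keys(EXPECTED_IMAGE) as Platform[];
  return platforms
    .filter((p) => planImage(p) !== EXPECTED_IMAGE[p])
    .map((p) => `${p}: expected ${EXPECTED_IMAGE[p]}, got ${planImage(p)}`);
}

// Stand-in for a dry-run planner; a real job would query the installer in plan mode.
const fakePlanner = (p: Platform): string =>
  p === "spark" ? EXPECTED_IMAGE.spark : EXPECTED_IMAGE.station;

console.log(checkImageMatrix(fakePlanner));
```

A planner that returned the NGC image for every platform would produce a single mismatch for `spark`, which is the divergence this PR introduces on purpose.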

Dispatch hint

Workflow: regression-e2e.yaml
jobs input: onboard-inference-smoke-e2e

github-actions · 2026-06-01T21:11:29Z

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

None. The behavioral changes are in the managed vLLM installer/model registry, but the current dispatchable scenario ROUTES table has no vLLM/install-vLLM scenario. Existing scenario routes cover cloud, OpenAI-compatible, Ollama GPU, platform, messaging, and lifecycle coverage, so no scenario E2E job would directly exercise this changed surface.

Optional scenario E2E

None.

Relevant changed files

src/lib/inference/vllm-models.ts
src/lib/inference/vllm.ts

github-actions · 2026-06-01T21:12:29Z

PR Review Advisor

Findings: 0 needs attention, 3 worth checking, 0 nice ideas
Top item: Digest-pin or otherwise verify the new Spark vLLM image

Review findings

🛠️ Needs attention

None.

🔎 Worth checking

Digest-pin or verify the new Spark vLLM nightly image (src/lib/inference/vllm.ts:57): The DGX Spark managed-vLLM path now pulls Docker Hub image `vllm/vllm-openai:nightly-1fc2cee50a09a094b9f2bbdfcb0ab0cadb536712`. Docker tags are mutable, and this image is then run with GPU access, `--ipc=host`, a mounted Hugging Face cache, and vLLM `--trust-remote-code`. That makes the image reference an installer trust boundary for a high-risk inference/network path.
- Recommendation: Use an immutable digest reference for the Spark image, or add an explicit image verification/update process and document why a mutable nightly tag is acceptable for this managed installer path.
- Evidence: `UPSTREAM_VLLM_IMAGE` is a tag string at `src/lib/inference/vllm.ts:57`; `SPARK_PROFILE.image` uses it at line 119. The profile retains `--gpus all`, `--ipc=host`, and the HF cache mount, while shared vLLM args include `--trust-remote-code` in `src/lib/inference/vllm-models.ts`.
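
One way to enforce the digest recommendation is a small guard that rejects tag-only references. The sketch below is illustrative: `isDigestPinned` is a hypothetical helper, and the digest in the example is an all-zero placeholder, not a real image digest.

```typescript
// A reference counts as pinned only when it ends in @sha256:<64 lowercase hex chars>.
const DIGEST_SUFFIX = /@sha256:[0-9a-f]{64}$/;

function isDigestPinned(imageRef: string): boolean {
  return DIGEST_SUFFIX.test(imageRef);
}

// Tag reference quoted in the finding above: mutable, so not pinned.
const tagged = "vllm/vllm-openai:nightly-1fc2cee50a09a094b9f2bbdfcb0ab0cadb536712";

// Placeholder digest for illustration only; it does not identify any real image.
const placeholderDigest = "0".repeat(64);
const pinned = `vllm/vllm-openai@sha256:${placeholderDigest}`;

console.log(isDigestPinned(tagged)); // false
console.log(isDigestPinned(pinned)); // true
```

A unit test over the image constants could call such a guard so that a future tag-only reference fails fast.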
Quote or validate registry values before composing the shell serve command (src/lib/inference/vllm-models.ts:253): `buildVllmServeCommand()` now prepends `serveEnv` as raw `export ${key}=${value}` fragments and then joins model ids and args into a command executed through `bash -lc`. The current registry values are code-controlled literals, so this is not a user-controlled injection path today, but it creates a fragile local trust boundary where a future model entry with shell metacharacters could alter the command inside a GPU-enabled container.
- Recommendation: Validate `serveEnv` keys against a strict environment-variable-name pattern and shell-quote values, or build the container command from argv-safe components. Add a regression test with metacharacters to prove registry additions cannot accidentally change shell structure.
- Evidence: `serveEnv` is rendered with ``.map(([key, value]) => `export ${key}=${value}`)`` at line 253, and the resulting command is passed to Docker via `--entrypoint /bin/bash ... -lc ${JSON.stringify(buildVllmServeCommand(model))}` in `src/lib/inference/vllm.ts:353`.
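
The recommendation above could be approached roughly as follows. This is a minimal sketch, not the project's implementation: `renderExportSketch` and `shellQuote` are hypothetical helpers, and the quoting shown is the common POSIX single-quote technique.

```typescript
// Accept only conventional environment-variable names.
const ENV_NAME = /^[A-Za-z_][A-Za-z0-9_]*$/;

// POSIX single-quoting: wrap in '...' and replace each embedded ' with '\''.
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

// Renders one export fragment, rejecting keys that could alter shell structure.
function renderExportSketch(key: string, value: string): string {
  if (!ENV_NAME.test(key)) {
    throw new Error(`invalid environment variable name: ${JSON.stringify(key)}`);
  }
  return `export ${key}=${shellQuote(value)}`;
}

console.log(renderExportSketch("CUTE_DSL_ARCH", "sm_121a"));
console.log(renderExportSketch("EXAMPLE", "a; echo injected"));
```

With quoting in place, a value containing `;` or `&&` stays inside the quoted string instead of becoming a separate shell command.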
Add runtime validation for the changed vLLM image and entrypoint behavior (src/lib/inference/vllm.ts:260): The tests assert profile selection and generated command strings, but the changed behavior depends on external image/runtime properties: the new upstream image must contain `hf` and `/bin/bash`, `--entrypoint hf ... download` must work for both the upstream and NGC images, and the Spark NVFP4 flags/env must boot to the ready marker on DGX Spark. This is an infrastructure path where string-level unit tests are not enough to catch image or entrypoint drift.
- Recommendation: Add or reference targeted runtime validation for managed Spark vLLM startup, including the pre-download command and the `--entrypoint /bin/bash -lc` serve command. At minimum, add a unit-level test for the complete Docker argv/string composition and keep a DGX Spark smoke validation in the release checklist.
- Evidence: New tests in `vllm-models.test.ts` and `detect-vllm-profile.test.ts` cover registry/profile strings. Runtime-sensitive changes are at `src/lib/inference/vllm.ts:260` (`--entrypoint hf`) and `src/lib/inference/vllm.ts:353` (`--entrypoint /bin/bash ... -lc`).

🌱 Nice ideas

None.

This is an automated advisory review. A human maintainer must make the final merge decision.

## Summary

- Add the missing `v0.0.57` release-notes section with links to the detailed docs pages for command, inference, onboarding, messaging, status, installer, and policy changes.
- Remove public references to docs-skip terms from source docs and regenerate the NemoClaw user skills from the current Fern MDX docs.
- Carry forward generated references for the per-agent documentation split, including Hermes-specific reference files.

## Source summary

- #4615 and #4653 -> `docs/about/release-notes.mdx`, `docs/reference/commands.mdx`: Release notes now cover host-side `sessions` and `agents` commands plus `NEMOCLAW_EXTRA_AGENTS_JSON` secondary-agent baking.
- #4163, #4204, #4611, #4619, and #4676 -> `docs/about/release-notes.mdx`, `docs/inference/use-local-inference.mdx`: Release notes now cover managed vLLM progress/readiness, DGX Spark model default changes, local Ollama streaming usage, and inference route divergence warnings.
- #4267, #4601, #4609, #4642, #4645, and #4661 -> `docs/about/release-notes.mdx`, `docs/reference/commands.mdx`: Release notes now cover UFW auto-remediation, local-inference reachability gates, gateway reuse/binding, cancel rollback, and policy selection persistence.
- #4577, #4582, #4607, and #4660 -> `docs/about/release-notes.mdx`, `docs/manage-sandboxes/messaging-channels.mdx`: Release notes now cover Slack validation, atomic `channels add`, WhatsApp QR diagnostics, and Slack placeholder normalization.
- #4388, #4600, #4646, and #4647 -> `docs/about/release-notes.mdx`, `docs/reference/commands.mdx`: Release notes now cover status failure layers, paused-container hints, Docker-driver doctor behavior, and non-destructive stale-registry recovery.
- #4569, #4579, and #4678 -> `docs/about/release-notes.mdx`, `docs/manage-sandboxes/lifecycle.mdx`, `docs/network-policy/integration-policy-examples.mdx`: Release notes now cover installer tag pinning, PyPI `uv` policy access, and observable Jira validation.
- #4632 -> `.agents/skills/`: Regenerated user skills from the current per-agent docs source, including newly generated Hermes reference files.

## Verification

- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx`
- `rg "permissive mode|shields down|shields up|shields status|config rotate-token|rotate-token" docs --glob "*.mdx"`
- `rg "permissive mode|shields down|shields up|shields status|config rotate-token|rotate-token" .agents/skills --glob "*.md"`
- `npm run docs`
- `npm run build:cli`
- Commit hooks: markdownlint, docs-to-skills verification, gitleaks, skills YAML, commitlint

## Summary by CodeRabbit

* **Documentation**
  * Restructured documentation to clearly distinguish OpenClaw and Hermes agent variants throughout user guides.
  * Enhanced security, credential storage, and deployment guidance with clearer setup flows.
  * Added Hermes plugin installation and ecosystem documentation.
  * Improved workspace, messaging, and policy management references with variant-specific command examples.
  * Refined troubleshooting and CLI reference sections for clarity.

feat(inference): add DGX Spark Qwen3.6-35B-A3B-NVFP4 vLLM support

cbd78b9

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>

zyang-dev self-assigned this Jun 1, 2026

Merge remote-tracking branch 'origin/main' into feat/spark-vllm-qwen3-6-35b-nvfp4

5a4606f

zyang-dev added the v0.0.57 (Release target), Platform: DGX Spark, and provider: vllm (vLLM local or hosted provider behavior) labels Jun 1, 2026

zyang-dev changed the title ~~feat(inference): Switches the DGX Spark managed-vLLM profile to Qwen3.6-35B-A3B-NVFP4~~ feat(inference): switches the DGX Spark managed-vLLM profile to Qwen3.6-35B-A3B-NVFP4 Jun 1, 2026

cv approved these changes Jun 1, 2026

cv merged commit 9b40234 into main Jun 1, 2026
36 of 39 checks passed

cv deleted the feat/spark-vllm-qwen3-6-35b-nvfp4 branch June 1, 2026 22:16

wangericnv mentioned this pull request Jun 2, 2026

[DGX Spark][Inference] PR #4619 vLLM/NVFP4 path crashes on container start with "CUDA unknown error" #4658

Closed

coderabbitai Bot mentioned this pull request Jun 2, 2026

fix(inference): improve managed vLLM download and launch progress #4676

Merged

12 tasks

wscurran added the platform: dgx-spark (Affects DGX Spark hardware or workflows) label Jun 3, 2026

cv mentioned this pull request Jun 3, 2026

Add interactive managed-vLLM model picker and curated Spark options #4705

Closed

2 tasks

wscurran removed the Platform: DGX Spark label Jun 3, 2026

miyoungc mentioned this pull request Jun 3, 2026

docs: refresh 0.0.57 release docs #4716

Merged

This was referenced Jun 4, 2026

fix(vllm): raise Qwen3.6-35B-A3B-NVFP4 max-model-len to 131072 on Spark #4779

Merged

fix(vllm): switch DGX Spark from upstream nightly build to NGC vllm 26.05.post1 #4810

Merged

feat(inference): update DGX Station vLLM to DeepSeek V4 Flash #4867

Merged

wscurran added the feature (PR adds or expands user-visible functionality) label Jun 8, 2026

coderabbitai Bot mentioned this pull request Jun 9, 2026

feat(inference): add interactive managed-vLLM model picker #5038

Merged

12 tasks
