fix(vllm): raise Qwen3.6-35B-A3B-NVFP4 max-model-len to 131072 on Spark by zyang-dev · Pull Request #4779 · NVIDIA/NemoClaw

zyang-dev · 2026-06-04T18:41:30Z

Summary

Updates the DGX Spark managed vLLM profile for the Qwen3.6 35B-A3B NVFP4 model to use a 128K context window.

Changes

Change the Spark NVFP4 vLLM --max-model-len from 65536 to 131072.
Update the registry test to assert --max-model-len 131072.
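
As a rough illustration of the change, the sketch below models a registry entry and the flag it produces. Only the model ID, the `maxModelLen` field, and the `--max-model-len` flag come from this PR; the `VllmModelProfile` interface and the `buildServeArgs` helper are hypothetical names, and the real `VLLM_MODELS` entry may be shaped differently.

```typescript
// Hypothetical shape of a registry entry; the real VLLM_MODELS type may differ.
interface VllmModelProfile {
  model: string; // model ID passed to `vllm serve`
  maxModelLen: number; // value for the --max-model-len flag
}

const K = 1024;

const sparkNvfp4: VllmModelProfile = {
  model: "nvidia/Qwen3.6-35B-A3B-NVFP4",
  maxModelLen: 128 * K, // 131072 tokens; the previous value was 64 * K = 65536
};

// Hypothetical helper: builds the argument list for `vllm serve`.
function buildServeArgs(profile: VllmModelProfile): string[] {
  return ["serve", profile.model, "--max-model-len", String(profile.maxModelLen)];
}

const args = buildServeArgs(sparkNvfp4);
console.log(args.join(" "));
// prints: serve nvidia/Qwen3.6-35B-A3B-NVFP4 --max-model-len 131072
```

Writing the value as `128 * K` makes the "128K context window" wording in the summary easy to check against the literal 131072 asserted in the test.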

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
npm run docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>

Summary by CodeRabbit

Bug Fixes
- Updated the maximum model length configuration for the Qwen3.6-35B-A3B-NVFP4 model from 65536 to 131072.


coderabbitai · 2026-06-04T18:41:43Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 955372cd-d39e-4fc1-96e3-a4b1b9f565f2

📥 Commits

Reviewing files that changed from the base of the PR and between af7105c and 8f7c9c4.

📒 Files selected for processing (2)

src/lib/inference/vllm-models.test.ts
src/lib/inference/vllm-models.ts

📝 Walkthrough


The vLLM model registry entry for nvidia/Qwen3.6-35B-A3B-NVFP4 doubles its maxModelLen from 65536 to 131072, which raises the value passed to the --max-model-len flag of the serve command. The test is updated to expect the new value.

Changes

Model Configuration and Test Update

| Layer / File(s) | Summary |
| --- | --- |
| Increase max model length and update test: `src/lib/inference/vllm-models.ts`, `src/lib/inference/vllm-models.test.ts` | The `VLLM_MODELS` registry entry for `nvidia/Qwen3.6-35B-A3B-NVFP4` updates `maxModelLen` from `65536` to `131072`, and the corresponding test assertion is updated to expect the new `--max-model-len` flag value. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Possibly related PRs

NVIDIA/NemoClaw#4619: Both PRs update the vLLM model registry/test for the same qwen3.6-35b-a3b-nvfp4 model (in src/lib/inference/vllm-models.ts and vllm-models.test.ts), with the main PR specifically adjusting its maxModelLen (i.e., --max-model-len) to a new value.
NVIDIA/NemoClaw#4689: The main PR increases the vLLM --max-model-len for qwen3.6-35b-a3b-nvfp4 (thereby changing what /v1/models.max_model_len reports), and the retrieved PR reads that runtime max_model_len and applies it to NEMOCLAW_CONTEXT_WINDOW, so they directly connect through the same max_model_len value.
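
The link between the serve flag and the context window described above can be sketched as follows. This is a minimal illustration, not the code from #4689: it assumes the `/v1/models` payload lists model cards with an `id` and a `max_model_len` field, and the `contextWindowFor` helper, the sample payload, and the `env` record are all hypothetical.

```typescript
// Assumed subset of the /v1/models response; only the fields used here.
interface ModelCard {
  id: string;
  max_model_len?: number;
}
interface ModelList {
  data: ModelCard[];
}

// Hypothetical helper: look up the reported context length for one model.
function contextWindowFor(list: ModelList, modelId: string): number | undefined {
  return list.data.find((m) => m.id === modelId)?.max_model_len;
}

// Hand-written payload standing in for a real server response.
const sample: ModelList = {
  data: [{ id: "nvidia/Qwen3.6-35B-A3B-NVFP4", max_model_len: 131072 }],
};

const windowTokens = contextWindowFor(sample, "nvidia/Qwen3.6-35B-A3B-NVFP4");

// Hypothetical stand-in for wherever NEMOCLAW_CONTEXT_WINDOW ends up being set.
const env: Record<string, string> = {};
if (windowTokens !== undefined) {
  env["NEMOCLAW_CONTEXT_WINDOW"] = String(windowTokens);
}
console.log(env["NEMOCLAW_CONTEXT_WINDOW"]); // prints: 131072
```

Under these assumptions, raising `--max-model-len` changes the reported `max_model_len`, which in turn changes the context window value derived from it.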

Suggested labels

Platform: DGX Spark, Provider: vLLM

Suggested reviewers

cv

Poem

🐰 A model grows twice as tall,
From 65K to twice that, all!
The test now cheers, 131072 bright,
Qwen's context expanded to new height. ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly describes the main change: updating the max-model-len value for the Qwen3.6-35B-A3B-NVFP4 model from 65536 to 131072, which aligns with modifications in both the vllm-models.ts and vllm-models.test.ts files. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


github-actions · 2026-06-04T18:44:01Z

PR Review Advisor

Findings: 0 needs attention, 0 worth checking, 0 nice ideas
Top item: No actionable findings

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

github-actions · 2026-06-04T18:44:23Z

E2E Advisor Recommendation

Required E2E: inference-routing-e2e
Optional E2E: gpu-e2e, onboard-inference-smoke-e2e

Dispatch hint: inference-routing-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: medium

Required E2E

inference-routing-e2e (medium): The PR changes inference-serving configuration. This is the closest existing merge-blocking E2E signal for assistant inference paths: it validates onboarding-created inference routes, inference.local access from the sandbox, OpenAI-compatible chat completion routing, and error classification.

Optional E2E

gpu-e2e (high): Useful adjacent confidence for local GPU inference onboarding and sandbox inference behavior, although it exercises Ollama rather than managed vLLM and will not verify the changed --max-model-len value.
onboard-inference-smoke-e2e (low): Adjacent regression coverage that setupInference fails unless the configured route serves a real chat completion. It does not exercise managed vLLM startup but is relevant to onboarding/inference success semantics.

New E2E recommendations

managed vLLM DGX Spark install (high): No existing E2E appears to start the managed vLLM container with NEMOCLAW_VLLM_MODEL=qwen3.6-35b-a3b-nvfp4 and assert the generated vllm serve command includes --max-model-len 131072, reaches /v1/models, and serves a sandbox chat completion through inference.local. The current change is only directly covered by unit tests.
- Suggested test: Add a DGX Spark/NVIDIA GPU managed-vLLM E2E job that runs non-interactive onboarding with NEMOCLAW_PROVIDER=install-vllm and NEMOCLAW_VLLM_MODEL=qwen3.6-35b-a3b-nvfp4, then verifies container readiness, command flags, and a real OpenAI-compatible chat completion from inside the sandbox.
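
The command-flag part of the suggested test could look roughly like the sketch below. It only covers checking the flag in an already captured command line; the `flagValue` helper and the sample command string are hypothetical, and how the E2E job would obtain the real command from the container is left open.

```typescript
// Hypothetical helper: return the value of a flag from an argv-style list,
// accepting both "--flag value" and "--flag=value" forms.
function flagValue(argv: string[], flag: string): string | undefined {
  const index = argv.indexOf(flag);
  if (index !== -1) return argv[index + 1];
  const inline = argv.find((arg) => arg.startsWith(`${flag}=`));
  return inline === undefined ? undefined : inline.slice(flag.length + 1);
}

// Sample command line; a real test would capture this from the managed container.
const command = "vllm serve nvidia/Qwen3.6-35B-A3B-NVFP4 --max-model-len 131072";
const maxModelLen = flagValue(command.split(/\s+/), "--max-model-len");

if (maxModelLen !== "131072") {
  throw new Error(`expected --max-model-len 131072, got ${maxModelLen}`);
}
console.log(maxModelLen); // prints: 131072
```

Parsing the flag rather than matching a substring avoids a false pass on a longer value such as 1310720.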

Dispatch hint

Workflow: .github/workflows/nightly-e2e.yaml
jobs input: inference-routing-e2e

github-actions · 2026-06-04T18:44:24Z

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

None. The PR changes the vLLM model registry and its unit tests outside test/e2e-scenario/. The current dispatchable scenario ROUTES do not include a managed-vLLM or DGX Spark scenario that would exercise NEMOCLAW_VLLM_MODEL or the updated max-model-len for the NVFP4 checkpoint; existing scenario inference coverage is cloud NVIDIA/OpenAI-compatible or local Ollama.

Optional scenario E2E

None.

Relevant changed files

None.

## Summary

- Add the v0.0.59 release notes from the GitHub announcement discussion.
- Refresh local inference and credential-storage guidance for the current release behavior.
- Regenerate the user skills from the updated Fern docs.
- Tighten release-prep and docs review guidance for generated skills, PR labels, and shared `$$nemoclaw` command placeholders.

## Verification

- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx`
- `rg "permissive mode|shields down|shields up|shields status|config rotate-token|rotate-token" --glob '*.{md,mdx}'`
- `git diff --check`
- `npm run docs` (rerun outside sandbox after sandbox-only `tsx` IPC permission failure)
- `npm run typecheck:cli`
- Pre-commit hooks during commit passed, including markdownlint, docs-to-skills verification, gitleaks, commitlint, and skills YAML tests.

## Source Summary

- #3679, #4437, #4681, #4766, #4772, #4775, #4786 -> `docs/about/release-notes.mdx`, `docs/reference/commands.mdx`, `docs/reference/troubleshooting.mdx`: Summarize OpenClaw 2026.5.27 compatibility, runtime path pinning, plugin registry recovery, live gateway reconciliation, and clearer host-alias/startup diagnostics.
- #4332, #4402, #4769, #4776, #4779 -> `docs/about/release-notes.mdx`, `docs/inference/inference-options.mdx`, `docs/inference/use-local-inference.mdx`, `docs/inference/switch-inference-providers.mdx`: Document the release inference changes covering Local NIM waits, Hermes Anthropic routing, Nemotron 3 Ultra, the current Ollama starter fallback, and Spark managed-vLLM context length.
- #4628, #4652, #4733, #4745 -> `docs/about/release-notes.mdx`, `docs/security/credential-storage.mdx`, `docs/manage-sandboxes/messaging-channels.mdx`, `docs/reference/troubleshooting.mdx`: Capture permission healing, gateway-stored credential reuse, cross-sandbox messaging credential conflict checks, and CDI preflight diagnostics.
- #4728, #4737, #4743, #4744, #4782 -> `.agents/skills/nemoclaw-user-*`: Regenerate the user skill references from the updated source docs.
- Follow-up maintenance -> `.agents/skills/nemoclaw-contributor-update-docs/SKILL.md`, `.coderabbit.yaml`: Add release-prep area labels for docs and skills PRs, and teach docs review guidance that `$$nemoclaw` is the correct shared command placeholder for examples that work across agent aliases.

Note: the `documentation` label was not present in the repository, so this PR is labeled with `v0.0.59` only.

## Summary by CodeRabbit

* **Documentation**
  * Updated default model for local Ollama inference setup to qwen3.5:9b
  * Added Nemotron 3 Ultra 550B as an NVIDIA Endpoints model option
  * Clarified credential storage and reuse behavior for post-deployment (day-two) operations
  * Added v0.0.59 release notes covering OpenClaw compatibility, inference options, Hermes messaging sync, and troubleshooting
  * Clarified CLI selection guidance and updated OpenClaw version example in status output
  * Revised release-prep instructions and docs review guidance for CLI alias usage

fix(vllm): raise Qwen3.6-35B-A3B-NVFP4 max-model-len to 131072 on Spark

8f7c9c4

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>

zyang-dev added the v0.0.59 (Release target) label on Jun 4, 2026

cv approved these changes Jun 4, 2026

View reviewed changes

cv merged commit 88f8d66 into main Jun 4, 2026
32 checks passed

cv deleted the fix/spark-vllm-context-window branch June 4, 2026 20:50

miyoungc mentioned this pull request Jun 5, 2026

docs: refresh 0.0.59 release notes #4790

Merged

wscurran added the following labels on Jun 5, 2026:
- area: inference (Inference routing, serving, model selection, or outputs)
- bug-fix (PR fixes a bug or regression)
- platform: dgx-spark (Affects DGX Spark hardware or workflows)
- provider: vllm (vLLM local or hosted provider behavior)
