feat: add Qwen3.5 & GLM-4.7-Flash model support #2151
Merged
Pull request overview
Adds Qwen3.5 model integration across NeMo-RL’s vLLM generation path and test/recipe infrastructure, plus a few quality-of-life fixes for multimodal-capable models running text-only workloads.
Changes:
- Extend vLLM worker conditional-generation handling (Qwen3.5 + config-driven prefix caching) and clear multimodal caches on sleep (sync/async).
- Freeze visual encoder params for text-only training to avoid optimizer/checkpoint resume issues; propagate chat templates from tokenizer → processor when missing.
- Add Qwen3.5 GRPO/VLM recipes and test-suite scripts; update nightly/release suite lists and raise the nightly GPU-hour budget threshold.
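The tokenizer → processor chat-template propagation listed above can be sketched roughly as follows. This is a minimal illustration, not the code from `nemo_rl/algorithms/utils.py`: the helper name `propagate_chat_template` is hypothetical, and `SimpleNamespace` objects stand in for a real tokenizer/processor pair. The only assumption about the real objects is that both expose a `chat_template` attribute.

```python
from types import SimpleNamespace


def propagate_chat_template(tokenizer, processor):
    """Copy tokenizer.chat_template onto processor when the processor has none.

    Hypothetical helper for illustration. Returns True if a copy happened.
    """
    if getattr(processor, "chat_template", None):
        return False  # processor already has its own template; leave it alone
    template = getattr(tokenizer, "chat_template", None)
    if not template:
        return False  # tokenizer has nothing to copy
    processor.chat_template = template
    return True


# Stand-ins for a tokenizer/processor pair (placeholder template string).
tokenizer = SimpleNamespace(chat_template="{{ messages }}")
processor = SimpleNamespace(chat_template=None)

copied = propagate_chat_template(tokenizer, processor)
```

Checking the processor first keeps an existing processor template untouched, which matches the "when missing" condition in the change description.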
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `tests/unit/test_recipes_and_test_suites.py` | Raises nightly GPU-hour budget assertion threshold. |
| `tests/test_suites/vlm/vlm_grpo-qwen3.5-35ba3b-geo3k-2n8g-megatron-ep16.sh` | Adds a Qwen3.5 VLM GRPO test script (megatron). |
| `tests/test_suites/vlm/vlm_grpo-qwen3.5-35ba3b-geo3k-2n8g-automodel-ep16.sh` | Adds a Qwen3.5 VLM GRPO test script (automodel). |
| `tests/test_suites/release.txt` | Registers the Qwen3.5 DAPO GRPO run in the release suite. |
| `tests/test_suites/nightly.txt` | Registers Qwen3.5 text-only runs; comments out Qwen3.5 VLM runs due to known vLLM crash. |
| `tests/test_suites/llm/grpo-qwen3.5-35ba3b-dapo-4n8g-automodel.sh` | Adds a release-grade Qwen3.5 DAPO GRPO test script. |
| `tests/test_suites/llm/grpo-qwen3.5-35ba3b-2n8g-megatron-ep16.sh` | Adds a nightly Qwen3.5 GRPO test script (megatron). |
| `tests/test_suites/llm/grpo-qwen3.5-35ba3b-2n8g-automodel-ep16.sh` | Adds a nightly Qwen3.5 GRPO test script (automodel). |
| `tests/test_suites/disabled.txt` | Disables Qwen3.5 VLM scripts (and one moonlight script) to keep suites consistent. |
| `nemo_rl/models/generation/vllm/vllm_worker.py` | Adds Qwen3.5 architectures to conditional-generation special-casing; makes prefix caching configurable; clears multimodal sender cache during sleep. |
| `nemo_rl/models/generation/vllm/vllm_worker_async.py` | Clears multimodal sender cache during async sleep. |
| `nemo_rl/models/automodel/setup.py` | Freezes visual encoder params for text-only training to avoid optimizer resume issues. |
| `nemo_rl/algorithms/utils.py` | Copies `chat_template` from tokenizer to processor when processor lacks one. |
| `examples/configs/recipes/vlm/vlm_grpo-qwen3.5-35ba3b-geo3k-2n8g-megatron-ep16.yaml` | Adds megatron recipe for Qwen3.5 VLM GRPO. |
| `examples/configs/recipes/vlm/vlm_grpo-qwen3.5-35ba3b-geo3k-2n8g-automodel-ep16.yaml` | Adds automodel recipe for Qwen3.5 VLM GRPO. |
| `examples/configs/recipes/llm/grpo-qwen3.5-35ba3b-dapo-4n8g-automodel.yaml` | Adds automodel recipe for Qwen3.5 DAPO GRPO. |
| `examples/configs/recipes/llm/grpo-qwen3.5-35ba3b-2n8g-megatron-ep16.yaml` | Adds megatron recipe for Qwen3.5 GRPO nightly run. |
| `examples/configs/recipes/llm/grpo-qwen3.5-35ba3b-2n8g-automodel-ep16.yaml` | Adds automodel recipe for Qwen3.5 GRPO nightly run. |
| `docs/about/model-support.md` | Documents Qwen3.5 (and GLM) model support entries. |
Comments suppressed due to low confidence (1)
nemo_rl/models/automodel/setup.py:701
- Freezing the visual encoder via `requires_grad_(False)` may not be sufficient to prevent optimizer checkpoint mismatches because the optimizer is still constructed with `model.parameters()` (includes frozen params). Consider filtering to trainable parameters when initializing the optimizer (or otherwise excluding `visual_module` params from param groups) so optimizers that allocate state eagerly don’t track the frozen visual parameters.
```python
# Freeze visual encoder when not doing VLM training.
# Without this, the optimizer creates state entries for visual params that never
# receive gradients, causing a key mismatch when resuming from checkpoint.
# Note: visual encoder is nested under model.model (e.g. model.model.visual for
# Qwen3_5MoeForConditionalGeneration), not directly on model.
visual_module = getattr(getattr(model, "model", None), "visual", None) or getattr(
    model, "visual", None
)
if not is_vlm and visual_module is not None:
    for param in visual_module.parameters():
        param.requires_grad_(False)
    if rank == 0:
        print("Froze visual encoder parameters for text-only training")

# CPU offload if needed
if cpu_offload:
    # Move buffers to CPU for FSDP modules
    for v in model.buffers():
        v.data = v.data.to("cpu")
    model = model.to("cpu")

# Initialize optimizer
optimizer = None
if init_optimizer:
    optimizer_cls = get_class(config["optimizer"]["name"])
    optimizer = optimizer_cls(model.parameters(), **config["optimizer"]["kwargs"])
```
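The review comment's suggestion (pass only trainable parameters to the optimizer) might look something like the sketch below. It is an illustration under stated assumptions, not the change made in this PR: `Param` and `Module` are tiny stand-ins for `torch.nn.Parameter` and `torch.nn.Module` so the example runs without PyTorch, and `trainable_parameters` is a hypothetical helper name.

```python
class Param:
    """Stand-in for torch.nn.Parameter: tracks only a name and requires_grad."""

    def __init__(self, name, requires_grad=True):
        self.name = name
        self.requires_grad = requires_grad

    def requires_grad_(self, flag):
        self.requires_grad = flag
        return self


class Module:
    """Stand-in for torch.nn.Module exposing a recursive parameters() iterator."""

    def __init__(self, params, children=()):
        self._params = list(params)
        self._children = list(children)

    def parameters(self):
        yield from self._params
        for child in self._children:
            yield from child.parameters()


def trainable_parameters(model):
    """Return only the parameters that will receive gradients (hypothetical helper)."""
    return [p for p in model.parameters() if p.requires_grad]


# A text-only run: language-model params stay trainable, visual params are frozen.
visual = Module([Param("visual.patch_embed")])
model = Module([Param("lm.embed"), Param("lm.head")], children=[visual])
for p in visual.parameters():
    p.requires_grad_(False)

optimizer_params = trainable_parameters(model)
names = [p.name for p in optimizer_params]
```

With a real PyTorch model the same filter would be passed in place of `model.parameters()`, for example `optimizer_cls(trainable_parameters(model), **kwargs)`. Whether this alone resolves the resume mismatch would depend on how the checkpoint keys optimizer state, which this sketch does not model.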
Contributor: /ok to test f0a9372
Contributor (Author): /ok to test 4668dec
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
Signed-off-by: alexchiu <qiuzhaopeng@foxmail.com>
terrykong previously approved these changes Apr 7, 2026
Collaborator: /ok to test 51f5f11
Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
terrykong previously approved these changes Apr 7, 2026
Collaborator: /ok to test 557b9ca
Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
Collaborator: /ok to test bfaf018
terrykong previously approved these changes Apr 7, 2026
Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
Collaborator: /ok to test 073467f
terrykong approved these changes Apr 7, 2026
ZhiyuLi-Nvidia added a commit that referenced this pull request May 19, 2026

…hat pin ratio=2

PR #2325 (TypedDict→BaseModel migration) changed the default in examples/configs/grpo_math_1B.yaml from ``truncated_importance_sampling_type: tis`` to ``null``. Four recipes from PR #2151 (Qwen3.5 + GLM-4.7-Flash) inherit that parent, override ``truncated_importance_sampling_ratio: 2`` (committing to truncated IS), but do not set ``truncated_importance_sampling_type``. With the new ``null`` default they hit

ValueError: Invalid truncated importance sampling type: None

at the first loss call (loss_functions.py:530: the ``ratio is not None`` branch enters but no type matches "tis"/"icepop"/"seq-mask-tis").

Set the type explicitly on each affected recipe rather than restoring the base default: base ``grpo_math_1B.yaml`` does not enable truncated IS (``ratio: null``), so its ``type`` value is moot and should stay decoupled.

Surfaced by short nightly sweep job 11899306 (qwen3.5-35ba3b-dapo) on commit 9cbecb8.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
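The failure mode described in this commit message can be reproduced with a small sketch. This is not the code at loss_functions.py:530. The function `check_truncated_is` and the three config dictionaries are hypothetical; only the config key names, the accepted type strings, and the error text come from the message above. Using `"tis"` as the explicit type in the fixed recipe is an assumption, since the message does not say which value each recipe received.

```python
VALID_TIS_TYPES = ("tis", "icepop", "seq-mask-tis")


def check_truncated_is(ratio, tis_type):
    """Return True if truncated IS is enabled; raise on an inconsistent config."""
    if ratio is None:
        return False  # truncated IS disabled, so the type is never consulted
    if tis_type not in VALID_TIS_TYPES:
        raise ValueError(f"Invalid truncated importance sampling type: {tis_type}")
    return True


def run(cfg):
    return check_truncated_is(
        cfg["truncated_importance_sampling_ratio"],
        cfg["truncated_importance_sampling_type"],
    )


# Base config after the migration: ratio and type are both null.
base = {
    "truncated_importance_sampling_ratio": None,
    "truncated_importance_sampling_type": None,
}
# A recipe that overrides only the ratio inherits type=None and fails.
recipe = {**base, "truncated_importance_sampling_ratio": 2}
# Setting the type explicitly in the recipe (assumed value) avoids the error.
fixed = {**recipe, "truncated_importance_sampling_type": "tis"}
```

Here `run(base)` returns `False` without looking at the type, `run(recipe)` raises the `ValueError` quoted in the commit message, and `run(fixed)` returns `True`. This is why the fix sets the type per recipe and leaves the base default alone.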
ZhiyuLi-Nvidia added three more commits with the same message that referenced this pull request (May 19, May 22, and May 23, 2026).
Summary
Offline Experiments
https://wandb.ai/ys_fishcool-nvidia/nemorl-qwen35?nw=nwuserys_fishcool
Known Issues
pip install flash-linear-attention.