feat: add Qwen3.5 & GLM-4.7-Flash model support by zpqiu · Pull Request #2151 · NVIDIA-NeMo/RL

zpqiu · 2026-03-25T02:16:37Z

Summary

Add Qwen3.5 architecture support (Qwen3_5ForConditionalGeneration, Qwen3_5MoeForConditionalGeneration) to vLLM conditional generation handling
Freeze visual encoder for text-only training of VLM-capable models to prevent optimizer state key mismatch on checkpoint resume
Copy chat_template from tokenizer to processor for models whose processor lacks its own template (e.g. Qwen3.5)
Make enable_prefix_caching configurable via vllm_cfg
Clear multimodal processor cache on sleep to prevent sender/receiver cache desync (sync and async paths)
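
The chat-template fallback in the third item can be sketched roughly as below. The helper name `copy_chat_template_if_missing` and the stand-in objects are hypothetical; the only thing assumed about the real objects is that both the tokenizer and the processor expose a `chat_template` attribute. The actual change lives in `nemo_rl/algorithms/utils.py` and may differ in detail.

```python
from types import SimpleNamespace


def copy_chat_template_if_missing(tokenizer, processor):
    """Give the processor the tokenizer's chat template if it has none.

    Hypothetical helper for illustration, not the code from this PR.
    """
    if getattr(processor, "chat_template", None):
        return processor  # the processor already has its own template
    template = getattr(tokenizer, "chat_template", None)
    if template is not None:
        processor.chat_template = template
    return processor


# Stand-ins for a tokenizer/processor pair where only the tokenizer
# carries a template (the situation described for Qwen3.5).
tokenizer = SimpleNamespace(chat_template="{{ messages }}")
processor = SimpleNamespace(chat_template=None)
copy_chat_template_if_missing(tokenizer, processor)
```

A processor that already defines a template is left untouched, so models that ship their own processor template keep it.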

Offline Experiments

https://wandb.ai/ys_fishcool-nvidia/nemorl-qwen35?nw=nwuserys_fishcool

Model	Backend	GRPO Validated?	Comments
Qwen3.5-2B/4B	Automodel	Yes
Qwen3.5-35B-A3B	Automodel	Yes
Qwen3.5-35B-A3B (VLM task)	Automodel	Yes
Qwen3.5-397B-A17B	Automodel	No	Need Automodel PP
Qwen3.5-2B/4B	MCore	Yes with issue	Will work after the next Megatron-Bridge version bump. Related issue: NVIDIA-NeMo/Megatron-Bridge#3112
Qwen3.5-9B	MCore	Yes
Qwen3.5-35B-A3B	MCore	Yes
Qwen3.5-35B-A3B (VLM task)	MCore	Yes
Qwen3.5-397B-A17B	MCore	Yes	Garbled output is generated only when vLLM has EP enabled with TP > 1 and DP > 1; see issue vllm-project/vllm#37856
GLM-4.7-Flash	Automodel	Yes

Known Issues

By default, NeMo Automodel does not install the FLA dependency. To use CP on the DTensor V2 path, install it separately with `pip install flash-linear-attention`.
In our offline Qwen3.5 experiments, we occasionally hit a known vLLM issue, more often in multimodal training tasks: [Bug]: Generation hangs until RAY_CGRAPH_get_timeout (300s) with Ray compiled DAG executor (vllm-project/vllm#36237).
Sequence packing on the DTensor V2 path may hit this issue: Sequence Packing Bug: cu_seqlens not propagated to AutoModel (#2105).
Training small Qwen3.5 dense models (2B/4B) on the Megatron-Bridge backend may hit this issue: [bug] Qwen3.5 dense small models (2B/4B) crash due to shadow embedding breaking tied embeddings (Megatron-Bridge#3112).
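
For the first item, a pre-flight check can surface the missing FLA dependency before a run starts. This is a minimal sketch, not part of the PR: both function names and the `cp_size` parameter are hypothetical, and it assumes the flash-linear-attention package is imported as `fla`.

```python
import importlib.util


def is_module_available(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


def check_cp_dependencies(cp_size: int) -> None:
    """Raise a clear error if CP is requested but FLA is not installed.

    Assumption: flash-linear-attention is importable as `fla`.
    """
    if cp_size > 1 and not is_module_available("fla"):
        raise RuntimeError(
            "CP on the DTensor V2 path needs flash-linear-attention; "
            "install it with `pip install flash-linear-attention`."
        )


check_cp_dependencies(cp_size=1)  # CP not requested, so nothing is checked
```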

copy-pr-bot · 2026-03-25T02:16:41Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Copilot

Pull request overview

Adds Qwen3.5 model integration across NeMo-RL’s vLLM generation path and test/recipe infrastructure, plus a few quality-of-life fixes for multimodal-capable models running text-only workloads.

Changes:

Extend vLLM worker conditional-generation handling (Qwen3.5 + config-driven prefix caching) and clear multimodal caches on sleep (sync/async).
Freeze visual encoder params for text-only training to avoid optimizer/checkpoint resume issues; propagate chat templates from tokenizer → processor when missing.
Add Qwen3.5 GRPO/VLM recipes and test-suite scripts; update nightly/release suite lists and raise the nightly GPU-hour budget threshold.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 4 comments.

File	Description
`tests/unit/test_recipes_and_test_suites.py`	Raises nightly GPU-hour budget assertion threshold.
`tests/test_suites/vlm/vlm_grpo-qwen3.5-35ba3b-geo3k-2n8g-megatron-ep16.sh`	Adds a Qwen3.5 VLM GRPO test script (megatron).
`tests/test_suites/vlm/vlm_grpo-qwen3.5-35ba3b-geo3k-2n8g-automodel-ep16.sh`	Adds a Qwen3.5 VLM GRPO test script (automodel).
`tests/test_suites/release.txt`	Registers the Qwen3.5 DAPO GRPO run in the release suite.
`tests/test_suites/nightly.txt`	Registers Qwen3.5 text-only runs; comments out Qwen3.5 VLM runs due to known vLLM crash.
`tests/test_suites/llm/grpo-qwen3.5-35ba3b-dapo-4n8g-automodel.sh`	Adds a release-grade Qwen3.5 DAPO GRPO test script.
`tests/test_suites/llm/grpo-qwen3.5-35ba3b-2n8g-megatron-ep16.sh`	Adds a nightly Qwen3.5 GRPO test script (megatron).
`tests/test_suites/llm/grpo-qwen3.5-35ba3b-2n8g-automodel-ep16.sh`	Adds a nightly Qwen3.5 GRPO test script (automodel).
`tests/test_suites/disabled.txt`	Disables Qwen3.5 VLM scripts (and one moonlight script) to keep suites consistent.
`nemo_rl/models/generation/vllm/vllm_worker.py`	Adds Qwen3.5 architectures to conditional-generation special-casing; makes prefix caching configurable; clears multimodal sender cache during sleep.
`nemo_rl/models/generation/vllm/vllm_worker_async.py`	Clears multimodal sender cache during async sleep.
`nemo_rl/models/automodel/setup.py`	Freezes visual encoder params for text-only training to avoid optimizer resume issues.
`nemo_rl/algorithms/utils.py`	Copies `chat_template` from tokenizer to processor when processor lacks one.
`examples/configs/recipes/vlm/vlm_grpo-qwen3.5-35ba3b-geo3k-2n8g-megatron-ep16.yaml`	Adds megatron recipe for Qwen3.5 VLM GRPO.
`examples/configs/recipes/vlm/vlm_grpo-qwen3.5-35ba3b-geo3k-2n8g-automodel-ep16.yaml`	Adds automodel recipe for Qwen3.5 VLM GRPO.
`examples/configs/recipes/llm/grpo-qwen3.5-35ba3b-dapo-4n8g-automodel.yaml`	Adds automodel recipe for Qwen3.5 DAPO GRPO.
`examples/configs/recipes/llm/grpo-qwen3.5-35ba3b-2n8g-megatron-ep16.yaml`	Adds megatron recipe for Qwen3.5 GRPO nightly run.
`examples/configs/recipes/llm/grpo-qwen3.5-35ba3b-2n8g-automodel-ep16.yaml`	Adds automodel recipe for Qwen3.5 GRPO nightly run.
`docs/about/model-support.md`	Documents Qwen3.5 (and GLM) model support entries.

Comments suppressed due to low confidence (1)

nemo_rl/models/automodel/setup.py:701

Freezing the visual encoder via requires_grad_(False) may not be sufficient to prevent optimizer checkpoint mismatches because the optimizer is still constructed with model.parameters() (includes frozen params). Consider filtering to trainable parameters when initializing the optimizer (or otherwise excluding visual_module params from param groups) so optimizers that allocate state eagerly don’t track the frozen visual parameters.

    # Freeze visual encoder when not doing VLM training.
    # Without this, the optimizer creates state entries for visual params that never
    # receive gradients, causing a key mismatch when resuming from checkpoint.
    # Note: visual encoder is nested under model.model (e.g. model.model.visual for
    # Qwen3_5MoeForConditionalGeneration), not directly on model.
    visual_module = getattr(getattr(model, "model", None), "visual", None) or getattr(
        model, "visual", None
    )
    if not is_vlm and visual_module is not None:
        for param in visual_module.parameters():
            param.requires_grad_(False)
        if rank == 0:
            print("Froze visual encoder parameters for text-only training")

    # CPU offload if needed
    if cpu_offload:
        # Move buffers to CPU for FSDP modules
        for v in model.buffers():
            v.data = v.data.to("cpu")
        model = model.to("cpu")

    # Initialize optimizer
    optimizer = None
    if init_optimizer:
        optimizer_cls = get_class(config["optimizer"]["name"])
        optimizer = optimizer_cls(model.parameters(), **config["optimizer"]["kwargs"])
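
The suggestion in this suppressed comment, building the optimizer from trainable parameters only, can be sketched as follows. `trainable_parameters` is a hypothetical helper, and the `SimpleNamespace` objects stand in for real parameters so the snippet runs without PyTorch. Whether the change is compatible with the existing checkpointing code is not established in this PR.

```python
from types import SimpleNamespace
from typing import Iterable, List


def trainable_parameters(params: Iterable) -> List:
    """Keep only the parameters that will receive gradients."""
    return [p for p in params if p.requires_grad]


# Stand-ins: two language-model parameters and one frozen visual-encoder one.
params = [
    SimpleNamespace(name="lm.embed", requires_grad=True),
    SimpleNamespace(name="lm.head", requires_grad=True),
    SimpleNamespace(name="visual.patch_embed", requires_grad=False),
]

kept_names = [p.name for p in trainable_parameters(params)]
print(kept_names)  # ['lm.embed', 'lm.head']

# With a real model this would be used as (not executed here):
#   optimizer = optimizer_cls(
#       trainable_parameters(model.parameters()), **config["optimizer"]["kwargs"]
#   )
```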

yuki-97 · 2026-04-02T08:36:03Z

/ok to test f0a9372

zpqiu · 2026-04-02T09:17:16Z

/ok to test 4668dec

terrykong · 2026-04-07T06:51:46Z

/ok to test 51f5f11

terrykong · 2026-04-07T07:01:13Z

/ok to test 557b9ca

terrykong · 2026-04-07T07:02:15Z

/ok to test bfaf018

terrykong · 2026-04-07T07:17:42Z

/ok to test 073467f

…hat pin ratio=2

PR #2325 (TypedDict→BaseModel migration) changed the default in examples/configs/grpo_math_1B.yaml from ``truncated_importance_sampling_type: tis`` to ``null``. Four recipes from PR #2151 (Qwen3.5 + GLM-4.7-Flash) inherit that parent, override ``truncated_importance_sampling_ratio: 2`` (committing to truncated IS), but do not set ``truncated_importance_sampling_type``. With the new ``null`` default they hit ValueError: Invalid truncated importance sampling type: None at the first loss call (loss_functions.py:530; the ``ratio is not None`` branch enters but no type matches "tis"/"icepop"/"seq-mask-tis").

Set the type explicitly on each affected recipe rather than restoring the base default. Base ``grpo_math_1B.yaml`` does not enable truncated IS (``ratio: null``), so its ``type`` value is moot and should stay decoupled.

Surfaced by short nightly sweep job 11899306 (qwen3.5-35ba3b-dapo) on commit 9cbecb8.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
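
The failure mode described in this commit message can be reproduced with a small config check. This is a sketch under assumptions: `validate_truncated_is` and `VALID_TIS_TYPES` are hypothetical names, not the code in `loss_functions.py`. Only the three type strings and the error text come from the message, and `"tis"` in the passing call is chosen purely for illustration.

```python
from typing import Optional

# Type names taken from the commit message above.
VALID_TIS_TYPES = ("tis", "icepop", "seq-mask-tis")


def validate_truncated_is(ratio: Optional[float], tis_type: Optional[str]) -> None:
    """Fail fast if a truncation ratio is set without a valid type."""
    if ratio is None:
        return  # truncated IS is disabled, so the type value is irrelevant
    if tis_type not in VALID_TIS_TYPES:
        raise ValueError(f"Invalid truncated importance sampling type: {tis_type}")


validate_truncated_is(ratio=None, tis_type=None)  # base config: passes
validate_truncated_is(ratio=2, tis_type="tis")    # recipe with explicit type: passes

try:
    validate_truncated_is(ratio=2, tis_type=None)  # ratio set, type left as null
except ValueError as err:
    error_message = str(err)
```

Running a check like this at config-load time would report the mismatch before the first loss call rather than during it.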

zpqiu linked an issue Mar 25, 2026 that may be closed by this pull request

Adding support for Qwen3.5-35B #2025

Closed

zpqiu force-pushed the qwen35 branch from 7852acc to ea948e9 March 31, 2026 08:03

zpqiu requested a review from sharonyu-115 March 31, 2026 10:48

github-actions Bot added the Documentation label (Improvements or additions to documentation) Apr 2, 2026

zpqiu force-pushed the qwen35 branch 3 times, most recently from 3600bfa to 1a4f071 April 2, 2026 05:39

zpqiu marked this pull request as ready for review April 2, 2026 07:36

zpqiu requested review from a team as code owners April 2, 2026 07:36

zpqiu requested a review from Copilot April 2, 2026 07:38

Copilot started reviewing on behalf of zpqiu April 2, 2026 07:39

zpqiu added the CI:Lfast label (runs a fast test suite and reuses the nightly `main` container, but syncs dependencies to the PR's version) Apr 2, 2026

Copilot AI reviewed Apr 2, 2026

Comment thread tests/unit/test_recipes_and_test_suites.py Outdated

Comment thread nemo_rl/models/generation/vllm/vllm_worker.py

Comment thread nemo_rl/models/generation/vllm/vllm_worker.py Outdated

Comment thread nemo_rl/algorithms/utils.py

copy-pr-bot Bot temporarily deployed to nemo-ci April 2, 2026 08:36 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci April 2, 2026 09:17 Inactive

zpqiu force-pushed the qwen35 branch from 0d70e63 to 75f6b75 April 2, 2026 09:38

zpqiu linked an issue Apr 2, 2026 that may be closed by this pull request

Adding support for GLM-4.7-Flash #2026

Closed

zpqiu changed the title ~~feat: add Qwen3.5 model support~~ feat: add Qwen3.5 & GLM-4.7-Flash model support Apr 2, 2026

zpqiu requested a review from terrykong April 2, 2026 10:23

Update nemo_rl/models/policy/utils.py

51f5f11

Co-authored-by: Terry Kong <terrycurtiskong@gmail.com> Signed-off-by: alexchiu <qiuzhaopeng@foxmail.com>

terrykong previously approved these changes Apr 7, 2026

terrykong enabled auto-merge (squash) April 7, 2026 06:51

copy-pr-bot Bot had a problem deploying to nemo-ci April 7, 2026 06:52 Failure

copy-pr-bot Bot temporarily deployed to nemo-ci April 7, 2026 06:52 Inactive

docs: add Qwen3.5 and GLM-4.7-Flash model support announcement

557b9ca

Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>

zpqiu dismissed terrykong’s stale review via 557b9ca April 7, 2026 06:59

zpqiu requested a review from a team as a code owner April 7, 2026 06:59

terrykong previously approved these changes Apr 7, 2026

fix: correct Qwen3.5 and GLM-4.7-Flash HuggingFace links in README

bfaf018

Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>

zpqiu dismissed terrykong’s stale review via bfaf018 April 7, 2026 07:01

terrykong previously approved these changes Apr 7, 2026

copy-pr-bot Bot had a problem deploying to nemo-ci April 7, 2026 07:03 Error

copy-pr-bot Bot temporarily deployed to nemo-ci April 7, 2026 07:03 Inactive

chore: minimize Qwen3.5 recipe configs to remove redundant defaults

073467f

Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>

zpqiu dismissed terrykong’s stale review via 073467f April 7, 2026 07:12

zpqiu requested a review from terrykong April 7, 2026 07:12

terrykong approved these changes Apr 7, 2026

copy-pr-bot Bot temporarily deployed to nemo-ci April 7, 2026 07:18 Inactive

terrykong merged commit 909834e into NVIDIA-NeMo:main Apr 7, 2026
27 checks passed

anwithk mentioned this pull request Apr 30, 2026

Add GLM-5.1 support in NeMo RL #2377

Open

feat: add Qwen3.5 & GLM-4.7-Flash model support #2151
terrykong merged 23 commits into NVIDIA-NeMo:main from zpqiu:qwen35

7 participants
