
feat: add Qwen3.5 & GLM-4.7-Flash model support #2151

Merged
terrykong merged 23 commits into NVIDIA-NeMo:main from zpqiu:qwen35
Apr 7, 2026


@zpqiu

@zpqiu zpqiu commented Mar 25, 2026

Contributor

Summary

  • Add Qwen3.5 architecture support (Qwen3_5ForConditionalGeneration, Qwen3_5MoeForConditionalGeneration) to vLLM conditional generation handling
  • Freeze visual encoder for text-only training of VLM-capable models to prevent optimizer state key mismatch on checkpoint resume
  • Copy chat_template from tokenizer to processor for models whose processor lacks its own template (e.g. Qwen3.5)
  • Make enable_prefix_caching configurable via vllm_cfg
  • Clear multimodal processor cache on sleep to prevent sender/receiver cache desync (sync and async paths)
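The chat-template item above can be sketched roughly as follows. This is a minimal illustration, not the code from `nemo_rl/algorithms/utils.py`: the helper name `copy_chat_template_if_missing` is hypothetical, and the `SimpleNamespace` objects stand in for a real Hugging Face tokenizer and processor, assuming both expose a `chat_template` attribute.

```python
from types import SimpleNamespace


def copy_chat_template_if_missing(tokenizer, processor):
    """Give the processor the tokenizer's chat template when it has none.

    Hypothetical helper for illustration; an existing processor template
    is never overwritten.
    """
    if getattr(processor, "chat_template", None) is None:
        template = getattr(tokenizer, "chat_template", None)
        if template is not None:
            processor.chat_template = template
    return processor


# Stand-in objects; real code would pass a tokenizer and a processor.
tokenizer = SimpleNamespace(chat_template="{{ messages }}")
processor = SimpleNamespace(chat_template=None)
copy_chat_template_if_missing(tokenizer, processor)
```

Leaving an existing processor template untouched keeps the change a no-op for models whose processor already ships its own template.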

Offline Experiments

https://wandb.ai/ys_fishcool-nvidia/nemorl-qwen35?nw=nwuserys_fishcool

| Model | Backend | GRPO Validated? | Comments |
| --- | --- | --- | --- |
| Qwen3.5-2B/4B | Automodel | Yes | |
| Qwen3.5-35B-A3B | Automodel | Yes | |
| Qwen3.5-35B-A3B (VLM task) | Automodel | Yes | |
| Qwen3.5-397B-A17B | Automodel | No | Needs Automodel PP |
| Qwen3.5-2B/4B | MCore | Yes, with issue | Will work after the next Megatron-Bridge version bump. Related issue: NVIDIA-NeMo/Megatron-Bridge#3112 |
| Qwen3.5-9B | MCore | Yes | |
| Qwen3.5-35B-A3B | MCore | Yes | |
| Qwen3.5-35B-A3B (VLM task) | MCore | Yes | |
| Qwen3.5-397B-A17B | MCore | Yes | A problem occurs only when vLLM has EP enabled + TP > 1 + DP > 1: garbled output is generated. See issue: vllm-project/vllm#37856 |
| GLM-4.7-Flash | Automodel | Yes | |

Known Issues

@copy-pr-bot

copy-pr-bot Bot commented Mar 25, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@zpqiu zpqiu linked an issue Mar 25, 2026 that may be closed by this pull request
@zpqiu zpqiu requested a review from sharonyu-115 March 31, 2026 10:48
@github-actions github-actions Bot added the Documentation Improvements or additions to documentation label Apr 2, 2026
@zpqiu zpqiu force-pushed the qwen35 branch 3 times, most recently from 3600bfa to 1a4f071 Compare April 2, 2026 05:39
@zpqiu zpqiu marked this pull request as ready for review April 2, 2026 07:36
@zpqiu zpqiu requested review from a team as code owners April 2, 2026 07:36
@zpqiu zpqiu requested a review from Copilot April 2, 2026 07:38
@zpqiu zpqiu added the CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) label Apr 2, 2026

Copilot AI left a comment

Contributor


Pull request overview

Adds Qwen3.5 model integration across NeMo-RL’s vLLM generation path and test/recipe infrastructure, plus a few quality-of-life fixes for multimodal-capable models running text-only workloads.

Changes:

  • Extend vLLM worker conditional-generation handling (Qwen3.5 + config-driven prefix caching) and clear multimodal caches on sleep (sync/async).
  • Freeze visual encoder params for text-only training to avoid optimizer/checkpoint resume issues; propagate chat templates from tokenizer → processor when missing.
  • Add Qwen3.5 GRPO/VLM recipes and test-suite scripts; update nightly/release suite lists and raise the nightly GPU-hour budget threshold.
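As a rough sketch of what "config-driven prefix caching" can look like, the snippet below reads the flag from a config dict and forwards it as an engine keyword argument. The helper name `build_engine_kwargs` is hypothetical, and the fallback of `True` when the key is absent is an assumption for illustration; the PR does not state the default here.

```python
def build_engine_kwargs(vllm_cfg: dict) -> dict:
    """Collect engine keyword arguments from a vLLM config dict.

    Hypothetical helper. The default of True is an assumption made for
    this sketch, not a value confirmed by the PR.
    """
    return {
        "enable_prefix_caching": vllm_cfg.get("enable_prefix_caching", True),
    }


# A recipe can now switch prefix caching off without touching worker code.
kwargs = build_engine_kwargs({"enable_prefix_caching": False})
```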

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/unit/test_recipes_and_test_suites.py | Raises nightly GPU-hour budget assertion threshold. |
| tests/test_suites/vlm/vlm_grpo-qwen3.5-35ba3b-geo3k-2n8g-megatron-ep16.sh | Adds a Qwen3.5 VLM GRPO test script (megatron). |
| tests/test_suites/vlm/vlm_grpo-qwen3.5-35ba3b-geo3k-2n8g-automodel-ep16.sh | Adds a Qwen3.5 VLM GRPO test script (automodel). |
| tests/test_suites/release.txt | Registers the Qwen3.5 DAPO GRPO run in the release suite. |
| tests/test_suites/nightly.txt | Registers Qwen3.5 text-only runs; comments out Qwen3.5 VLM runs due to known vLLM crash. |
| tests/test_suites/llm/grpo-qwen3.5-35ba3b-dapo-4n8g-automodel.sh | Adds a release-grade Qwen3.5 DAPO GRPO test script. |
| tests/test_suites/llm/grpo-qwen3.5-35ba3b-2n8g-megatron-ep16.sh | Adds a nightly Qwen3.5 GRPO test script (megatron). |
| tests/test_suites/llm/grpo-qwen3.5-35ba3b-2n8g-automodel-ep16.sh | Adds a nightly Qwen3.5 GRPO test script (automodel). |
| tests/test_suites/disabled.txt | Disables Qwen3.5 VLM scripts (and one moonlight script) to keep suites consistent. |
| nemo_rl/models/generation/vllm/vllm_worker.py | Adds Qwen3.5 architectures to conditional-generation special-casing; makes prefix caching configurable; clears multimodal sender cache during sleep. |
| nemo_rl/models/generation/vllm/vllm_worker_async.py | Clears multimodal sender cache during async sleep. |
| nemo_rl/models/automodel/setup.py | Freezes visual encoder params for text-only training to avoid optimizer resume issues. |
| nemo_rl/algorithms/utils.py | Copies chat_template from tokenizer to processor when processor lacks one. |
| examples/configs/recipes/vlm/vlm_grpo-qwen3.5-35ba3b-geo3k-2n8g-megatron-ep16.yaml | Adds megatron recipe for Qwen3.5 VLM GRPO. |
| examples/configs/recipes/vlm/vlm_grpo-qwen3.5-35ba3b-geo3k-2n8g-automodel-ep16.yaml | Adds automodel recipe for Qwen3.5 VLM GRPO. |
| examples/configs/recipes/llm/grpo-qwen3.5-35ba3b-dapo-4n8g-automodel.yaml | Adds automodel recipe for Qwen3.5 DAPO GRPO. |
| examples/configs/recipes/llm/grpo-qwen3.5-35ba3b-2n8g-megatron-ep16.yaml | Adds megatron recipe for Qwen3.5 GRPO nightly run. |
| examples/configs/recipes/llm/grpo-qwen3.5-35ba3b-2n8g-automodel-ep16.yaml | Adds automodel recipe for Qwen3.5 GRPO nightly run. |
| docs/about/model-support.md | Documents Qwen3.5 (and GLM) model support entries. |

Comments suppressed due to low confidence (1)

nemo_rl/models/automodel/setup.py:701

  • Freezing the visual encoder via requires_grad_(False) may not be sufficient to prevent optimizer checkpoint mismatches because the optimizer is still constructed with model.parameters() (includes frozen params). Consider filtering to trainable parameters when initializing the optimizer (or otherwise excluding visual_module params from param groups) so optimizers that allocate state eagerly don’t track the frozen visual parameters.
    # Freeze visual encoder when not doing VLM training.
    # Without this, the optimizer creates state entries for visual params that never
    # receive gradients, causing a key mismatch when resuming from checkpoint.
    # Note: visual encoder is nested under model.model (e.g. model.model.visual for
    # Qwen3_5MoeForConditionalGeneration), not directly on model.
    visual_module = getattr(getattr(model, "model", None), "visual", None) or getattr(
        model, "visual", None
    )
    if not is_vlm and visual_module is not None:
        for param in visual_module.parameters():
            param.requires_grad_(False)
        if rank == 0:
            print("Froze visual encoder parameters for text-only training")

    # CPU offload if needed
    if cpu_offload:
        # Move buffers to CPU for FSDP modules
        for v in model.buffers():
            v.data = v.data.to("cpu")
        model = model.to("cpu")

    # Initialize optimizer
    optimizer = None
    if init_optimizer:
        optimizer_cls = get_class(config["optimizer"]["name"])
        optimizer = optimizer_cls(model.parameters(), **config["optimizer"]["kwargs"])
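
The suppressed comment's suggestion, handing only trainable parameters to the optimizer, can be sketched as below. To keep the example dependency-free, `Param` is a hypothetical stand-in for `torch.nn.Parameter` that carries only a name and the `requires_grad` flag; with real PyTorch the same filter would be the generator `(p for p in model.parameters() if p.requires_grad)` passed to the optimizer constructor. This is an illustration of the reviewer's idea, not a change that the PR is known to contain.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Stand-in for torch.nn.Parameter; only the flag matters here."""

    name: str
    requires_grad: bool = True


def trainable_parameters(params):
    """Return only the parameters that will receive gradients."""
    return [p for p in params if p.requires_grad]


# A frozen visual-encoder weight next to a trainable language-model weight.
params = [
    Param("model.visual.proj", requires_grad=False),
    Param("model.lm_head"),
]
optimizer_params = trainable_parameters(params)
```

With this filter, an optimizer that allocates state eagerly never sees the frozen visual parameters, so its state keys depend only on what is actually trained.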



Comment thread tests/unit/test_recipes_and_test_suites.py Outdated
Comment thread nemo_rl/models/generation/vllm/vllm_worker.py
Comment thread nemo_rl/models/generation/vllm/vllm_worker.py Outdated
Comment thread nemo_rl/algorithms/utils.py
@yuki-97

yuki-97 commented Apr 2, 2026

Contributor

/ok to test f0a9372

@zpqiu

zpqiu commented Apr 2, 2026

Contributor Author

/ok to test 4668dec

@zpqiu zpqiu linked an issue Apr 2, 2026 that may be closed by this pull request
@zpqiu zpqiu changed the title from "feat: add Qwen3.5 model support" to "feat: add Qwen3.5 & GLM-4.7-Flash model support" Apr 2, 2026
@zpqiu zpqiu requested a review from terrykong April 2, 2026 10:23
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
Signed-off-by: alexchiu <qiuzhaopeng@foxmail.com>
terrykong previously approved these changes Apr 7, 2026
@terrykong

Collaborator

/ok to test 51f5f11

Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
@zpqiu zpqiu requested a review from a team as a code owner April 7, 2026 06:59
terrykong previously approved these changes Apr 7, 2026
@terrykong

Collaborator

/ok to test 557b9ca

Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
@terrykong

Collaborator

/ok to test bfaf018

terrykong previously approved these changes Apr 7, 2026
Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
@terrykong

Collaborator

/ok to test 073467f

@terrykong terrykong merged commit 909834e into NVIDIA-NeMo:main Apr 7, 2026
27 checks passed
ZhiyuLi-Nvidia added a commit that referenced this pull request May 19, 2026
…hat pin ratio=2

PR #2325 (TypedDict→BaseModel migration) changed the default in
examples/configs/grpo_math_1B.yaml from ``truncated_importance_sampling_type: tis``
to ``null``. Four recipes from PR #2151 (Qwen3.5 + GLM-4.7-Flash) inherit
that parent, override ``truncated_importance_sampling_ratio: 2`` (committing
to truncated IS), but do not set ``truncated_importance_sampling_type``.
With the new ``null`` default they hit

  ValueError: Invalid truncated importance sampling type: None

at the first loss call (loss_functions.py:530 — the ``ratio is not None``
branch enters but no type matches "tis"/"icepop"/"seq-mask-tis").

Set the type explicitly on each affected recipe rather than restoring the
base default — base ``grpo_math_1B.yaml`` does not enable truncated IS
(``ratio: null``), so its ``type`` value is moot and should stay decoupled.

Surfaced by short nightly sweep job 11899306 (qwen3.5-35ba3b-dapo) on
commit 9cbecb8.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
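
The failure described in this commit message can be reproduced with a small sketch. The function `check_truncated_is` is a hypothetical reduction of the check at the first loss call, not the code in `loss_functions.py`; the key names, the three accepted type strings, and the error text are taken from the message above. Choosing `"tis"` for the fixed recipe is an assumption based on the former default, since the message does not say which type each recipe was given.

```python
VALID_TIS_TYPES = ("tis", "icepop", "seq-mask-tis")


def check_truncated_is(loss_cfg: dict) -> None:
    """Raise if truncated IS is enabled (ratio set) without a valid type."""
    ratio = loss_cfg.get("truncated_importance_sampling_ratio")
    tis_type = loss_cfg.get("truncated_importance_sampling_type")
    if ratio is not None and tis_type not in VALID_TIS_TYPES:
        raise ValueError(
            f"Invalid truncated importance sampling type: {tis_type}"
        )


# Base config after the default changed: truncated IS off, type unset.
base = {
    "truncated_importance_sampling_ratio": None,
    "truncated_importance_sampling_type": None,
}
# Affected recipe: overrides the ratio but inherits type=None, so it fails.
recipe = {**base, "truncated_importance_sampling_ratio": 2}
# Fixed recipe: sets the type explicitly ("tis" is assumed here).
fixed = {**recipe, "truncated_importance_sampling_type": "tis"}
```

The base config passes because the check is only reached when a ratio is set, which is why the message argues for fixing the recipes instead of restoring the base default.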
ZhiyuLi-Nvidia added a commit that referenced this pull request May 22, 2026
…hat pin ratio=2

ZhiyuLi-Nvidia added a commit that referenced this pull request May 23, 2026
…hat pin ratio=2


Labels

  • CI:Lfast: Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version)
  • Documentation: Improvements or additions to documentation


Development

Successfully merging this pull request may close these issues.

  • Adding support for GLM-4.7-Flash
  • Adding support for Qwen3.5-35B

7 participants