[main] [5/5] Qwen3.5 support: Qwen3.5-VL training example by wplf · Pull Request #4756 · NVIDIA/Megatron-LM

wplf · 2026-05-12T07:34:38Z

Qwen3.5 support series

This is part of a 5-PR series adding Qwen3.5-VL support, split for review clarity.

Main PRs (this series):

[1/5] MTP packed-seq CP+THD fix — fix(mtp): use padded cu_seqlens in MTP roll for THD with CP #4495
[2/5] FSDP DTensor Bridge checkpoint compatibility — [main] [2/5] Qwen3.5 support: FSDP DTensor Bridge checkpoint compatibility #4753
[3/5] SharedExpertMLP meta init — [main] [3/5] Qwen3.5 support: SharedExpertMLP meta init #4754
[4/5] Interleaved MRoPE layout — [main] [4/5] Qwen3.5 support: Interleaved MRoPE layout #4755
[5/5] Qwen3.5-VL training example — [main] [5/5] Qwen3.5 support: Qwen3.5-VL training example #4756 ← this PR

Dev PRs (corresponding mirrors):

Summary

Adds a standalone VLM training playground under examples/multimodal_dev/ that runs Qwen3.5-VL end-to-end.

Model-agnostic harness

pretrain_multimodal.py entry point and a MODEL_REGISTRY, so adding a new architecture only takes a registry entry plus a backing module.
models/base.py, forward_step.py, arguments.py, and data/ (mock and CORD-V2 datasets, with THD pack/pad in the collate function).
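A minimal sketch of how a registry-driven entry point of this kind can dispatch on a model name. Only the name `MODEL_REGISTRY` comes from this PR; `ModelSpec`, `register_model`, `build_model`, and the `"toy-vlm"` entry are hypothetical illustrations, not the contents of `pretrain_multimodal.py`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class ModelSpec:
    # Hypothetical fields: a builder callable plus default hyperparameters.
    build: Callable[[dict], object]
    default_config: dict


# Maps a model name to its spec; a new architecture is one more entry.
MODEL_REGISTRY: Dict[str, ModelSpec] = {}


def register_model(name: str, spec: ModelSpec) -> None:
    # Refuse silent overwrites so two modules cannot claim the same name.
    if name in MODEL_REGISTRY:
        raise ValueError(f"model {name!r} is already registered")
    MODEL_REGISTRY[name] = spec


def build_model(name: str, overrides: Optional[dict] = None) -> object:
    if name not in MODEL_REGISTRY:
        known = ", ".join(sorted(MODEL_REGISTRY)) or "<none>"
        raise ValueError(f"unknown model {name!r}; registered: {known}")
    spec = MODEL_REGISTRY[name]
    # User overrides win over the architecture's defaults.
    config = {**spec.default_config, **(overrides or {})}
    return spec.build(config)


# Hypothetical "backing module" for a toy architecture.
def _build_toy_vlm(config: dict) -> dict:
    return {"arch": "toy-vlm", **config}


register_model("toy-vlm", ModelSpec(build=_build_toy_vlm, default_config={"hidden_size": 64}))
```

Under this pattern, supporting another model means writing its builder and one `register_model` call; the entry point itself does not change.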

Qwen3.5-VL

Full model in models/qwen35_vl/: vision encoder, MRoPE (pre-computed for THD), decoder, factory, specs, and configurations for the proxy, 9B, and 397B-A17B variants.
Run script + README, plus tests: tests/test_mrope_parity.py, test_cp_correctness.py, test_cp_support.py, test_thd_correctness.py, test_thd_e2e.py.
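Several of the pieces above (THD pre-computation, the THD tests) revolve around packed THD batches, where variable-length sequences are concatenated along one token dimension and their boundaries are described by cumulative sequence lengths. A minimal sketch of a pack/pad step, assuming that with context parallelism each sequence is padded to a multiple of `2 * cp_size` (the divisibility a load-balanced CP split needs); `pack_thd` and the pad value are hypothetical, and the actual collate in data/ may differ.

```python
from itertools import accumulate
from typing import List, Tuple


def round_up(n: int, multiple: int) -> int:
    return ((n + multiple - 1) // multiple) * multiple


def pack_thd(
    seqs: List[List[int]], cp_size: int = 1, pad_id: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    # Assumed CP rule: each padded length must split into 2 * cp_size chunks.
    multiple = 2 * cp_size if cp_size > 1 else 1
    tokens: List[int] = []
    lens: List[int] = []
    padded_lens: List[int] = []
    for seq in seqs:
        padded = round_up(len(seq), multiple)
        tokens.extend(seq)
        tokens.extend([pad_id] * (padded - len(seq)))  # pad at the end of each sequence
        lens.append(len(seq))
        padded_lens.append(padded)
    # Cumulative boundaries with a leading 0, e.g. lengths [3, 5] -> [0, 3, 8].
    cu_seqlens = [0, *accumulate(lens)]  # real tokens only
    cu_seqlens_padded = [0, *accumulate(padded_lens)]  # offsets into the padded stream
    return tokens, cu_seqlens, cu_seqlens_padded


tokens, cu, cu_padded = pack_thd([[1, 2, 3], [4, 5, 6, 7, 8]], cp_size=2)
# tokens    -> [1, 2, 3, 0, 4, 5, 6, 7, 8, 0, 0, 0]
# cu        -> [0, 3, 8]
# cu_padded -> [0, 4, 12]
```

Keeping both the real and the padded boundaries lets attention ignore pad tokens while the CP split still works on evenly divisible chunks.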

One-line training infra change

megatron/training/datasets/data_samplers.py: enable the vanilla-collate torch DataLoader path when the new arg use_vanilla_collate_fn is set (needed for CORD-V2 under BSHD). On main, the existing guard references hybrid_context_parallel, the counterpart of dev's dynamic_context_parallel.
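A minimal sketch of why an opt-in flag like this is backwards-compatible. The exact guard in data_samplers.py is not shown here, so the OR of the existing condition and `use_vanilla_collate_fn`, the helper `extra_dataloader_kwargs`, and the `list_collate` stand-in are all assumptions for illustration.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, List


def list_collate(batch: List[Any]) -> List[Any]:
    # Hypothetical stand-in for the vanilla collate; what that path really
    # does in data_samplers.py is not described in this PR.
    return batch


def extra_dataloader_kwargs(
    args: Any, vanilla_collate_fn: Callable[[List[Any]], Any]
) -> Dict[str, Any]:
    # Assumed guard shape: the existing main-branch condition OR the new flag.
    # getattr() lets arg namespaces that predate the flag keep working.
    use_vanilla = getattr(args, "hybrid_context_parallel", False) or getattr(
        args, "use_vanilla_collate_fn", False
    )
    # An empty dict leaves the existing DataLoader construction untouched.
    return {"collate_fn": vanilla_collate_fn} if use_vanilla else {}


legacy = SimpleNamespace(hybrid_context_parallel=False)  # new flag absent
opted_in = SimpleNamespace(hybrid_context_parallel=False, use_vanilla_collate_fn=True)
print(extra_dataloader_kwargs(legacy, list_collate))  # {}
```

The result would be splatted into the loader call (e.g. `DataLoader(dataset, **kwargs)`), so leaving the flag unset reproduces the old behavior exactly.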

Dependency

This example sets mrope_interleaved=True in its TransformerConfig and relies on the core MRoPE interleaved layout introduced in #4755. The diff here is self-contained, but the example won't run end-to-end until #4755 merges.
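For readers unfamiliar with the flag: with an interleaved MRoPE layout, the rotary frequency channels are assigned to the temporal, height, and width position axes in an interleaved pattern instead of in contiguous chunks. A minimal sketch of that channel assignment, modeled on the interleaving in Hugging Face's Qwen3-VL rotary embedding; that #4755's core layout matches it exactly is an assumption, and `mrope_channel_axes` is a hypothetical helper.

```python
from typing import List, Sequence

AXES = ("t", "h", "w")


def mrope_channel_axes(mrope_section: Sequence[int], interleaved: bool) -> List[str]:
    """Return the position axis ("t", "h" or "w") that drives each rotary
    frequency channel; there are sum(mrope_section) == head_dim // 2 channels."""
    n = sum(mrope_section)
    if not interleaved:
        # Chunked layout: all t channels, then all h, then all w.
        return [axis for axis, size in zip(AXES, mrope_section) for _ in range(size)]
    # Interleaved layout: start from all-t, then give channels 1, 4, 7, ...
    # to h and 2, 5, 8, ... to w, each capped at 3 * its section size.
    axes = ["t"] * n
    for dim in (1, 2):
        for j in range(dim, min(3 * mrope_section[dim], n), 3):
            axes[j] = AXES[dim]
    return axes


print("".join(mrope_channel_axes([2, 1, 1], interleaved=False)))  # tthw
print("".join(mrope_channel_axes([2, 1, 1], interleaved=True)))   # thwt
```

Interleaving spreads each spatial axis across low and high frequencies instead of confining it to one contiguous band, which is the usual motivation given for this layout.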

Risk

All new files live under examples/multimodal_dev/.
The data_samplers.py change is fully backward-compatible: behavior is unchanged unless use_vanilla_collate_fn is explicitly set.

Notes

Mirror of #4751 (same content, targeting main instead of dev). The only delta vs. the dev version is the data_samplers.py collate guard, which uses args.hybrid_context_parallel here (main) instead of args.dynamic_context_parallel (dev).

🤖 Generated with Claude Code

copy-pr-bot · 2026-05-12T07:34:42Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Adds a standalone VLM training playground under ``examples/multimodal_dev/`` with Qwen3.5-VL end-to-end.

Highlights
- Model-agnostic entry point (``pretrain_multimodal.py``) with a ``MODEL_REGISTRY`` so adding a new architecture is just a registry entry plus a backing module.
- Qwen3.5-VL model: vision encoder, MRoPE, decoder, factory, specs, configurations covering proxy / 9B / 397B-A17B variants.
- Datasets: mock data and CORD-V2 VLM dataset, with THD pack/pad in the collate function.
- THD + CP support consolidated in ``forward_step.py`` and the model layer (uses MRoPE THD pre-computation and ``cu_seqlens_q_padded`` CP partitioning).
- Run script + README, plus tests for MRoPE parity, CP correctness, CP support, and THD correctness / e2e.

Also gates the torch DataLoader vanilla-collate path on the new ``use_vanilla_collate_fn`` arg (one-line change to ``megatron/training/datasets/data_samplers.py``) so CORD-V2 works under BSHD.

Functional dependency: the new model arch sets ``mrope_interleaved=True`` in its config and relies on the core MRoPE interleaved layout introduced in a separate PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: BestJuly <19769279+BestJuly@users.noreply.github.com>
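The ``cu_seqlens_q_padded`` CP partitioning mentioned in this commit message can be illustrated with a minimal sketch, assuming the common load-balancing scheme for causal attention: each padded sequence is cut into `2 * cp_size` equal chunks and rank `r` keeps chunk `r` and chunk `2 * cp_size - 1 - r`, so every rank holds a mix of early and late tokens. `thd_partition_indices` is hypothetical; it is neither the Transformer Engine nor the Megatron function.

```python
from typing import List, Sequence


def thd_partition_indices(
    cu_seqlens_padded: Sequence[int], cp_size: int, cp_rank: int
) -> List[int]:
    """Token indices (into the packed, padded THD stream) owned by one CP rank."""
    if not 0 <= cp_rank < cp_size:
        raise ValueError("cp_rank must be in [0, cp_size)")
    if cp_size == 1:
        # With a single rank there is nothing to split.
        return list(range(cu_seqlens_padded[0], cu_seqlens_padded[-1]))
    n_chunks = 2 * cp_size
    owned: List[int] = []
    for start, end in zip(cu_seqlens_padded[:-1], cu_seqlens_padded[1:]):
        length = end - start
        if length % n_chunks:
            raise ValueError(f"padded length {length} not divisible by {n_chunks}")
        chunk = length // n_chunks
        # Rank r takes the r-th chunk from the front and the r-th from the back.
        for c in (cp_rank, n_chunks - 1 - cp_rank):
            owned.extend(range(start + c * chunk, start + (c + 1) * chunk))
    return owned


cu_seqlens_padded = [0, 4, 12]  # two packed sequences, padded to lengths 4 and 8
print(thd_partition_indices(cu_seqlens_padded, cp_size=2, cp_rank=0))  # [0, 3, 4, 5, 10, 11]
print(thd_partition_indices(cu_seqlens_padded, cp_size=2, cp_rank=1))  # [1, 2, 6, 7, 8, 9]
```

Partitioning on the padded offsets rather than the real lengths is what keeps every chunk the same size on every rank.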

…shadow PR

Adopt the merged dev [5/5] shadow PR NVIDIA#4751 (commit 58f3e67) verbatim for examples/multimodal_dev/, since it carries newer bug fixes:
- replace data/vlm_dataset.py with data/cord_v2.py
- add tests/_helpers.py, tests/test_cp_thd_correctness.py, tests/test_vision_patch_merger_parity.py
- sync the remaining 23 example files to NVIDIA#4751's content

data_samplers.py is intentionally NOT changed to match NVIDIA#4751: main uses args.hybrid_context_parallel whereas dev uses args.dynamic_context_parallel (the arg was renamed across branches), so NVIDIA#4756's existing line is the correct main adaptation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

wplf · 2026-06-04T11:09:02Z

/ok to test 52a151c

wplf added the Run tests label May 12, 2026

wplf changed the title ~~feat(examples/multimodal_dev): add Qwen3.5-VL training example~~ [main] [5/5] Qwen3.5 support: Qwen3.5-VL training example May 12, 2026

wplf force-pushed the feat/qwen35-vl-example-main branch from 92a00c5 to 411d25f May 13, 2026 03:10

wplf mentioned this pull request May 13, 2026 in [main] [follow-up] Qwen3.5 support: MoE aux loss padding_mask #4777 (Open)

wplf force-pushed the feat/qwen35-vl-example-main branch from 411d25f to 10b0b62 May 13, 2026 10:24

wplf marked this pull request as ready for review June 4, 2026 11:08

wplf requested review from a team as code owners June 4, 2026 11:08

copy-pr-bot Bot temporarily deployed to public June 4, 2026 11:09 Inactive

copy-pr-bot Bot temporarily deployed to test June 4, 2026 11:10 Inactive

svcnvidia-nemo-ci added the complexity: high label Jun 4, 2026

copy-pr-bot Bot temporarily deployed to public June 4, 2026 11:12 Inactive

copy-pr-bot Bot temporarily deployed to public June 4, 2026 11:13 Inactive

copy-pr-bot Bot temporarily deployed to public June 4, 2026 11:21 Inactive

wplf wants to merge 2 commits into NVIDIA:main from wplf:feat/qwen35-vl-example-main