
[main] [5/5] Qwen3.5 support: Qwen3.5-VL training example#4756

Open
wplf wants to merge 2 commits into NVIDIA:main from wplf:feat/qwen35-vl-example-main

wplf (Member) commented May 12, 2026

Qwen3.5 support series

This is part of a 5-PR series adding Qwen3.5-VL support, split for review clarity.

Main PRs (this series):

Dev PRs (corresponding mirrors):


Summary

Adds a standalone VLM training playground under examples/multimodal_dev/, with Qwen3.5-VL supported end-to-end.

Model-agnostic harness

  • pretrain_multimodal.py entry point and MODEL_REGISTRY, so adding a new architecture requires only a registry entry plus a backing module.
  • Shared components: models/base.py, forward_step.py, arguments.py, and data/ (mock and CORD-V2 datasets, with THD packing/padding in the collate function).
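A minimal sketch of the registry pattern described above, assuming each entry pairs a model factory with a config builder. `ModelEntry`, `register_model`, `get_model_entry`, and the `toy_vlm` entry are hypothetical names for illustration; the actual `MODEL_REGISTRY` in `pretrain_multimodal.py` may store different fields.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ModelEntry:
    # Hypothetical fields; the real registry may hold other callables.
    build_model: Callable[..., object]   # constructs the model from a config
    build_config: Callable[..., object]  # returns a config for a named variant


MODEL_REGISTRY: Dict[str, ModelEntry] = {}


def register_model(name: str, entry: ModelEntry) -> None:
    """Add an architecture; reject duplicate names to avoid silent overrides."""
    if name in MODEL_REGISTRY:
        raise ValueError(f"model {name!r} is already registered")
    MODEL_REGISTRY[name] = entry


def get_model_entry(name: str) -> ModelEntry:
    """Look up an architecture by name, listing known names on failure."""
    try:
        return MODEL_REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(MODEL_REGISTRY)) or "<none>"
        raise ValueError(f"unknown model {name!r}; known: {known}") from None


# Hypothetical toy architecture backed by plain functions.
register_model(
    "toy_vlm",
    ModelEntry(
        build_model=lambda config: {"arch": "toy_vlm", "config": config},
        build_config=lambda variant: {"variant": variant, "hidden_size": 64},
    ),
)
```

With this shape, the entry point only needs the architecture name (for example `get_model_entry("toy_vlm")`) to build a config and a model; rejecting duplicate names keeps a second module from silently replacing an existing architecture.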

Qwen3.5-VL

  • Full model: models/qwen35_vl/ — vision encoder, MRoPE (pre-computed for THD), decoder, factory, specs, configurations for proxy / 9B / 397B-A17B variants.
  • Run script + README, plus tests: tests/test_mrope_parity.py, test_cp_correctness.py, test_cp_support.py, test_thd_correctness.py, test_thd_e2e.py.
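To illustrate what pre-computing MRoPE positions for THD involves, here is a sketch of Qwen2-VL-style 3-D position ids (the scheme used by Hugging Face's `get_rope_index`), restricted to single-frame images whose token grid has already been spatially merged. The segment format, `mrope_positions`, and `pack_thd_positions` are hypothetical; the code in `models/qwen35_vl/` may handle videos, timestamps, or merging differently.

```python
from typing import List, Sequence, Tuple, Union

# Hypothetical segment format: ("text", num_tokens) or ("image", grid_h, grid_w),
# where grid_h x grid_w is the post-merge token grid of one single-frame image.
Segment = Union[Tuple[str, int], Tuple[str, int, int]]
Positions = Tuple[List[int], List[int], List[int]]


def mrope_positions(segments: Sequence[Segment]) -> Positions:
    """Return (t, h, w) position ids for one sequence, Qwen2-VL style."""
    t_ids: List[int] = []
    h_ids: List[int] = []
    w_ids: List[int] = []
    start = 0  # next free position index
    for seg in segments:
        if seg[0] == "text":
            n = seg[1]
            # Text tokens use the same 1-D position on all three axes.
            for i in range(n):
                t_ids.append(start + i)
                h_ids.append(start + i)
                w_ids.append(start + i)
            start += n
        elif seg[0] == "image":
            _, grid_h, grid_w = seg
            # Image tokens keep one temporal index and walk the 2-D grid.
            for r in range(grid_h):
                for c in range(grid_w):
                    t_ids.append(start)
                    h_ids.append(start + r)
                    w_ids.append(start + c)
            # Following tokens resume after the largest index the image used.
            start += max(grid_h, grid_w)
        else:
            raise ValueError(f"unknown segment type {seg[0]!r}")
    return t_ids, h_ids, w_ids


def pack_thd_positions(sequences: Sequence[Sequence[Segment]]) -> Positions:
    """Concatenate per-sequence ids for a THD batch; each sequence restarts at 0."""
    packed: Positions = ([], [], [])
    for segs in sequences:
        for axis, ids in zip(packed, mrope_positions(segs)):
            axis.extend(ids)
    return packed
```

Because each packed sequence restarts at position 0, the ids can be computed per sample (for example in the collate function) and concatenated, instead of being re-derived from the packed buffer inside the model.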

One-line training infra change

  • megatron/training/datasets/data_samplers.py: enable the vanilla-collate torch DataLoader path when the new arg use_vanilla_collate_fn is set (needed for CORD-V2 under BSHD). On main, the guard references hybrid_context_parallel, the counterpart of dev's dynamic_context_parallel.
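A sketch of the guard, assuming the new flag is OR-ed with the existing `hybrid_context_parallel` condition and exposed as a `store_true` CLI option. `use_vanilla_collate_path`, `add_collate_args`, and the `--use-vanilla-collate-fn` spelling are assumptions for illustration, not the code in `data_samplers.py`.

```python
import argparse
from types import SimpleNamespace


def add_collate_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    # Assumed flag spelling; argparse stores it as args.use_vanilla_collate_fn.
    parser.add_argument(
        "--use-vanilla-collate-fn",
        action="store_true",
        help="Use the vanilla-collate torch DataLoader path.",
    )
    return parser


def use_vanilla_collate_path(args) -> bool:
    """Hypothetical guard: take the vanilla-collate path if either flag is set.

    getattr defaults let older namespaces (without the new flag) keep the
    previous behaviour.
    """
    return bool(
        getattr(args, "hybrid_context_parallel", False)
        or getattr(args, "use_vanilla_collate_fn", False)
    )


# A namespace that predates the new flag still resolves to the old branch.
legacy = SimpleNamespace(hybrid_context_parallel=False)
assert not use_vanilla_collate_path(legacy)
```

Because `store_true` defaults to `False` and `getattr` falls back to `False`, runs that never pass the flag take the same branch as before, which is what the backwards-compatibility claim under Risk relies on.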

Dependency

This example sets mrope_interleaved=True in its TransformerConfig and relies on the core MRoPE interleaved layout introduced in #4755. The diff here is self-contained, but the example won't run end-to-end until #4755 merges.
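For intuition about what `mrope_interleaved=True` changes, the sketch below maps each rotary frequency channel to the temporal (T), height (H), or width (W) axis. It assumes the core layout from #4755 follows the Qwen3-VL interleaving as implemented in Hugging Face's `apply_interleaved_mrope`; the function names and the `[24, 20, 20]` split are illustrative only.

```python
from typing import List, Sequence


def chunked_axes(mrope_section: Sequence[int]) -> List[str]:
    """Non-interleaved layout: contiguous T block, then H, then W."""
    t, h, w = mrope_section
    return ["T"] * t + ["H"] * h + ["W"] * w


def interleaved_axes(mrope_section: Sequence[int]) -> List[str]:
    """Interleaved layout: channels cycle T, H, W until H and W run out.

    H takes channels 1, 4, 7, ... below 3 * h; W takes 2, 5, 8, ... below
    3 * w; every remaining channel stays on T.
    """
    t, h, w = mrope_section
    axes = ["T"] * (t + h + w)
    for i in range(1, 3 * h, 3):
        axes[i] = "H"
    for i in range(2, 3 * w, 3):
        axes[i] = "W"
    return axes


# Illustrative 64-channel split (half of a 128-dim rotary head).
example = interleaved_axes([24, 20, 20])
```

Both layouts give each axis the same number of channels; one commonly cited motivation for interleaving is that each axis then spans the full frequency range, instead of T getting only the highest-frequency channels and W only the lowest.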

Risk

  • All new files under examples/multimodal_dev/.
  • data_samplers.py change is fully backwards-compatible: behavior is unchanged unless use_vanilla_collate_fn is explicitly set.

Notes

Mirror of #4751 (same content, targeting main instead of dev). The only delta vs. the dev version is the data_samplers.py collate guard, which uses args.hybrid_context_parallel here (main) instead of args.dynamic_context_parallel (dev).

🤖 Generated with Claude Code

wplf added the Run tests label May 12, 2026
copy-pr-bot (bot) commented May 12, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Adds a standalone VLM training playground under
``examples/multimodal_dev/`` with Qwen3.5-VL end-to-end.

Highlights
- Model-agnostic entry point (``pretrain_multimodal.py``) with a
  ``MODEL_REGISTRY`` so adding a new architecture is just a registry
  entry plus a backing module.
- Qwen3.5-VL model: vision encoder, MRoPE, decoder, factory, specs,
  configurations covering proxy / 9B / 397B-A17B variants.
- Datasets: mock data and CORD-V2 VLM dataset, with THD pack/pad in the
  collate function.
- THD + CP support consolidated in ``forward_step.py`` and the model
  layer (uses MRoPE THD pre-computation and ``cu_seqlens_q_padded`` CP
  partitioning).
- Run script + README, plus tests for MRoPE parity, CP correctness, CP
  support, and THD correctness / e2e.
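A sketch of the THD bookkeeping mentioned above, assuming the load-balanced context-parallel scheme used with Transformer Engine: each sequence is padded to a multiple of `2 * cp_size`, and CP rank `r` keeps chunks `r` and `2 * cp_size - 1 - r` of every padded sequence. `thd_cu_seqlens` and `cp_rank_indices` are hypothetical helpers, not the functions in `forward_step.py`.

```python
from typing import List, Tuple


def pad_to_multiple(n: int, m: int) -> int:
    """Round n up to the next multiple of m."""
    return ((n + m - 1) // m) * m


def thd_cu_seqlens(seq_lens: List[int], cp_size: int) -> Tuple[List[int], List[int]]:
    """Return (cu_seqlens, cu_seqlens_padded) for a packed THD batch.

    Each sequence is padded to a multiple of 2 * cp_size so it can be cut
    into 2 * cp_size equal chunks.
    """
    cu, cu_padded = [0], [0]
    for n in seq_lens:
        cu.append(cu[-1] + n)
        cu_padded.append(cu_padded[-1] + pad_to_multiple(n, 2 * cp_size))
    return cu, cu_padded


def cp_rank_indices(cu_seqlens_padded: List[int], cp_size: int, cp_rank: int) -> List[int]:
    """Token indices (into the padded buffer) owned by one CP rank.

    Each padded sequence is split into 2 * cp_size chunks; rank r keeps
    chunk r and chunk 2 * cp_size - 1 - r.
    """
    idx: List[int] = []
    for start, end in zip(cu_seqlens_padded[:-1], cu_seqlens_padded[1:]):
        chunk = (end - start) // (2 * cp_size)
        for c in (cp_rank, 2 * cp_size - 1 - cp_rank):
            idx.extend(range(start + c * chunk, start + (c + 1) * chunk))
    return idx
```

Pairing an early chunk with a late one evens out causal-attention cost across ranks, since late tokens attend to more keys than early ones; the padded offsets play the role that `cu_seqlens_q_padded` plays in the model layer.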

Also gates the torch DataLoader vanilla-collate path on the new
``use_vanilla_collate_fn`` arg (one-line change to
``megatron/training/datasets/data_samplers.py``) so CORD-V2 works under
BSHD.

Functional dependency: the new model arch sets ``mrope_interleaved=True``
in its config and relies on the core MRoPE interleaved layout introduced
in a separate PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: BestJuly <19769279+BestJuly@users.noreply.github.com>
wplf force-pushed the feat/qwen35-vl-example-main branch from 411d25f to 10b0b62 on May 13, 2026 10:24
…shadow PR

Adopt the merged dev [5/5] shadow PR NVIDIA#4751 (commit 58f3e67) verbatim for
examples/multimodal_dev/, since it carries newer bug fixes:
- replace data/vlm_dataset.py with data/cord_v2.py
- add tests/_helpers.py, tests/test_cp_thd_correctness.py,
  tests/test_vision_patch_merger_parity.py
- sync the remaining 23 example files to NVIDIA#4751's content

data_samplers.py is intentionally NOT changed to match NVIDIA#4751: main uses
args.hybrid_context_parallel whereas dev uses args.dynamic_context_parallel
(the arg was renamed across branches), so NVIDIA#4756's existing line is the correct
main adaptation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
wplf marked this pull request as ready for review June 4, 2026 11:08
wplf requested review from a team as code owners June 4, 2026 11:08
wplf (Member, Author) commented Jun 4, 2026

/ok to test 52a151c
