[main] [4/5] Qwen3.5 support: Interleaved MRoPE layout by wplf · Pull Request #4755 · NVIDIA/Megatron-LM

wplf · 2026-05-12T07:34:18Z

Qwen3.5 support series

This is part of a 5-PR series adding Qwen3.5-VL support, split for review clarity.

Main PRs (this series):

[1/5] MTP packed-seq CP+THD fix — fix(mtp): use padded cu_seqlens in MTP roll for THD with CP #4495
[2/5] FSDP DTensor Bridge checkpoint compatibility — [main] [2/5] Qwen3.5 support: FSDP DTensor Bridge checkpoint compatibility #4753
[3/5] SharedExpertMLP meta init — [main] [3/5] Qwen3.5 support: SharedExpertMLP meta init #4754
[4/5] Interleaved MRoPE layout — [main] [4/5] Qwen3.5 support: Interleaved MRoPE layout #4755 ← this PR
[5/5] Qwen3.5-VL training example — [main] [5/5] Qwen3.5 support: Qwen3.5-VL training example #4756

Dev PRs (corresponding mirrors):

Summary

Add the interleaved MRoPE layout used by Qwen3.5-VL, gated by a new config flag.

New TransformerConfig.mrope_interleaved: bool = False.
New MultimodalRotaryEmbedding(interleaved_mrope=...) argument.
New helper _apply_interleaved_mrope(freqs, mrope_section) that converts the per-channel outer-product layout (3, bs, seq_len, dim) into the interleaved single-tensor layout (bs, seq_len, dim).
GPTModel passes config.mrope_interleaved through to the embedding.
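
The shape change performed by the helper can be illustrated with a small NumPy sketch. This is not the code in this PR: the function name `interleave_mrope_sketch` is hypothetical, NumPy stands in for the tensor library, and it assumes `mrope_section` lists the number of frequency channels for T, H, and W in that order, with the T section at least as large as the other two.

```python
import numpy as np


def interleave_mrope_sketch(freqs, mrope_section):
    """Merge per-axis freqs (3, bs, seq_len, dim) into one (bs, seq_len, dim) array.

    Hypothetical illustration only; assumes mrope_section = [n_t, n_h, n_w].
    """
    merged = freqs[0].copy()  # start with T values in every channel
    for axis, offset in ((1, 1), (2, 2)):  # H uses offset 1, W uses offset 2
        stop = mrope_section[axis] * 3
        # Overwrite every third channel, starting at `offset`, with this axis.
        merged[..., offset:stop:3] = freqs[axis, ..., offset:stop:3]
    return merged


# Toy input: T channels hold 0.0, H channels hold 1.0, W channels hold 2.0.
freqs = np.stack([np.full((1, 2, 8), float(axis)) for axis in range(3)])
out = interleave_mrope_sketch(freqs, [4, 2, 2])
print(out.shape)   # (1, 2, 8)
print(out[0, 0])   # [0. 1. 2. 0. 1. 2. 0. 0.]
```

Channels beyond the last H/W slot keep their T values, which is why the tail of the toy output is all zeros.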

Why

MultimodalRotaryEmbedding currently produces the section-based T/H/W cycling used by Qwen2-VL. Qwen3.5-VL (and the matching HuggingFace Qwen3VLTextRotaryEmbedding.apply_interleaved_mrope) use a different layout where:

T freqs occupy stride-3 positions {0, 3, 6, ...}
H freqs occupy {1, 4, 7, ...}
W freqs occupy {2, 5, 8, ...}

Both layouts are now supported; the flag picks between them.
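
To make the difference concrete, the following plain-Python sketch prints which axis owns each frequency channel under the two layouts. Both function names are hypothetical. It assumes the section-based layout assigns contiguous channel blocks to T, H, and W in turn, and that `mrope_section` lists the channel counts for T, H, and W with the T section at least as large as the others.

```python
AXES = "THW"


def section_layout(mrope_section):
    """Contiguous blocks: all T channels, then all H, then all W."""
    return [AXES[i] for i, count in enumerate(mrope_section) for _ in range(count)]


def interleaved_layout(mrope_section):
    """Stride-3 placement: H at 1, 4, 7, ... and W at 2, 5, 8, ...; T elsewhere."""
    layout = ["T"] * sum(mrope_section)
    for axis_index, offset in ((1, 1), (2, 2)):
        for channel in range(offset, mrope_section[axis_index] * 3, 3):
            layout[channel] = AXES[axis_index]
    return layout


section = [4, 2, 2]  # toy sizes, not taken from any real model config
print("".join(section_layout(section)))      # TTTTHHWW
print("".join(interleaved_layout(section)))  # THWTHWTT
```

Both layouts give each axis the same number of channels; only the positions differ.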

Risk

mrope_interleaved defaults to False, so existing Qwen2-VL / GPT users see no behavior change.

Notes

Mirror of #4750 (same patch, targeting main instead of dev).

🤖 Generated with Claude Code

copy-pr-bot · 2026-05-12T07:34:22Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

`MultimodalRotaryEmbedding` already supports the original section-based T/H/W cycling (Qwen2-VL style). Qwen3.5-VL and the HuggingFace `Qwen3VLTextRotaryEmbedding.apply_interleaved_mrope` use a different layout where T freqs occupy stride-3 positions {0,3,6,...}, H freqs occupy {1,4,7,...}, and W freqs occupy {2,5,8,...}.

Add a new `interleaved_mrope` flag on the embedding (default `False`, preserves existing behavior) plus a `mrope_interleaved` config field on `TransformerConfig`, and wire it through `GPTModel`.

Helper `_apply_interleaved_mrope` merges the per-channel outer-product layout `(3, bs, seq_len, dim)` into the interleaved single-tensor layout `(bs, seq_len, dim)`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: BestJuly <19769279+BestJuly@users.noreply.github.com>

…onfig entry

Match the merged dev shadow PR NVIDIA#4750 exactly:
- reflow torch.stack/torch.cat in MultimodalRotaryEmbedding.forward
- add 'mrope_interleaved': False to test_hybrid_moe_model config dict

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

wplf · 2026-06-04T11:08:12Z

/ok to test 6e4e781

wplf added the Run tests label May 12, 2026

wplf changed the title ~~feat(mrope): add interleaved T/H/W layout for Qwen3.5-VL~~ [main] [4/5] Qwen3.5 support: Interleaved MRoPE layout May 12, 2026

wplf mentioned this pull request May 13, 2026

[main] [follow-up] Qwen3.5 support: MoE aux loss padding_mask #4777

Open

wplf force-pushed the feat/mrope-interleaved-layout-main branch from 8d3995f to 5414abf May 13, 2026 10:24

wplf marked this pull request as ready for review June 4, 2026 11:07

wplf requested review from a team as code owners June 4, 2026 11:07

copy-pr-bot Bot temporarily deployed to public June 4, 2026 11:08 Inactive

svcnvidia-nemo-ci added the complexity: low label Jun 4, 2026

copy-pr-bot Bot temporarily deployed to test June 4, 2026 11:09 Inactive

copy-pr-bot Bot temporarily deployed to public June 4, 2026 11:11 Inactive

copy-pr-bot Bot temporarily deployed to public June 4, 2026 11:12 Inactive

wplf wants to merge 2 commits into NVIDIA:main from wplf:feat/mrope-interleaved-layout-main

2 participants
