[model] fix: expose word_embeddings on MTP shadow embedding for Qwen3-VL by cuichenx · Pull Request #3143 · NVIDIA-NeMo/Megatron-Bridge

cuichenx · 2026-04-03T20:07:33Z

What does this PR do?

Fix an AttributeError when using MTP with sequence parallelism in Qwen3-VL: the SP-scatter shadow embedding wrapper was missing the word_embeddings attribute that MTP postprocessing accesses.

Changelog

Forward word_embeddings from the original embedding onto the SP-scatter shadow function in Qwen3VLGPTModel, so MTP can access it during _postprocess

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI.
A Nvidia developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items you can still open "Draft" PR.

Additional Information

Single-line fix — the MTP path calls self.embedding.word_embeddings but the shadow closure didn't carry that attribute.

Summary by CodeRabbit

Bug Fixes
- Improved handling of embedding attributes when parallel processing features are enabled.

The MTP postprocessing path accesses `self.embedding.word_embeddings`, but the SP-scatter shadow function did not carry that attribute, causing an AttributeError. Forward the original embedding's `word_embeddings` onto the wrapper so MTP works correctly with sequence parallelism. Signed-off-by: Chen Cui <chcui@nvidia.com>

coderabbitai · 2026-04-03T20:11:52Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3a072367-0360-44a3-9073-7f7196c2b5dc

📥 Commits

Reviewing files that changed from the base of the PR and between d442550 and ede5e01.

📒 Files selected for processing (1)

src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/text_model.py

📝 Walkthrough

Walkthrough

A single line is added to the Qwen3VLGPTModel.forward method to preserve the word_embeddings attribute reference on a wrapped embedding object when sequence parallelism is enabled during multi-token prediction processing.

Changes

Cohort / File(s)	Summary
Embedding Attribute Preservation `src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/text_model.py`	Assigns `word_embeddings` reference from original embedding to scatter embedding wrapper to maintain attribute access in sequence parallel mode.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main change: exposing word_embeddings on the MTP shadow embedding for Qwen3-VL, which directly addresses the issue described in the PR objectives.
Test Results For Major Changes	✅ Passed	Single-line bug fix to expose word_embeddings attribute on shadow embedding wrapper for MTP with sequence parallelism. Qualifies as minor change requiring no test documentation per check criteria.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch chcui/fix-qwen3vl-mtp-word-embeddings

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

copy-pr-bot Bot temporarily deployed to test April 3, 2026 20:08 Inactive

cuichenx linked an issue Apr 3, 2026 that may be closed by this pull request

[bug] Qwen3.5 dense small models (2B/4B) crash due to shadow embedding breaking tied embeddings #3112

Closed

yaoyu-33 merged commit 20749b0 into main Apr 5, 2026
46 of 47 checks passed

yaoyu-33 deleted the chcui/fix-qwen3vl-mtp-word-embeddings branch April 5, 2026 02:44

Adiactive mentioned this pull request Jun 2, 2026

feat(megatron): Qwen3.5 dense + MoE training/inference support via megatron-bridge areal-project/AReaL#1384

Merged

15 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[model] fix: expose word_embeddings on MTP shadow embedding for Qwen3-VL#3143

[model] fix: expose word_embeddings on MTP shadow embedding for Qwen3-VL#3143
yaoyu-33 merged 1 commit into
mainfrom
chcui/fix-qwen3vl-mtp-word-embeddings

cuichenx commented Apr 3, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 3, 2026

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

cuichenx commented Apr 3, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Changelog

GitHub Actions CI

Before your PR is "Ready for review"

Additional Information

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 3, 2026

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cuichenx commented Apr 3, 2026 •

edited by coderabbitai Bot

Loading