Skip to content

[model] fix: expose word_embeddings on MTP shadow embedding for Qwen3-VL#3143

Merged
yaoyu-33 merged 1 commit into
mainfrom
chcui/fix-qwen3vl-mtp-word-embeddings
Apr 5, 2026
Merged

[model] fix: expose word_embeddings on MTP shadow embedding for Qwen3-VL#3143
yaoyu-33 merged 1 commit into
mainfrom
chcui/fix-qwen3vl-mtp-word-embeddings

Conversation

@cuichenx

@cuichenx cuichenx commented Apr 3, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Fix an AttributeError when using MTP with sequence parallelism in Qwen3-VL: the SP-scatter shadow embedding wrapper was missing the word_embeddings attribute that MTP postprocessing accesses.

Changelog

  • Forward word_embeddings from the original embedding onto the SP-scatter shadow function in Qwen3VLGPTModel, so MTP can access it during _postprocess

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI.
A Nvidia developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items you can still open "Draft" PR.

Additional Information

Single-line fix — the MTP path calls self.embedding.word_embeddings but the shadow closure didn't carry that attribute.

Summary by CodeRabbit

  • Bug Fixes
    • Improved handling of embedding attributes when parallel processing features are enabled.

The MTP postprocessing path accesses `self.embedding.word_embeddings`,
but the SP-scatter shadow function did not carry that attribute, causing
an AttributeError. Forward the original embedding's `word_embeddings`
onto the wrapper so MTP works correctly with sequence parallelism.

Signed-off-by: Chen Cui <chcui@nvidia.com>
@coderabbitai

coderabbitai Bot commented Apr 3, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3a072367-0360-44a3-9073-7f7196c2b5dc

📥 Commits

Reviewing files that changed from the base of the PR and between d442550 and ede5e01.

📒 Files selected for processing (1)
  • src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/text_model.py

📝 Walkthrough

Walkthrough

A single line is added to the Qwen3VLGPTModel.forward method to preserve the word_embeddings attribute reference on a wrapped embedding object when sequence parallelism is enabled during multi-token prediction processing.

Changes

Cohort / File(s) Summary
Embedding Attribute Preservation
src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/text_model.py
Assigns word_embeddings reference from original embedding to scatter embedding wrapper to maintain attribute access in sequence parallel mode.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: exposing word_embeddings on the MTP shadow embedding for Qwen3-VL, which directly addresses the issue described in the PR objectives.
Test Results For Major Changes ✅ Passed Single-line bug fix to expose word_embeddings attribute on shadow embedding wrapper for MTP with sequence parallelism. Qualifies as minor change requiring no test documentation per check criteria.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch chcui/fix-qwen3vl-mtp-word-embeddings

Comment @coderabbitai help to get the list of available commands and usage tips.

@yaoyu-33 yaoyu-33 merged commit 20749b0 into main Apr 5, 2026
46 of 47 checks passed
@yaoyu-33 yaoyu-33 deleted the chcui/fix-qwen3vl-mtp-word-embeddings branch April 5, 2026 02:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[bug] Qwen3.5 dense small models (2B/4B) crash due to shadow embedding breaking tied embeddings

2 participants