[Qwen3.5] Support Qwen3.5 Pipeline Parallelism#19670
Conversation
Summary of Changes (Gemini Code Assist): This pull request addresses and resolves issues preventing Qwen3.5 models from using pipeline parallelism. It introduces mechanisms to correctly manage model layers and components across pipeline stages, ensuring that each parallel rank only initializes and processes the parts of the model relevant to its assigned segment, thereby enabling stable and functional distributed execution.
/tag-and-rerun-ci
Code Review
This pull request adds support for pipeline parallelism to Qwen3.5 models. The changes correctly handle pipeline stages by using PPMissingLayer for embeddings and the final normalization layer on ranks where they are not needed. The logic for accessing weights in get_embed_and_head and set_embed_and_head is also correctly updated to be pipeline-aware.
My review includes two main points. First, a high-severity issue regarding memory efficiency: all decoder layers are currently instantiated on all pipeline ranks, which can lead to unnecessary memory consumption. I've provided a suggestion to fix this by using PPMissingLayer for inactive layers. Second, a medium-severity suggestion to refactor duplicated code in Qwen3_5ForConditionalGeneration and Qwen3_5MoeForConditionalGeneration into a common base class to improve maintainability.
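The placement idea described in the review can be sketched as follows. This is a minimal, self-contained illustration under assumed names, not the PR's actual code: PPMissingLayer mirrors the placeholder used on ranks that do not own a module, while Embedding and RMSNorm are simplified stand-ins for the real Qwen3.5 embedding and final-norm modules.

```python
class PPMissingLayer:
    """Placeholder for a module owned by another pipeline rank."""

    def __call__(self, x):
        return x  # identity; never executed on the owning path


class Embedding:
    """Stand-in for the token embedding (only needed on the first rank)."""

    def __call__(self, token_ids):
        return [float(t) for t in token_ids]


class RMSNorm:
    """Stand-in for the final normalization (only needed on the last rank)."""

    def __call__(self, x):
        return x


def build_model_parts(pp_rank, pp_size):
    # Only the first rank embeds tokens; only the last rank applies the
    # final norm. Other ranks get cheap placeholders instead of real weights.
    is_first = pp_rank == 0
    is_last = pp_rank == pp_size - 1
    embed = Embedding() if is_first else PPMissingLayer()
    norm = RMSNorm() if is_last else PPMissingLayer()
    return embed, norm
```

Accessor methods such as get_embed_and_head can then check whether the module is a real layer or a PPMissingLayer before touching its weights, which is what makes them pipeline-aware.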
Force-pushed from 6e3fe83 to 231f212.
ShangmingCai left a comment:
You could contact the author of #19254; I see some similar effort there, so maybe we can converge the plans a bit.
Force-pushed from 231f212 to 1942cc3.
@ShangmingCai I couldn't contact @zhangxiaolei123456 directly, so I left a message in #19254.
/rerun-failed-ci
self.layers = make_layers(
    config.num_hidden_layers,
    get_layer,
    prefix=f"{prefix}.layers",
)
Should we change this block as well? Maybe we could pass the PP size and PP rank into make_layers, so we can get start_layer and end_layer without calling get_pp_indices separately.
After double-checking, we can't use make_layers to generate start_layer and end_layer; it would make the result incorrect. The reason is that we need to loop over all layers and set the missing layers accordingly, rather than starting from start_layer inside make_layers. Changed back.
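The pattern described above can be sketched as follows. This is an illustrative, self-contained version under assumed names: get_pp_indices and PPMissingLayer mirror the real utilities in simplified form, and DecoderLayer stands in for the Qwen3.5 decoder layer. The key point is that the loop runs over all layer indices so that positions stay aligned, while only the rank's own slice gets real modules.

```python
class PPMissingLayer:
    """Placeholder for a decoder layer owned by another pipeline rank."""

    def __call__(self, x):
        return x


class DecoderLayer:
    """Stand-in for a real decoder layer."""

    def __init__(self, layer_id):
        self.layer_id = layer_id

    def __call__(self, x):
        return x + 1  # dummy compute


def get_pp_indices(num_layers, pp_rank, pp_size):
    # Simplified even split of num_layers across pp_size ranks;
    # the last rank picks up any remainder.
    per_rank = num_layers // pp_size
    start = pp_rank * per_rank
    end = num_layers if pp_rank == pp_size - 1 else start + per_rank
    return start, end


def build_layers(num_layers, pp_rank, pp_size):
    start, end = get_pp_indices(num_layers, pp_rank, pp_size)
    layers = []
    for i in range(num_layers):  # loop ALL layers, not just [start, end)
        if start <= i < end:
            layers.append(DecoderLayer(i))  # this rank owns layer i
        else:
            layers.append(PPMissingLayer())  # placeholder: no weights allocated
    return start, end, layers
```

Because the list always has num_layers entries, weight loading and layer indexing work unchanged on every rank; only the slice between start_layer and end_layer carries real parameters.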
ShangmingCai left a comment:
Others LGTM, as long as the new test passes CI.
Force-pushed from 3f17199 to c1e4a2d.
Force-pushed from c1e4a2d to 5a35581.
/rerun-failed-ci
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Qwen3.5 PP support and its consistency check were introduced in #19670, but the test turned out to be flaky on H100 and AMD, which blocks CI. The performance regression cannot be reproduced on H20, so we need some time to dig in before bringing this test back.
Co-authored-by: luoyuan.luo <luoyuan.luo@antgroup.com>
Motivation
To close #19500
Currently, Qwen3.5 PP crashes with an error; with this PR it works.
Server:
gsm8k no drop.
Modifications
There are several modifications:
Accuracy Tests
gsm8k no drop.
Benchmarking and Profiling
Checklist
Review Process
/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci