[bug fix][pp] fix weight load for qwen2.5-vl #15138
ShangmingCai merged 1 commit into sgl-project:main
Conversation
Signed-off-by: Xuchun Shang <xuchun.shang@gmail.com>
Summary of Changes

Hello @XucSh, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses two critical bugs affecting the loading and functionality of Qwen2.5-VL models. It fixes an error that previously halted the weight loading process when specific parameters were absent, and ensures correct initialization of the language model head when operating in pipeline parallelism mode, thereby improving model stability and output quality.

Highlights
Code Review
This pull request introduces two fixes for weight loading in the Qwen2.5-VL model. The first fix correctly handles the loading of tied lm_head weights in a pipeline parallel setup. The second fix prevents crashes by ignoring weights from the checkpoint that are not present in the current model partition, which is a common scenario in pipeline parallelism.
My review includes one suggestion for improving the code structure. While the current fix for ignoring missing weights is effective, a more robust solution for layer-specific weights would be to refactor the pipeline parallelism layer check to a more appropriate location in the load_weights method. This would make the code cleaner and more aligned with similar model implementations in the repository.
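The shape of the two fixes can be pictured with a small stand-alone sketch. All names below are illustrative stand-ins, not SGLang's actual `load_weights` API: under pipeline parallelism each rank materializes only a slice of the model, so checkpoint entries outside the local partition are skipped instead of raising, and `lm_head.weight` is only loaded where it exists.

```python
# Hypothetical sketch (helper names are illustrative, not SGLang's real API).
def load_weights_pp(params_dict, weights, is_last_rank, tie_word_embeddings):
    """Copy (name, tensor) pairs into params_dict, skipping weights
    that do not exist on this pipeline-parallel rank."""
    loaded = []
    for name, tensor in weights:
        # lm_head lives only on the last PP rank; with tied embeddings
        # it is initialized from embed_tokens rather than the checkpoint.
        if name == "lm_head.weight" and (not is_last_rank or tie_word_embeddings):
            continue
        if name not in params_dict:
            continue  # weight belongs to a layer hosted on another rank
        params_dict[name] = tensor
        loaded.append(name)
    return loaded
```

Without the `name not in params_dict` guard, a mid-pipeline rank would hit the `ValueError` described below the moment it saw a weight for a layer it does not host.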
```diff
             else:
                 if get_global_server_args().encoder_only:
                     continue
                 else:
-                    raise ValueError(f"Weight {name} not found in params_dict")
+                    continue
```
This change correctly ignores weights that are not present in the current model partition, which is necessary for pipeline parallelism. This fixes the crash for weights like lm_head.weight on ranks where it doesn't exist.
However, for layer-specific weights (e.g., model.layers.13.mlp.down_proj.weight), the root cause of the ValueError is that the pipeline parallelism layer check is misplaced. It's currently inside the stacked_params_mapping loop (lines 764-774), so it doesn't apply to weights that are not in stacked_params_mapping.
For a more robust fix and to improve code structure, consider moving the layer_id check to the beginning of the main weight loading loop (right after line 742), similar to how it's done in Qwen2ForCausalLM.load_weights. This would ensure all layer-specific weights are correctly filtered based on the pipeline rank.
Example of the improved structure:
def load_weights(self, weights: Iterable[Tuple[str, torch.Tensor]]):
# ...
params_dict = dict(self.named_parameters(remove_duplicate=False))
for name, loaded_weight in weights:
layer_id = get_layer_id(name)
if (
layer_id is not None
and hasattr(self, "model")
and hasattr(self.model, "start_layer")
and (
layer_id < self.model.start_layer
or layer_id >= self.model.end_layer
)
):
continue
if "rotary_emb.inv_freq" in name:
continue
            # ... rest of the loop

With that change, you would also need to remove the original layer_id check from inside the stacked_params_mapping loop. This change would make the code cleaner and more correct.
/tag-and-rerun-ci

/rerun-failed-ci

1 similar comment

/rerun-failed-ci
@gty111 Can you double-check on this PR?

I've verified this PR locally, and in my tests it works well with both colocate and EPD. Thanks for the fix.
Motivation
Currently, there are two bugs for Qwen2.5-VL.

When loading weights for Qwen2.5-VL, an error occurs (introduced by the EPD feature):
```
File "/root/leipi/sglang/python/sglang/srt/managers/scheduler.py", line 2762, in run_scheduler_process
    scheduler = Scheduler(
    ^^^^^^^^^^
File "/root/leipi/sglang/python/sglang/srt/managers/scheduler.py", line 323, in __init__
    self.tp_worker = TpModelWorker(
    ^^^^^^^^^^^^^^
File "/root/leipi/sglang/python/sglang/srt/managers/tp_worker.py", line 248, in __init__
    self._model_runner = ModelRunner(
    ^^^^^^^^^^^^
File "/root/leipi/sglang/python/sglang/srt/model_executor/model_runner.py", line 356, in __init__
    self.initialize(min_per_gpu_memory)
File "/root/leipi/sglang/python/sglang/srt/model_executor/model_runner.py", line 432, in initialize
    self.load_model()
File "/root/leipi/sglang/python/sglang/srt/model_executor/model_runner.py", line 798, in load_model
    self.model = get_model(
    ^^^^^^^^^^
File "/root/leipi/sglang/python/sglang/srt/model_loader/__init__.py", line 28, in get_model
    return loader.load_model(
    ^^^^^^^^^^^^^^^^^^
File "/root/leipi/sglang/python/sglang/srt/model_loader/loader.py", line 604, in load_model
    self.load_weights_and_postprocess(
File "/root/leipi/sglang/python/sglang/srt/model_loader/loader.py", line 612, in load_weights_and_postprocess
    model.load_weights(weights)
File "/root/leipi/sglang/python/sglang/srt/models/qwen2_5_vl.py", line 795, in load_weights
    raise ValueError(f"Weight {name} not found in params_dict")
ValueError: Weight model.layers.13.mlp.down_proj.weight not found in params_dict
```
In PP mode, the lm_head is not loaded correctly, which leads to garbage output.
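The layer-ownership check the review recommends can be sketched in isolation. The `get_layer_id` below is a stand-in re-implementation of SGLang's helper of the same name, assumed here to parse the decoder layer index out of a checkpoint weight name:

```python
import re

def get_layer_id(weight_name):
    """Return the decoder layer index embedded in a weight name, or None."""
    m = re.search(r"\blayers\.(\d+)\.", weight_name)
    return int(m.group(1)) if m else None

def owned_by_stage(weight_name, start_layer, end_layer):
    """True if the current pipeline stage should load this weight."""
    layer_id = get_layer_id(weight_name)
    if layer_id is None:
        return True  # non-layer weights (embeddings, norms) pass through
    return start_layer <= layer_id < end_layer
```

With this filter applied at the top of the weight-loading loop, a weight such as `model.layers.13.mlp.down_proj.weight` is simply skipped on any stage whose `[start_layer, end_layer)` range does not contain layer 13, instead of raising `ValueError`.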
Checklist