Refactor LoRA handling to support adapter tensors in fused format#6585
Merged
zhyncs merged 24 commits into sgl-project:main (May 27, 2025)
Conversation
Fridge003 requested changes on May 26, 2025
Fridge003 reviewed on May 26, 2025
Collaborator: Also, we need a test for the Phi4MM model, which can be put under …
Collaborator (Author): Sounds good, thank you for the suggestion! Currently I added …
Layssy pushed a commit to Layssy/sglang-iaas that referenced this pull request (Jun 9, 2025)
xwu-intel pushed a commit to xwu-intel/sglang that referenced this pull request (Jun 17, 2025)
Motivation
However, some models (e.g., Phi4MM) ship their adapter weight tensors in a pre-fused format, which is not supported as-is by SGLang today.
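To illustrate what "pre-fused" means here: such adapters stack the LoRA weights of several sub-projections (e.g., q/k/v) into a single tensor, which must be split into per-module blocks (or handled in fused form) before it can be mapped onto per-module LoRA slots. A minimal pure-Python sketch of the splitting idea; the function name and the equal-block assumption are illustrative, not the PR's actual code:

```python
def unfuse_rows(fused_rows, sub_names):
    """Split a pre-fused weight (rows stacked per sub-projection, in
    order) into equal per-projection blocks. A pure-Python stand-in
    for a tensor split along the output dimension."""
    n = len(sub_names)
    assert len(fused_rows) % n == 0, "fused weight must split evenly"
    block = len(fused_rows) // n
    return {
        name: fused_rows[i * block:(i + 1) * block]
        for i, name in enumerate(sub_names)
    }

# A fused lora_B for qkv_proj stacks the q, k, v output rows:
fused_b = [[0], [1], [2], [3], [4], [5]]
parts = unfuse_rows(fused_b, ["q_proj", "k_proj", "v_proj"])
print(parts["q_proj"])  # [[0], [1]]
print(parts["v_proj"])  # [[4], [5]]
```

In a real loader the same split would be done with a tensor `chunk`/`split` along the stacked dimension, using the fusion order defined by the model's checkpoint format.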
Modifications
`map_lora_module_name`, which right now only serves as a "filter" to ensure LoRAManager does not incorrectly map LoRA weights to unwanted modules (e.g., vision towers).

Example
In my local branch, I verified that Phi4MM LoRA loads successfully and observed a significant increase in MMMU score, from 0.38 to 0.472.
It's worth noting that the MMMU score is still lower than the 0.55 claimed by the authors. However, it is difficult to pinpoint the source of the discrepancy, as I am seeing a similar situation for existing non-LoRA models as well. The root cause could be one of the following: (1) an incorrect model implementation, (2) issues with the benchmark script, or (3) a different benchmarking setup used by the paper's authors. I will add it to my follow-up list.
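The `map_lora_module_name` filtering behavior described under Modifications can be sketched roughly as follows; the signature, target list, and name-matching rules here are illustrative assumptions, not the PR's actual implementation:

```python
def map_lora_module_name(module_name, supported_targets):
    """Hypothetical filter: return the canonical LoRA target name for
    `module_name`, or None when the module should not receive LoRA
    weights (e.g., vision-tower modules)."""
    if "vision" in module_name:  # skip vision-tower modules entirely
        return None
    for target in supported_targets:
        if module_name.endswith(target):
            return target
    return None  # not a supported LoRA target

targets = ["qkv_proj", "o_proj", "gate_up_proj", "down_proj"]
print(map_lora_module_name("model.layers.0.self_attn.qkv_proj", targets))  # qkv_proj
print(map_lora_module_name("vision_tower.blocks.0.attn.qkv", targets))     # None
```

Returning `None` for non-matching modules lets the manager silently skip adapter tensors that target modules outside the language model, rather than raising on unexpected names.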
Checklist