
Refactor LoRA handling to support adapter tensors in fused format#6585

Merged
zhyncs merged 24 commits into sgl-project:main from lifuhuang:lora-new
May 27, 2025
Conversation

Collaborator

@lifuhuang lifuhuang commented May 25, 2025

Motivation

  1. Currently SGL expects LoRA weights to come in certain formats (e.g., q_proj, k_proj, v_proj) and stacks them at load time to meet the expectations of the LoRA backends. However, some models (e.g., Phi4MM) ship their weight tensors in a pre-fused format, which SGL does not support as-is today.

  2. The current LoRA implementation in SGL uses the layer name and operation name (e.g., "qkv_proj") to uniquely identify LoRA adapters, which works perfectly fine for conventional LLMs. However, for model architectures with more than one attention stack (e.g., VLMs), we need a more accurate mapping between LoRA weights and the base-model modules.
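To illustrate the format mismatch, here is a minimal sketch (with hypothetical function names, not SGL's actual API) of the two directions: stacking separate q/k/v LoRA B matrices into a fused qkv tensor, and splitting a pre-fused tensor back into per-projection pieces based on shape:

```python
import torch

def fuse_qkv_lora_B(q_B: torch.Tensor, k_B: torch.Tensor, v_B: torch.Tensor) -> torch.Tensor:
    """Stack per-projection LoRA B matrices (out_dim x rank) into a single
    fused qkv tensor by concatenating along the output dimension."""
    return torch.cat([q_B, k_B, v_B], dim=0)

def split_qkv_lora_B(fused_B: torch.Tensor, q_dim: int, kv_dim: int):
    """Reverse direction: split a pre-fused qkv LoRA B tensor (as shipped by
    models like Phi4MM) into per-projection tensors based on known dims."""
    return torch.split(fused_B, [q_dim, kv_dim, kv_dim], dim=0)
```

In the fused case, a checkpoint ships one `(q_dim + 2 * kv_dim, rank)` tensor per layer, so the loader must split (or pass through) rather than stack.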

Modifications

  1. Generalize the preprocessing function from stacking-only to "normalization" in both directions (stacking, splitting, replicating, etc.) based on the initial tensor shape.
  2. To limit the scope of this PR, I did not fully implement the accurate mapping; I only introduced a map_lora_module_name function that currently serves as a "filter" to ensure LoRAManager does not incorrectly map LoRA weights to unwanted modules (e.g., vision towers).
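A minimal sketch of the filtering idea behind map_lora_module_name (the excluded prefixes and the regex here are assumptions for illustration, not SGL's actual implementation):

```python
import re
from typing import Optional

# Assumed prefixes of modules that should never receive LoRA weights,
# e.g., the vision tower of a VLM; real names are model-specific.
EXCLUDED_PREFIXES = ("vision_tower.", "image_embed.")

def map_lora_module_name(lora_weight_name: str) -> Optional[str]:
    """Map a LoRA weight name to a base-model module name, returning None
    for modules that should be filtered out of LoRA loading."""
    module_name = re.sub(r"\.lora_(A|B)\.weight$", "", lora_weight_name)
    if module_name.startswith(EXCLUDED_PREFIXES):
        return None  # filtered: do not map onto unwanted modules
    return module_name
```

With such a filter in place, LoRAManager can skip any weight whose mapped name is None instead of silently attaching it to the wrong attention stack.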

Example

In my local branch, I verified that the Phi4MM LoRA adapter loads successfully and observed a significant increase in MMMU score from 0.38 to 0.472.

It's worth noting that the MMMU score is still lower than what the authors claim (0.55). However, it's difficult to pinpoint the source of the discrepancy, as I am seeing a similar situation for existing non-LoRA models as well. The root cause could be one of the following: (1) an incorrect model implementation, (2) issues with the benchmark script, or (3) the paper authors using a different benchmarking setup. I will add it to my follow-up list.


Checklist


@lifuhuang lifuhuang requested a review from CatherineSue as a code owner May 26, 2025 04:49
@lifuhuang lifuhuang requested a review from Fridge003 May 26, 2025 06:12
@Fridge003
Collaborator

Also we need a test for Phi4MM model, which can be put under test/srt/models/test_vlm_models.py.
This test can be added in the future PR.

@lifuhuang
Collaborator Author

Also we need a test for Phi4MM model, which can be put under test/srt/models/test_vlm_models.py. This test can be added in the future PR.

Sounds good. Thank you for the suggestion! For now I added TestPhi4MMServer in test_vision_openai_server_b.py. I will look into test_vlm_models and add the test in a follow-up PR.

@lifuhuang lifuhuang reopened this May 27, 2025

@Fridge003 Fridge003 left a comment

LGTM

@zhyncs zhyncs merged commit 477a101 into sgl-project:main May 27, 2025
15 of 21 checks passed
Layssy pushed a commit to Layssy/sglang-iaas that referenced this pull request Jun 9, 2025
@lifuhuang lifuhuang added lora Multi-modal multi-modal language model labels Jul 14, 2025

4 participants