[lora] Fix partial MoE rank loading, VL lm_head, strict loading, deepseek on-demand #21864
Merged
yushengsu-thu merged 9 commits into sgl-project:main on Apr 12, 2026
Conversation
…seek on-demand

- Fix partial-LoRA MoE rank loading: place A-buffer components at max_rank-spaced positions and zero the B-buffer beyond the loaded rank
- Fix lm_head LoRA for VL models: move the should_apply_lora gate after embed_tokens/lm_head handling
- Add a --lora-strict-loading flag with pre-validation of adapter weight names against target modules
- Keep "all" as a sentinel in server_args; resolve it model-aware in lora_manager
Contributor
Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
Collaborator
/tag-run-ci-label

Collaborator
/tag-run-ci-label
yushengsu-thu approved these changes, Apr 12, 2026
Collaborator
/tag-run-ci-label

Collaborator
/rerun-failed-ci

3 similar comments

Collaborator
/rerun-failed-ci

Collaborator
/rerun-failed-ci

Collaborator
/rerun-failed-ci

Collaborator
/rerun-failed-ci

Collaborator
/rerun-failed-ci

2 similar comments

Collaborator
/rerun-failed-ci

Collaborator
/rerun-failed-ci
pyc96 pushed a commit to pyc96/sglang that referenced this pull request, Apr 14, 2026:
…seek on-demand (sgl-project#21864) Co-authored-by: Yusheng Su <yushengsu.thu@gmail.com>
yushengsu-thu added a commit that referenced this pull request, Apr 17, 2026:
…seek on-demand (#21864) Co-authored-by: Yusheng Su <yushengsu.thu@gmail.com>
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request, Apr 22, 2026:
…seek on-demand (sgl-project#21864) Co-authored-by: Yusheng Su <yushengsu.thu@gmail.com>
Motivation
Several LoRA loading bugs affect correctness and usability:
- When an adapter's rank is smaller than max_lora_rank, the stacked A-buffer components are placed at the wrong offsets, causing the MoE kernel's [:max_r]/[max_r:2*max_r] slicing to read garbage.
- should_apply_lora patterns skip embed_tokens/lm_head modules because the gate fires before those modules are handled.
- The "all" target-module sentinel is expanded too early in server_args, before the model is loaded, preventing model-aware resolution of which modules to target.

Modifications
Partial MoE rank fix (mem_pool.py):
- Place stacked A-buffer components at max_rank-spaced positions instead of contiguous lora_rank * c slicing.
- Zero the B-buffer beyond the loaded rank.

VL model lm_head fix (lora_manager.py, mem_pool.py):
- Move the should_apply_lora gate to after embed_tokens/lm_head handling so VL models' patterns don't skip these modules.
- Add lora_lm_head_module is not None guards and a PP-stage fallback assertion for non-last ranks.

Strict LoRA loading (server_args.py, lora_manager.py, mem_pool.py):
- Add a --lora-strict-loading flag that pre-validates adapter weight names against the target modules.
- Keep "all" as a sentinel in server_args and resolve it model-aware in lora_manager using auto_detect_lora_target_modules.

Accuracy Tests
These changes are bug fixes to weight loading: they restore correct behavior rather than introducing new behavior, so no accuracy regression is expected.
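To illustrate the buffer-layout fix, here is a minimal plain-Python sketch. The function names, the list-of-lists buffers, and the shapes are illustrative only (the actual mem_pool.py code operates on GPU tensors); the point it demonstrates is that each stacked A component must start at a max_rank-spaced row offset, so kernel slicing like [:max_r]/[max_r:2*max_r] finds each component even when the adapter's rank is smaller, and that B columns beyond the loaded rank must be zeroed.

```python
def load_stacked_lora_a(num_components, max_rank, in_dim, component_weights):
    """Place component c's rows at offset c * max_rank (not c * rank).

    component_weights: one (rank x in_dim) matrix per stacked component;
    rank may be smaller than max_rank. Unfilled rows stay zero.
    """
    buffer = [[0.0] * in_dim for _ in range(num_components * max_rank)]
    for c, w in enumerate(component_weights):
        for r, row in enumerate(w):
            buffer[c * max_rank + r] = list(row)
    return buffer


def load_lora_b(max_rank, weight):
    """Pad each (out_dim x rank) B row with zeros out to max_rank columns.

    Stale values beyond the loaded rank would otherwise leak into the matmul.
    """
    rank = len(weight[0])
    return [list(row) + [0.0] * (max_rank - rank) for row in weight]
```

With max_rank = 4 and two rank-2 components, the second component lands at rows 4-5 rather than rows 2-3, matching the kernel's [max_r:2*max_r] slice.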
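The strict-loading pre-validation can be sketched as follows. This is a hedged illustration, not the sglang implementation: the function name, the regex, and the PEFT-style weight-name layout are assumptions introduced here to show the idea of failing fast when an adapter contains weights for modules outside the resolved target set.

```python
import re


def validate_adapter_weights(weight_names, target_modules, strict=True):
    """Check every LoRA weight name against the resolved target modules.

    Returns the list of unexpected names; under strict mode, raises instead
    so a misconfigured adapter fails at load time rather than silently.
    """
    pattern = re.compile(
        r".*\.(" + "|".join(re.escape(m) for m in target_modules) + r")\.lora_[AB]\.weight$"
    )
    unexpected = [n for n in weight_names if not pattern.match(n)]
    if strict and unexpected:
        raise ValueError(f"Adapter contains weights for untargeted modules: {unexpected}")
    return unexpected
```

In non-strict mode the same check can simply log and skip the unexpected tensors, which is the permissive behavior a --lora-strict-loading style flag would toggle.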
Checklist
Review and Merge Process
/tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci