[lora] Fix partial MoE rank loading, VL lm_head, strict loading, deepseek on-demand #21864
Merged
yushengsu-thu merged 9 commits into sgl-project:main on Apr 12, 2026
Conversation
…seek on-demand

- Fix partial-LoRA MoE rank loading: place A-buffer components at max_rank-spaced positions and zero the B-buffer beyond the loaded rank
- Fix lm_head LoRA for VL models: move the should_apply_lora gate after embed_tokens/lm_head handling
- Add a --lora-strict-loading flag with pre-validation of adapter weight names against target modules
- Keep "all" as a sentinel in server_args; resolve it model-aware in lora_manager
Contributor
Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
Collaborator
/tag-run-ci-label

Collaborator
/tag-run-ci-label
yushengsu-thu approved these changes, Apr 12, 2026
Collaborator
/tag-run-ci-label

Collaborator
/rerun-failed-ci

3 similar comments

Collaborator
/rerun-failed-ci

Collaborator
/rerun-failed-ci

Collaborator
/rerun-failed-ci

Collaborator
/rerun-failed-ci

Collaborator
/rerun-failed-ci

2 similar comments

Collaborator
/rerun-failed-ci

Collaborator
/rerun-failed-ci
pyc96 pushed a commit to pyc96/sglang that referenced this pull request, Apr 14, 2026:
…seek on-demand (sgl-project#21864) Co-authored-by: Yusheng Su <yushengsu.thu@gmail.com>
yushengsu-thu added a commit that referenced this pull request, Apr 17, 2026:
…seek on-demand (#21864) Co-authored-by: Yusheng Su <yushengsu.thu@gmail.com>
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request, Apr 22, 2026:
…seek on-demand (sgl-project#21864) Co-authored-by: Yusheng Su <yushengsu.thu@gmail.com>
Motivation
Several LoRA loading bugs affect correctness and usability:
- When an adapter's rank is smaller than max_lora_rank, the stacked A-buffer components are placed at the wrong offsets, causing the MoE kernel's [:max_r]/[max_r:2*max_r] slicing to read garbage.
- should_apply_lora patterns skip embed_tokens/lm_head modules because the gate fires before those modules are handled.
- The "all" target-module sentinel is expanded too early in server_args, before the model is loaded, preventing model-aware resolution of which modules to target.

Modifications
Partial MoE rank fix (mem_pool.py):
- Place stacked A-buffer components at max_rank-spaced positions instead of contiguous lora_rank * c slicing.
- Zero the B-buffer beyond the loaded rank.

VL model lm_head fix (lora_manager.py, mem_pool.py):
- Move the should_apply_lora gate to after embed_tokens/lm_head handling so VL models' patterns don't skip these modules.
- Add lora_lm_head_module is not None guards and a PP-stage fallback assertion for non-last ranks.

Strict LoRA loading (server_args.py, lora_manager.py, mem_pool.py):
- Add a --lora-strict-loading flag that pre-validates adapter weight names against the target modules.
- Keep "all" as a sentinel in server_args and resolve it model-aware in lora_manager using auto_detect_lora_target_modules.

Accuracy Tests
These changes are bug fixes to weight loading: they restore correct behavior rather than introducing new behavior, so no accuracy regression is expected.
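To illustrate the buffer-layout fix, here is a minimal plain-Python sketch. The function names, the list-of-lists buffers, and the shapes are illustrative only (the actual mem_pool.py code operates on GPU tensors); the point it demonstrates is that each stacked A component must start at a max_rank-spaced row offset, so kernel slicing like [:max_r]/[max_r:2*max_r] finds each component even when the adapter's rank is smaller, and that B columns beyond the loaded rank must be zeroed.

```python
def load_stacked_lora_a(num_components, max_rank, in_dim, component_weights):
    """Place component c's rows at offset c * max_rank (not c * rank).

    component_weights: one (rank x in_dim) matrix per stacked component;
    rank may be smaller than max_rank. Unfilled rows stay zero.
    """
    buffer = [[0.0] * in_dim for _ in range(num_components * max_rank)]
    for c, w in enumerate(component_weights):
        for r, row in enumerate(w):
            buffer[c * max_rank + r] = list(row)
    return buffer


def load_lora_b(max_rank, weight):
    """Pad each (out_dim x rank) B row with zeros out to max_rank columns.

    Stale values beyond the loaded rank would otherwise leak into the matmul.
    """
    rank = len(weight[0])
    return [list(row) + [0.0] * (max_rank - rank) for row in weight]
```

With max_rank = 4 and two rank-2 components, the second component lands at rows 4-5 rather than rows 2-3, matching the kernel's [max_r:2*max_r] slice.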
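The strict-loading pre-validation can be sketched as follows. This is a hedged illustration, not the sglang implementation: the function name, the regex, and the PEFT-style weight-name layout are assumptions introduced here to show the idea of failing fast when an adapter contains weights for modules outside the resolved target set.

```python
import re


def validate_adapter_weights(weight_names, target_modules, strict=True):
    """Check every LoRA weight name against the resolved target modules.

    Returns the list of unexpected names; under strict mode, raises instead
    so a misconfigured adapter fails at load time rather than silently.
    """
    pattern = re.compile(
        r".*\.(" + "|".join(re.escape(m) for m in target_modules) + r")\.lora_[AB]\.weight$"
    )
    unexpected = [n for n in weight_names if not pattern.match(n)]
    if strict and unexpected:
        raise ValueError(f"Adapter contains weights for untargeted modules: {unexpected}")
    return unexpected
```

In non-strict mode the same check can simply log and skip the unexpected tensors, which is the permissive behavior a --lora-strict-loading style flag would toggle.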
Checklist
Review and Merge Process
/tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci