
[lora] Fix partial MoE rank loading, VL lm_head, strict loading, deepseek on-demand#21864

Merged
yushengsu-thu merged 9 commits into sgl-project:main from klshuster:kurt/lora-loading-fixes-20260401
Apr 12, 2026
Conversation

@klshuster
Contributor

Motivation

Several LoRA loading bugs affect correctness and usability:

  1. Partial MoE rank loading — when a LoRA adapter has fewer ranks than max_lora_rank, the stacked A-buffer components are placed at the wrong offsets, so the MoE kernel's [:max_r] / [max_r:2*max_r] slicing reads garbage.
  2. VL model lm_head LoRA — vision-language models with should_apply_lora patterns skip embed_tokens/lm_head modules because the gate fires before they are handled.
  3. Loose LoRA loading — weight name mismatches between the adapter and the model silently drop weights, making debugging difficult, and there is no validation that the loaded weights actually match the model's target modules.
  4. DeepSeek on-demand loading — the "all" target module sentinel is expanded too early in server_args, before the model is loaded, preventing model-aware resolution of which modules to target.

Modifications

Partial MoE rank fix (mem_pool.py):

  • Place stacked A-buffer components at max_rank-spaced positions instead of contiguous lora_rank * c slicing.
  • Zero B-buffer beyond the loaded rank so the MoE kernel reads correct padding.
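The offset change can be sketched as follows. This is an illustrative sketch only, with hypothetical function and variable names, using NumPy arrays as stand-ins for the torch tensors that mem_pool.py actually operates on:

```python
import numpy as np

def load_stacked_lora_a(buffer, weights, lora_rank, max_rank):
    """Place each stacked LoRA-A component at a max_rank-spaced offset.

    buffer:  (num_components * max_rank, in_dim) pool slot; the MoE kernel
             reads it as buffer[:max_rank], buffer[max_rank:2*max_rank], ...
    weights: (num_components * lora_rank, in_dim) adapter weights, where
             lora_rank may be smaller than max_rank.
    """
    num_components = weights.shape[0] // lora_rank
    buffer[:] = 0.0  # padding rows beyond lora_rank must be zero
    for c in range(num_components):
        # Pre-fix (wrong): buffer[c*lora_rank:(c+1)*lora_rank] = component,
        # which shifts every component after the first when lora_rank < max_rank.
        start = c * max_rank
        buffer[start : start + lora_rank] = weights[c * lora_rank : (c + 1) * lora_rank]
    return buffer

# Two stacked components, rank-2 adapter loaded into a rank-4 slot:
buf = np.full((8, 3), -1.0)
w = np.arange(12, dtype=float).reshape(4, 3)  # rows 0-1 = comp 0, rows 2-3 = comp 1
load_stacked_lora_a(buf, w, lora_rank=2, max_rank=4)
```

With the fix, component 1 lands at row max_rank (4) rather than row lora_rank (2), so the kernel's fixed-stride slices line up with the loaded weights and the padding rows are zero instead of stale.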

VL model lm_head fix (lora_manager.py, mem_pool.py):

  • Move should_apply_lora gate to after embed_tokens/lm_head handling so VL models' patterns don't skip these modules.
  • Add lora_lm_head_module is not None guards and PP-stage fallback assertion for non-last ranks.
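The reordering above can be sketched as a small decision function. Names here are hypothetical; the real gate lives inside lora_manager.py's module-iteration logic:

```python
def resolve_lora_module(module_name, should_apply_lora, lora_lm_head_module):
    """Decide whether a module receives LoRA weights.

    embed_tokens / lm_head are handled BEFORE the should_apply_lora pattern
    gate, so VL models whose patterns only cover language-model layers no
    longer skip them.
    """
    if module_name in ("embed_tokens", "lm_head"):
        # Guard: apply only when an lm_head LoRA module actually exists
        # (e.g. on the last pipeline-parallel stage).
        return lora_lm_head_module is not None
    # The pattern gate now fires only for everything else.
    return should_apply_lora(module_name)

# A VL-style pattern that matches only language-model layers:
vl_pattern = lambda name: name.startswith("language_model.")
```

Before the fix, the equivalent of `should_apply_lora("lm_head")` ran first and returned False for a pattern like `vl_pattern`, skipping the module entirely.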

Strict LoRA loading (server_args.py, lora_manager.py, mem_pool.py):

  • Add --lora-strict-loading flag.
  • Pre-validate all adapter weight names against target modules before GPU buffer mutation. Log matched/skipped modules, raise on mismatch when strict.
  • Keep "all" as a sentinel in server_args and resolve it model-aware in lora_manager using auto_detect_lora_target_modules.

Accuracy Tests

The changes are bug fixes to weight loading: they restore correct behavior rather than changing model outputs.

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

…seek on-demand

- Fix partial-lora-moe-rank loading: place A-buffer components at
  max_rank-spaced positions and zero B-buffer beyond loaded rank
- Fix lm_head lora for VL models: move should_apply_lora gate after
  embed_tokens/lm_head handling
- Add --lora-strict-loading flag with pre-validation of adapter weight
  names against target modules
- Keep "all" as sentinel in server_args, resolve model-aware in lora_manager

github-actions bot added the lora label Apr 1, 2026
yushengsu-thu self-assigned this Apr 1, 2026
yushengsu-thu mentioned this pull request Apr 9, 2026
@yushengsu-thu
Collaborator

/tag-run-ci-label

2 similar comments

@yushengsu-thu
Collaborator

/rerun-failed-ci

7 similar comments

yushengsu-thu merged commit f81b6df into sgl-project:main Apr 12, 2026
446 of 507 checks passed
pyc96 pushed a commit to pyc96/sglang that referenced this pull request Apr 14, 2026
…seek on-demand (sgl-project#21864)

Co-authored-by: Yusheng Su <yushengsu.thu@gmail.com>
yushengsu-thu added a commit that referenced this pull request Apr 17, 2026
…seek on-demand (#21864)

Co-authored-by: Yusheng Su <yushengsu.thu@gmail.com>
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026
…seek on-demand (sgl-project#21864)

Co-authored-by: Yusheng Su <yushengsu.thu@gmail.com>