fix: torch-native LoRA for multi-adapter case #20564

Merged
Fridge003 merged 6 commits into sgl-project:main from satyamk7054:satyamk/fix-torch-native-multi-lora on Mar 26, 2026

Conversation

@satyamk7054 (Contributor) commented Mar 14, 2026:

Motivation

Fix a RuntimeError in the torch_native LoRA backend when a batch contains multiple requests and some (but not all) consecutive requests share the same adapter.

Traceback (most recent call last):
  File "sglang/python/sglang/srt/managers/scheduler.py", line 3370, in run_scheduler_process
    scheduler.run_event_loop()
  File "sglang/python/sglang/srt/managers/scheduler.py", line 1241, in run_event_loop
    dispatch_event_loop(self)
  File "sglang/python/sglang/srt/managers/scheduler.py", line 3246, in dispatch_event_loop
    scheduler.event_loop_overlap()
  File "sglang/python/sglang/srt/managers/scheduler.py", line 1302, in event_loop_overlap
    batch_result = self.run_batch(batch)
  File "sglang/python/sglang/srt/managers/scheduler.py", line 2558, in run_batch
    embeddings = self.tp_worker.forward_batch_embedding(
  File "sglang/python/sglang/srt/managers/tp_worker.py", line 212, in forward_batch_embedding
    forward_batch = ForwardBatch.init_new(model_worker_batch, self.model_runner)
  File "sglang/python/sglang/srt/model_executor/forward_batch_info.py", line 578, in init_new
    model_runner.lora_manager.prepare_lora_batch(ret)
  File "sglang/python/sglang/srt/lora/lora_manager.py", line 286, in prepare_lora_batch
    self.lora_backend.prepare_lora_batch(
  File "sglang/python/sglang/srt/lora/backend/torch_backend.py", line 264, in prepare_lora_batch
    batch_info.weight_indices[:bs].copy_(weight_indices_tensor, non_blocking=True)
RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 0

prepare_lora_batch uses torch.unique_consecutive to deduplicate consecutive weight indices, but then uses batch_size instead of the unique count for the weight_indices copy and num_segments.

Modifications
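
Use the deduplicated segment count from torch.unique_consecutive, rather than batch_size, for both the weight_indices copy and num_segments in prepare_lora_batch. A minimal standalone sketch of the failure mode and the corrected indexing (simplified; not the actual sglang implementation):

import torch

bs = 3
# Adapter index per request; requests 0 and 1 share adapter 0.
req_weight_indices = torch.tensor([0, 0, 1])

# unique_consecutive merges the consecutive adapter-0 requests,
# so the segment count drops below the batch size.
weight_indices_tensor = torch.unique_consecutive(req_weight_indices)
assert weight_indices_tensor.tolist() == [0, 1]

# Stand-in for batch_info.weight_indices (illustrative buffer only).
buffer = torch.empty(8, dtype=torch.long)

# Buggy behavior: slicing the destination with bs (3) while the source
# has only 2 elements raises the RuntimeError in the traceback above.
try:
    buffer[:bs].copy_(weight_indices_tensor)
except RuntimeError as e:
    print(e)  # size of tensor a (3) must match size of tensor b (2)

# Fix: key both the copy and num_segments off the deduplicated length.
num_segments = len(weight_indices_tensor)
buffer[:num_segments].copy_(weight_indices_tensor)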

Accuracy Tests

Updated unit test

python test_torch_backend.py 
<frozen importlib._bootstrap_external>:1189: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
<frozen importlib._bootstrap_external>:1189: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
[CI Test Method] TestTorchNativeLoRABackend.test_run_gate_up_lora
.[CI Test Method] TestTorchNativeLoRABackend.test_run_lora_a_sgemm
.[CI Test Method] TestTorchNativeLoRABackend.test_run_lora_b_sgemm
.[CI Test Method] TestTorchNativeLoRABackend.test_run_qkv_lora
.
----------------------------------------------------------------------
Ran 4 tests in 0.208s

OK

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist (Contributor):

Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Satyam Kumar added 3 commits March 16, 2026 23:53
…ging

Use weight_indices=[0,0,1] with batch_size=3 so unique_consecutive
merges into 2 segments, covering the num_segments != batch_size path.

output_offset had 4 slices but run_qkv_lora hardcodes num_slices=3
for lora_a, causing shape mismatch when lora_b reads the 4th slice.
@zminglei (Collaborator):

/tag-and-rerun-ci

@claude-pr-review-bot:

🔍 SGLang Domain Expert Review

PR: fix: torch-native LoRA for multi-adapter case (#20564)

Routing

  • lora [██████████] 100% — LoRA adapters, torch/triton ops, multi-adapter serving

lora Review

Risk Level: Low

Summary: Correct and minimal fix for a real bug where torch.unique_consecutive deduplication reduces the number of segments below batch_size, causing a tensor size mismatch on weight_indices copy. The fix correctly uses the post-dedup length.

Issues Found:

  • None — the fix is correct and well-targeted.

Suggestions:

  • Consider adding a comment on num_segments = len(weight_indices_tensor) explaining it's the count after unique_consecutive dedup (e.g., # Number of unique adapter segments after consecutive dedup). Clarifies the distinction from bs for future readers.
  • The output_offset change in the QKV test (from [0, 3, 6, 9, 12] to [0, 3, 6, 9]) reduces slices from 4 to 3, aligning with Q/K/V. This is an unrelated but correct test improvement — worth mentioning in the PR description for clarity (see the illustration after this list).
  • No test coverage for the CUDA graph path with multi-adapter merging (use_cuda_graph = False in tests). Low risk since the logic is symmetric, but worth noting.
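
A small illustration of the slice-boundary convention behind that test change, assuming output_offset holds cumulative output-dimension boundaries (values taken from the test; the framing is illustrative, not a statement of the sglang API):

# N slices need N + 1 cumulative boundaries; Q/K/V is 3 slices.
output_offset = [0, 3, 6, 9]  # slices [0:3], [3:6], [6:9]
# The old value [0, 3, 6, 9, 12] implied a 4th slice [9:12], which
# run_qkv_lora never writes (num_slices is hardcoded to 3 for lora_a),
# hence the shape mismatch when lora_b read the 4th slice.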

Looks Good:

  • Root cause analysis is accurate: unique_consecutive produces fewer elements than batch_size when consecutive requests share an adapter, old code incorrectly assumed equality
  • Fix is minimal — only 3 lines changed in production code, all in the same method
  • Test is well-designed: weight_indices = [0, 0, 1] with batch_size=3 directly exercises the dedup path (2 segments from 3 requests)
  • The seg_lens scatter-add logic correctly aggregates sequence lengths for merged segments (see the sketch after this list)
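
A standalone sketch of that aggregation pattern, with shapes and values assumed for illustration (not the actual sglang code):

import torch

seq_lens = torch.tensor([5, 7, 3])        # token count per request
weight_indices = torch.tensor([0, 0, 1])  # adapter id per request

# return_counts tells how many consecutive requests merged into each segment.
segments, counts = torch.unique_consecutive(weight_indices, return_counts=True)

# Map each request to its segment, then scatter-add its length.
segment_ids = torch.repeat_interleave(torch.arange(len(segments)), counts)
seg_lens = torch.zeros(len(segments), dtype=seq_lens.dtype)
seg_lens.scatter_add_(0, segment_ids, seq_lens)

print(seg_lens.tolist())  # [12, 3]: requests 0 and 1 merged into segment 0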

Generated by SGLang domain expert review agents.

@satyamk7054 (Contributor, Author) commented Mar 17, 2026:

/rerun-failed-ci try again 3

@zminglei (Collaborator):

/rerun-failed-ci

1 similar comment
@satyamk7054 (Contributor, Author):

/rerun-failed-ci

Fridge003 merged commit e59ea4f into sgl-project:main on Mar 26, 2026
136 of 146 checks passed
satyamk7054 added a commit to satyamk7054/sglang that referenced this pull request Apr 3, 2026
Co-authored-by: Satyam Kumar <satyamk@linkedin.com>
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
Co-authored-by: Satyam Kumar <satyamk@linkedin.com>
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026
Co-authored-by: Satyam Kumar <satyamk@linkedin.com>
satyamk7054 deleted the satyamk/fix-torch-native-multi-lora branch April 25, 2026 01:38