fix: torch-native LoRA for multi-adapter case #20564

Merged
Fridge003 merged 6 commits into sgl-project:main from satyamk7054:satyamk/fix-torch-native-multi-lora on Mar 26, 2026

Conversation

@satyamk7054 (Contributor) commented Mar 14, 2026:

Motivation

Fix a RuntimeError in the torch_native LoRA backend when a batch contains multiple requests and some (but not all) consecutive requests share the same adapter.

Traceback (most recent call last):
  File "sglang/python/sglang/srt/managers/scheduler.py", line 3370, in run_scheduler_process
    scheduler.run_event_loop()
  File "sglang/python/sglang/srt/managers/scheduler.py", line 1241, in run_event_loop
    dispatch_event_loop(self)
  File "sglang/python/sglang/srt/managers/scheduler.py", line 3246, in dispatch_event_loop
    scheduler.event_loop_overlap()
  File "sglang/python/sglang/srt/managers/scheduler.py", line 1302, in event_loop_overlap
    batch_result = self.run_batch(batch)
  File "sglang/python/sglang/srt/managers/scheduler.py", line 2558, in run_batch
    embeddings = self.tp_worker.forward_batch_embedding(
  File "sglang/python/sglang/srt/managers/tp_worker.py", line 212, in forward_batch_embedding
    forward_batch = ForwardBatch.init_new(model_worker_batch, self.model_runner)
  File "sglang/python/sglang/srt/model_executor/forward_batch_info.py", line 578, in init_new
    model_runner.lora_manager.prepare_lora_batch(ret)
  File "sglang/python/sglang/srt/lora/lora_manager.py", line 286, in prepare_lora_batch
    self.lora_backend.prepare_lora_batch(
  File "sglang/python/sglang/srt/lora/backend/torch_backend.py", line 264, in prepare_lora_batch
    batch_info.weight_indices[:bs].copy_(weight_indices_tensor, non_blocking=True)
RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 0

prepare_lora_batch uses torch.unique_consecutive to deduplicate consecutive weight indices, but then uses batch_size instead of the unique count for the weight_indices copy and num_segments.

Modifications
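
Use the deduplicated segment count from torch.unique_consecutive, rather than batch_size, for both the weight_indices copy and num_segments in prepare_lora_batch. A minimal standalone sketch of the failure mode and the corrected indexing (simplified; not the actual sglang implementation):

import torch

bs = 3
# Adapter index per request; requests 0 and 1 share adapter 0.
req_weight_indices = torch.tensor([0, 0, 1])

# unique_consecutive merges the consecutive adapter-0 requests,
# so the segment count drops below the batch size.
weight_indices_tensor = torch.unique_consecutive(req_weight_indices)
assert weight_indices_tensor.tolist() == [0, 1]

# Stand-in for batch_info.weight_indices (illustrative buffer only).
buffer = torch.empty(8, dtype=torch.long)

# Buggy behavior: slicing the destination with bs (3) while the source
# has only 2 elements raises the RuntimeError in the traceback above.
try:
    buffer[:bs].copy_(weight_indices_tensor)
except RuntimeError as e:
    print(e)  # size of tensor a (3) must match size of tensor b (2)

# Fix: key both the copy and num_segments off the deduplicated length.
num_segments = len(weight_indices_tensor)
buffer[:num_segments].copy_(weight_indices_tensor)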

Accuracy Tests

Updated unit test

python test_torch_backend.py 
<frozen importlib._bootstrap_external>:1189: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
<frozen importlib._bootstrap_external>:1189: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
[CI Test Method] TestTorchNativeLoRABackend.test_run_gate_up_lora
.[CI Test Method] TestTorchNativeLoRABackend.test_run_lora_a_sgemm
.[CI Test Method] TestTorchNativeLoRABackend.test_run_lora_b_sgemm
.[CI Test Method] TestTorchNativeLoRABackend.test_run_qkv_lora
.
----------------------------------------------------------------------
Ran 4 tests in 0.208s

OK

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist (Contributor):

Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Satyam Kumar added 3 commits March 16, 2026 23:53
…ging

Use weight_indices=[0,0,1] with batch_size=3 so unique_consecutive
merges into 2 segments, covering the num_segments != batch_size path.

output_offset had 4 slices but run_qkv_lora hardcodes num_slices=3
for lora_a, causing shape mismatch when lora_b reads the 4th slice.
@zminglei (Collaborator):

/tag-and-rerun-ci

@claude-pr-review-bot:

🔍 SGLang Domain Expert Review

PR: fix: torch-native LoRA for multi-adapter case (#20564)

Routing

  • lora [██████████] 100% — LoRA adapters, torch/triton ops, multi-adapter serving

lora Review

Risk Level: Low

Summary: Correct and minimal fix for a real bug where torch.unique_consecutive deduplication reduces the number of segments below batch_size, causing a tensor size mismatch on weight_indices copy. The fix correctly uses the post-dedup length.

Issues Found:

  • None — the fix is correct and well-targeted.

Suggestions:

  • Consider adding a comment on num_segments = len(weight_indices_tensor) explaining it's the count after unique_consecutive dedup (e.g., # Number of unique adapter segments after consecutive dedup). Clarifies the distinction from bs for future readers.
  • The output_offset change in the QKV test (from [0, 3, 6, 9, 12] to [0, 3, 6, 9]) reduces slices from 4 to 3, aligning with Q/K/V. This is an unrelated but correct test improvement — worth mentioning in the PR description for clarity (see the illustration after this list).
  • No test coverage for the CUDA graph path with multi-adapter merging (use_cuda_graph = False in tests). Low risk since the logic is symmetric, but worth noting.
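
A small illustration of the slice-boundary convention behind that test change, assuming output_offset holds cumulative output-dimension boundaries (values taken from the test; the framing is illustrative, not a statement of the sglang API):

# N slices need N + 1 cumulative boundaries; Q/K/V is 3 slices.
output_offset = [0, 3, 6, 9]  # slices [0:3], [3:6], [6:9]
# The old value [0, 3, 6, 9, 12] implied a 4th slice [9:12], which
# run_qkv_lora never writes (num_slices is hardcoded to 3 for lora_a),
# hence the shape mismatch when lora_b read the 4th slice.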

Looks Good:

  • Root cause analysis is accurate: unique_consecutive produces fewer elements than batch_size when consecutive requests share an adapter, old code incorrectly assumed equality
  • Fix is minimal — only 3 lines changed in production code, all in the same method
  • Test is well-designed: weight_indices = [0, 0, 1] with batch_size=3 directly exercises the dedup path (2 segments from 3 requests)
  • The seg_lens scatter-add logic correctly aggregates sequence lengths for merged segments (see the sketch after this list)
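
A standalone sketch of that aggregation pattern, with shapes and values assumed for illustration (not the actual sglang code):

import torch

seq_lens = torch.tensor([5, 7, 3])        # token count per request
weight_indices = torch.tensor([0, 0, 1])  # adapter id per request

# return_counts tells how many consecutive requests merged into each segment.
segments, counts = torch.unique_consecutive(weight_indices, return_counts=True)

# Map each request to its segment, then scatter-add its length.
segment_ids = torch.repeat_interleave(torch.arange(len(segments)), counts)
seg_lens = torch.zeros(len(segments), dtype=seq_lens.dtype)
seg_lens.scatter_add_(0, segment_ids, seq_lens)

print(seg_lens.tolist())  # [12, 3]: requests 0 and 1 merged into segment 0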

Generated by SGLang domain expert review agents.

@satyamk7054 (Contributor, Author) commented Mar 17, 2026:

/rerun-failed-ci try again 3

@zminglei (Collaborator):

/rerun-failed-ci

1 similar comment
@satyamk7054 (Contributor, Author):

/rerun-failed-ci

Fridge003 merged commit e59ea4f into sgl-project:main on Mar 26, 2026
136 of 146 checks passed
satyamk7054 added a commit to satyamk7054/sglang that referenced this pull request Apr 3, 2026
Co-authored-by: Satyam Kumar <satyamk@linkedin.com>
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
Co-authored-by: Satyam Kumar <satyamk@linkedin.com>
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026
Co-authored-by: Satyam Kumar <satyamk@linkedin.com>
satyamk7054 deleted the satyamk/fix-torch-native-multi-lora branch April 25, 2026 01:38