[lora] Speedup triton backend sgemm calls with better grid#22386
Merged
Fridge003 merged 5 commits into sgl-project:main on Apr 15, 2026
Conversation
Sort tokens by adapter during decode to merge per-sequence segments into per-adapter segments. This reduces the number of kernel grid blocks and improves GPU utilization for multi-LoRA batches.

Key changes:
- Add _resolve_token_positions() helper for indirection in all sgemm kernels
- Add SORTED_BY_ADAPTER constexpr and early-exit for empty/OOB segments
- Add compute_sgemm_routing() in TritonLoRABackend to build merged batch info
- Pre-allocate sgemm CUDA graph buffers in init_cuda_graph_batch_info()
- Add test_sgemm_sorted_by_adapter.py verifying correctness across all kernels
Contributor
Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
Collaborator
/tag-and-rerun-ci
18 similar comments
yushengsu-thu pushed a commit that referenced this pull request on Apr 17, 2026
jmamou pushed a commit to jmamou/sglang that referenced this pull request on Apr 20, 2026
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request on Apr 22, 2026
zhangying098 pushed a commit to zhangying098/sglang that referenced this pull request on Apr 23, 2026
kyx1999 pushed a commit to KMSorSMS/sglang that referenced this pull request on Apr 27, 2026
Motivation
During multi-LoRA decode, each sequence gets its own segment in the Triton sgemm grid, even when many sequences share the same adapter. This means the grid scales with batch_size instead of num_adapters, launching excessive blocks and wasting GPU cycles. This PR sorts tokens by adapter and merges per-sequence segments into per-adapter segments, so the kernel grid scales with the adapter count instead.
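A minimal sketch of how such per-adapter segments can be derived with argsort + searchsorted, which the PR names as the mechanism in compute_sgemm_routing(). The function name and shapes below are illustrative, not the PR's actual API:

```python
import torch

def build_adapter_segments(token_adapter_ids: torch.Tensor, num_adapters: int):
    # Hypothetical helper: stable sort keeps the original token order
    # within each adapter, so tokens sharing an adapter become contiguous.
    permutation = torch.argsort(token_adapter_ids, stable=True)
    sorted_ids = token_adapter_ids[permutation]
    adapters = torch.arange(num_adapters, device=token_adapter_ids.device)
    # Each adapter's segment is [seg_start[a], seg_end[a]) in sorted order.
    seg_start = torch.searchsorted(sorted_ids, adapters, side="left")
    seg_end = torch.searchsorted(sorted_ids, adapters, side="right")
    # The kernel grid can now launch one segment per adapter instead of
    # one per sequence; `permutation` provides the token indirection.
    return permutation, seg_start, seg_end

# Example: 6 decode tokens over 2 adapters -> 2 segments instead of 6.
ids = torch.tensor([1, 0, 1, 0, 0, 1])
perm, starts, ends = build_adapter_segments(ids, num_adapters=2)
# perm -> [1, 3, 4, 0, 2, 5]; starts -> [0, 3]; ends -> [3, 6]
```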
Modifications
- kernel_utils.py (new): _resolve_token_positions() Triton JIT helper that gathers/scatters through a permutation when sorted, and passes through otherwise (sketched after this list).
- Kernels (sgemm_lora_a, sgemm_lora_b, qkv_lora_b, gate_up_lora_b): added a SORTED_BY_ADAPTER constexpr path with indirection via _resolve_token_positions, plus early exit for empty segments and excess grid blocks.
- triton_backend.py: compute_sgemm_routing() builds the merged per-adapter batch info using argsort + searchsorted; called during decode only. CUDA graph buffers are pre-allocated in init_cuda_graph_batch_info().
- test_sgemm_sorted_by_adapter.py (new): verifies numerical equivalence (bf16, atol=1e-4) between the per-sequence and sorted-by-adapter paths for all four kernels, plus mixed-rank and single-adapter edge cases.
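A minimal sketch of the indirection idea behind a helper like _resolve_token_positions; the real signature in kernel_utils.py may differ. This is a device-side function meant to be called from inside the sgemm kernels:

```python
import triton
import triton.language as tl

@triton.jit
def resolve_token_positions(
    seg_positions,                    # logical positions within a merged segment
    perm_ptr,                         # token permutation (sorted-by-adapter order)
    mask,                             # bounds mask for the segment tail
    SORTED_BY_ADAPTER: tl.constexpr,  # compile-time path selection
):
    if SORTED_BY_ADAPTER:
        # Sorted path: gather the physical token row for each logical
        # position through the permutation table.
        return tl.load(perm_ptr + seg_positions, mask=mask, other=0)
    else:
        # Per-sequence path: positions already index physical rows.
        return seg_positions
```

Because SORTED_BY_ADAPTER is a constexpr, only one branch is compiled into each kernel specialization, so the unsorted path pays no indirection cost.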
Accuracy Tests
The unit test compares the original per-sequence output against the sorted-by-adapter output across all kernels, as illustrated below.
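A self-contained sketch of the correctness property being verified: applying per-adapter LoRA weights on tokens grouped by adapter, then scattering results back, must match a straightforward per-token reference (bf16 inputs, atol=1e-4). All names and shapes here are illustrative, not the test file's actual code:

```python
import torch

torch.manual_seed(0)
num_tokens, hidden, rank, num_adapters = 64, 128, 16, 4
x = torch.randn(num_tokens, hidden, dtype=torch.bfloat16)
lora_a = torch.randn(num_adapters, hidden, rank, dtype=torch.bfloat16)
ids = torch.randint(num_adapters, (num_tokens,))

# Reference (per-sequence path): each token gathers its own adapter weight.
ref = torch.einsum("th,thr->tr", x.float(), lora_a[ids].float())

# Sorted path: group tokens by adapter, one matmul per adapter segment,
# then scatter results back to the original token rows.
perm = torch.argsort(ids, stable=True)
sorted_ids = ids[perm]
out = torch.empty(num_tokens, rank, dtype=torch.float32)
for a in range(num_adapters):
    rows = perm[sorted_ids == a]
    out[rows] = x[rows].float() @ lora_a[a].float()

assert torch.allclose(out, ref, atol=1e-4)
```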
Speed Tests and Profiling
Checklist
Benchmark the speed.