
[Feature] add LoRADrainer to address high P99 TTFT #17913

Merged
Fridge003 merged 13 commits into sgl-project:main from glenliu21:lora_high_ttft
May 2, 2026

Conversation

@glenliu21
Contributor

Motivation

Currently, our LoRA implementation suffers from extremely high P99 TTFT. For instance, running the commands below on an A100-SXM4-80GB:

python3 -m sglang.launch_server \
    --model-path meta-llama/Llama-3.1-8B-Instruct \
    --max-loaded-loras 6 \
    --max-loras-per-batch 3 \
    --lora-paths \
        adapter0=faridlazuarda/valadapt-llama-3.1-8B-it-chinese \
        adapter1=LlamaFactoryAI/Llama-3.1-8B-Instruct-cv-job-description-matching \
        adapter2=Nutanix/Meta-Llama-3.1-8B-Instruct_lora_4_alpha_16 \
        adapter3=pbevan11/llama-3.1-8b-ocr-correction \
        adapter4=reissbaker/llama-3.1-8b-abliterated-lora \
        adapter5=Roblox/Llama-3.1-8B-Instruct-RobloxGuard-1.0
python3 -m sglang.bench_serving \
  --backend sglang \
  --base-url http://localhost:30000 \
  --dataset-name random \
  --num-prompts 200 \
  --request-rate 4 \
  --random-input-len 512 \
  --random-output-len 512 \
  --lora-name \
    adapter0 \
    adapter1 \
    adapter2 \
    adapter3 \
    adapter4 \
    adapter5

gives us the following results:

----------------End-to-End Latency----------------
Mean E2E Latency (ms):                   12060.28
Median E2E Latency (ms):                 7051.01
P90 E2E Latency (ms):                    32927.73
P99 E2E Latency (ms):                    45870.02
---------------Time to First Token----------------
Mean TTFT (ms):                          7910.15
Median TTFT (ms):                        83.30
P99 TTFT (ms):                           39550.49

That means 1% of requests wait almost 40 seconds before their first token is scheduled, compared to a median of 83 ms.
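
To make the failure mode concrete, here is a toy simulation (illustrative only, not sglang code) of a greedy slot policy. With six adapters competing for three slots, a scheduler that always keeps currently loaded adapters while they still have pending work never frees a slot, so the other three adapters wait indefinitely:

```python
from collections import deque

def greedy_schedule(queues, loaded, slots=3):
    """Keep loaded adapters that have pending work; fill spare slots FIFO."""
    keep = [a for a in loaded if queues[a]]
    spare = slots - len(keep)
    waiting = [a for a in queues if a not in keep and queues[a]]
    return keep + waiting[:spare]

queues = {f"adapter{i}": deque() for i in range(6)}
loaded = ["adapter0", "adapter1", "adapter2"]
first_served = {}
for step in range(100):
    for a in queues:           # every adapter receives one request per tick
        queues[a].append(step)
    loaded = greedy_schedule(queues, loaded)
    for a in loaded:
        queues[a].popleft()    # serve one request per loaded adapter
        first_served.setdefault(a, step)

# adapters 0-2 are served from tick 0; adapters 3-5 never get a slot
```

Under a steady request stream the hot adapters always have queued work, so the cold ones starve; this is the pathology the P99 numbers above reflect.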

Modifications

  • Introduce a LoRADrainer class that forces hot adapters to start draining so that starved cold adapters can be scheduled
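
The idea above can be sketched as follows. This is a hypothetical reconstruction: the class name matches the PR, but the method names, threshold, and drain-selection policy are illustrative assumptions, not the actual implementation.

```python
class LoRADrainer:
    """Sketch of a starvation guard for LoRA adapter slots (illustrative)."""

    def __init__(self, max_loras_per_batch: int, starvation_threshold_s: float = 5.0):
        self.max_loras_per_batch = max_loras_per_batch
        self.starvation_threshold_s = starvation_threshold_s
        self.draining = set()     # hot adapters that must stop admitting requests
        self.first_wait = {}      # adapter -> time its oldest request began waiting

    def observe_waiting(self, adapter, now):
        # Record when a request for an unloaded adapter started waiting.
        self.first_wait.setdefault(adapter, now)

    def observe_scheduled(self, adapter):
        # Adapter got a slot: it is no longer starved or draining.
        self.first_wait.pop(adapter, None)
        self.draining.discard(adapter)

    def step(self, active_adapters, now):
        """Return the adapters that must drain (finish in-flight work, admit nothing new)."""
        starved = {a for a, t0 in self.first_wait.items()
                   if a not in active_adapters
                   and now - t0 > self.starvation_threshold_s}
        if not starved:
            self.draining.clear()  # starvation resolved; stop draining
            return set()
        # Drain just enough hot adapters to free slots for the starved ones.
        need = min(len(starved), self.max_loras_per_batch)
        for hot in sorted(active_adapters):
            if len(self.draining) >= need:
                break
            self.draining.add(hot)
        return set(self.draining)
```

A scheduler would call `step()` each iteration and skip admitting new requests for any adapter in the returned set, so its slot frees up once in-flight requests complete.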

Accuracy Tests

  • Add unit tests in test_lora_drainer.py

Benchmarking and Profiling

| Metric | main | PR | % Change |
|---|---|---|---|
| Mean E2E Latency (ms) | 12060.28 | 8502.00 | -29.5% |
| Median E2E Latency (ms) | 7051.01 | 8069.63 | |
| P90 E2E Latency (ms) | 32927.73 | 15974.35 | -51.5% |
| P99 E2E Latency (ms) | 45870.02 | 19969.72 | -56.5% |
| Mean TTFT (ms) | 7910.15 | 4279.28 | -45.9% |
| Median TTFT (ms) | 83.30 | 3728.36 | |
| P99 TTFT (ms) | 39550.49 | 12081.23 | -69.5% |

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@github-actions github-actions Bot added the lora label Jan 29, 2026
@glenliu21
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a LoRADrainer to address high P99 TTFT for LoRA requests by preventing adapter starvation. The implementation is well-structured, introducing a new LoRADrainer class with a draining mechanism that seems sound. The integration into the Scheduler is clean, and the addition of unit tests for the new functionality is a great practice. I have one minor suggestion to improve code robustness in the scheduler.

Comment thread on python/sglang/srt/managers/scheduler.py (outdated)
@Fridge003
Collaborator

From the benchmark, it seems that this draining strategy will harm median latency/TTFT.
Can we control this feature with a server argument, so we can turn it off when a better median metric is needed?

@github-actions github-actions Bot added the documentation Improvements or additions to documentation label Feb 1, 2026
@glenliu21
Contributor Author

From the benchmark, it seems that this draining strategy will harm median latency/TTFT. Can we control this feature with a server argument, so we can turn it off when a better median metric is needed?

Added; it is turned off by default.

@yushengsu-thu yushengsu-thu self-assigned this Apr 16, 2026
@glenliu21
Contributor Author

/tag-run-ci-label

@glenliu21
Contributor Author

/rerun-failed-ci

@glenliu21
Contributor Author

/rerun-failed-ci

@glenliu21
Contributor Author

glenliu21 commented Apr 20, 2026

/rerun-failed-ci again

@glenliu21 glenliu21 requested a review from wisclmy0611 as a code owner April 21, 2026 12:10
@glenliu21
Contributor Author

glenliu21 commented Apr 29, 2026

/rerun-failed-ci again

@yushengsu-thu yushengsu-thu enabled auto-merge (squash) April 29, 2026 07:04
@glenliu21
Contributor Author

glenliu21 commented Apr 30, 2026

/rerun-failed-ci again

auto-merge was automatically disabled May 1, 2026 23:39

Head branch was pushed to by a user without write access

@glenliu21 glenliu21 requested a review from JustinTong0323 as a code owner May 1, 2026 23:39
@yushengsu-thu yushengsu-thu enabled auto-merge (squash) May 2, 2026 00:39
@glenliu21
Contributor Author

glenliu21 commented May 2, 2026

/rerun-failed-ci again

1 similar comment
@glenliu21
Contributor Author

glenliu21 commented May 2, 2026

/rerun-failed-ci again

@yushengsu-thu
Collaborator

@Fridge003 I think it's good to merge now

@Fridge003 Fridge003 disabled auto-merge May 2, 2026 23:13
@Fridge003 Fridge003 merged commit 76b9c8d into sgl-project:main May 2, 2026
628 of 692 checks passed
@glenliu21 glenliu21 deleted the lora_high_ttft branch May 2, 2026 23:22

Labels

documentation Improvements or additions to documentation lora run-ci


3 participants