
[PD] Support P/D role conversion of running sglang server#9325

Open
LLLL114 wants to merge 62 commits into sgl-project:main from LLLL114:main-support-pdconversion

Conversation

Contributor

@LLLL114 LLLL114 commented Aug 19, 2025

Motivation

parent issue: #8210

Currently, users can only adjust the P/D node ratio by killing and restarting the server.
This PR provides an HTTP interface that reuses the loaded model weights and converts the P/D role without killing the server.

Modifications

  1. Add an HTTP interface in http_server.py and mini_lb.py to accept conversion requests on a running server.
  2. Use the same DecodeReqToTokenPool on both prefill and decode nodes, so the CUDA graph runner and attn_backend can be reused when converting between P and D.
  3. Add several threading events to break the event loops in the scheduler, MooncakeKVManager, and hicache.
  4. Release the P/D resources when the scheduler's event loop is broken, and complete the conversion process.
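The loop-breaking mechanism in steps 3 and 4 can be sketched with a plain `threading.Event`. This is an illustrative sketch only; the names `convert_event`, `scheduler_loop`, and `release_pd_resources` are hypothetical and not the identifiers used in the PR.

```python
import threading

# Hypothetical: a shared event the HTTP handler sets when a
# /convert_pd_role request arrives.
convert_event = threading.Event()

def release_pd_resources():
    # Placeholder for role-specific teardown, e.g. draining in-flight
    # KV transfers and resetting attention-backend state.
    pass

def scheduler_loop(step):
    """Run scheduler iterations until a P/D conversion is requested."""
    while not convert_event.is_set():
        step()                    # process one scheduling iteration
    release_pd_resources()        # free role-specific resources
    convert_event.clear()         # ready for the next conversion
```

With this shape, the HTTP handler only needs to call `convert_event.set()`; the scheduler notices it at the next iteration boundary, exits its loop cleanly, and can be restarted under the new role.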

Accuracy Tests

Launch 2P2D of Qwen3-1.7B with --enable-pd-convert.
Default:

python sglang/benchmark/gsm8k/bench_sglang.py --port 8188 --parallel 30 --num-questions 300
100%|██| 300/300 [00:36<00:00,  8.30it/s]
Accuracy: 0.953
Invalid: 0.000
Latency: 36.419 s
Output throughput: 1024.578 token/s

Use the requests below to convert the P/D roles:

curl -X POST http://0.0.0.0:8188/convert_pd_role \
  -H "Content-Type: application/json" \
  -d '{"server_url": "http://0.0.0.0:30000","disable_cuda_graph": false}'
curl -X POST http://0.0.0.0:8188/convert_pd_role \
  -H "Content-Type: application/json" \
  -d '{"server_url": "http://0.0.0.0:31000","disable_cuda_graph": true,"disable_radix_cache": false}'

python sglang/benchmark/gsm8k/bench_sglang.py --port 8188 --parallel 30 --num-questions 300
100%|██| 300/300 [00:35<00:00,  8.55it/s]
Accuracy: 0.950
Invalid: 0.000
Latency: 35.368 s
Output throughput: 1040.173 token/s

Benchmarking and Profiling

  1. Compare conversion time across models of different sizes:
[Screenshot 2025-08-19 10:23:57: conversion time comparison]
  2. Use genai-bench to compare throughput:
genai-bench benchmark --api-backend sglang \
            --api-base "http://localhost:8188" \
            --api-key "sglang" \
            --api-model-name "/root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B" \
            --model-tokenizer "/root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B" \
            --task text-to-text \
            --max-time-per-run 10 \
            --traffic-scenario "D(16,128)" \
            --max-requests-per-run 1000 \
            --server-engine "SGLang" \
            --num-concurrency 10 
[Screenshot 2025-08-19 10:26:55: genai-bench results]

Checklist

LLLL114 added 29 commits July 7, 2025 17:07
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
@LLLL114 LLLL114 requested a review from merrymercy as a code owner August 19, 2025 02:28
LLLL114 and others added 5 commits August 22, 2025 17:01
@hzh0425 hzh0425 self-assigned this Aug 28, 2025
@zhyncs
Collaborator

zhyncs commented Sep 1, 2025

@ShangmingCai

@ShangmingCai
Collaborator

@zhyncs Sure, I will start the review process in the coming days; I have just been busy with too many things. @hzh0425 will help me review this PR as well. When I think it is ready, I will ask Byron for another round of review.

LLLL114 and others added 15 commits September 1, 2025 17:16
@kartikx
Contributor

kartikx commented Nov 19, 2025

I am interested in this feature. Is there an expected time by which it will be merged? Thanks.

@ShangmingCai
Collaborator

> I am interested in this feature. Is there an expected time by when it will be merged in? Thanks

@kartikx This is a good feature, but it currently cannot support P/D with different TP/DP, so it only serves limited scenarios such as small models with the same TP. We might consider integrating a checkpoint engine in the future to test whether we can redistribute the weights quickly, but that might be no different from shutting down and restarting another server with a checkpoint engine.


7 participants