
[PD] Support P/D role conversion of running sglang server#9325

Open
LLLL114 wants to merge 62 commits into sgl-project:main from LLLL114:main-support-pdconversion

Conversation

Contributor

@LLLL114 LLLL114 commented Aug 19, 2025

Motivation

parent issue: #8210

Currently, users can only adjust the P/D node ratio by killing and restarting the server.
This PR provides an HTTP interface that reuses the loaded model weights and converts the P/D role without killing the server.

Modifications

  1. Add an HTTP interface in http_server.py and mini_lb.py to accept conversion requests on a running server.
  2. Use the same DecodeReqToTokenPool on both prefill and decode nodes, so the CUDA graph runner and attn_backend can be reused when converting between P and D.
  3. Add several threading events to break the event loops in the scheduler, MooncakeKVManager, and hicache.
  4. Release the P/D resources when the scheduler's event loop is broken, and complete the conversion process.
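The loop-breaking mechanism in steps 3 and 4 can be sketched with a plain `threading.Event`. This is an illustrative sketch only; the names `convert_event`, `scheduler_loop`, and `release_pd_resources` are hypothetical and not the identifiers used in the PR.

```python
import threading

# Hypothetical: a shared event the HTTP handler sets when a
# /convert_pd_role request arrives.
convert_event = threading.Event()

def release_pd_resources():
    # Placeholder for role-specific teardown, e.g. draining in-flight
    # KV transfers and resetting attention-backend state.
    pass

def scheduler_loop(step):
    """Run scheduler iterations until a P/D conversion is requested."""
    while not convert_event.is_set():
        step()                    # process one scheduling iteration
    release_pd_resources()        # free role-specific resources
    convert_event.clear()         # ready for the next conversion
```

With this shape, the HTTP handler only needs to call `convert_event.set()`; the scheduler notices it at the next iteration boundary, exits its loop cleanly, and can be restarted under the new role.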

Accuracy Tests

Launch 2P2D of Qwen3-1.7B with --enable-pd-convert.
Default:

python sglang/benchmark/gsm8k/bench_sglang.py --port 8188 --parallel 30 --num-questions 300
100%|██| 300/300 [00:36<00:00,  8.30it/s]
Accuracy: 0.953
Invalid: 0.000
Latency: 36.419 s
Output throughput: 1024.578 token/s

Use the requests below to convert the P/D roles:

curl -X POST http://0.0.0.0:8188/convert_pd_role \
  -H "Content-Type: application/json" \
  -d '{"server_url": "http://0.0.0.0:30000","disable_cuda_graph": false}'
curl -X POST http://0.0.0.0:8188/convert_pd_role \
  -H "Content-Type: application/json" \
  -d '{"server_url": "http://0.0.0.0:31000","disable_cuda_graph": true,"disable_radix_cache": false}'

python sglang/benchmark/gsm8k/bench_sglang.py --port 8188 --parallel 30 --num-questions 300
100%|██| 300/300 [00:35<00:00,  8.55it/s]
Accuracy: 0.950
Invalid: 0.000
Latency: 35.368 s
Output throughput: 1040.173 token/s

Benchmarking and Profiling

  1. Compare conversion time across models of different sizes:
[Screenshot 2025-08-19 10:23:57: conversion time comparison]
  2. Use genai-bench to compare throughput:
genai-bench benchmark --api-backend sglang \
            --api-base "http://localhost:8188" \
            --api-key "sglang" \
            --api-model-name "/root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B" \
            --model-tokenizer "/root/.cache/modelscope/hub/models/Qwen/Qwen3-0.6B" \
            --task text-to-text \
            --max-time-per-run 10 \
            --traffic-scenario "D(16,128)" \
            --max-requests-per-run 1000 \
            --server-engine "SGLang" \
            --num-concurrency 10 
[Screenshot 2025-08-19 10:26:55: genai-bench results]

Checklist

LLLL114 added 29 commits July 7, 2025 17:07
Signed-off-by: huanglong <huanglong@linux.alibaba.com>
@LLLL114 LLLL114 requested a review from merrymercy as a code owner August 19, 2025 02:28
LLLL114 and others added 5 commits August 22, 2025 17:01
@hzh0425 hzh0425 self-assigned this Aug 28, 2025
@zhyncs
Collaborator

zhyncs commented Sep 1, 2025

@ShangmingCai

@ShangmingCai
Collaborator

@zhyncs Sure, I will start the review process in the coming days; I have just been busy with too many things. @hzh0425 will help me review this PR as well. When I think it is ready, I will ask Byron for another round of review.

LLLL114 and others added 15 commits September 1, 2025 17:16
@kartikx
Contributor

kartikx commented Nov 19, 2025

I am interested in this feature. Is there an expected time by which it will be merged? Thanks.

@ShangmingCai
Collaborator

> I am interested in this feature. Is there an expected time by when it will be merged in? Thanks

@kartikx This is a good feature, but it currently cannot support P/D with different TP/DP, so it only serves limited scenarios such as small models with the same TP. We might consider integrating a checkpoint engine in the future to test whether we can redistribute the weights quickly, but that might be no different from shutting down and restarting another server with a checkpoint engine.


7 participants