
[PD] Support decode pp for PD disaggregation#14265

Merged
ShangmingCai merged 1 commit into main from support_decode_pp on Dec 3, 2025

Conversation

@ShangmingCai
Collaborator

Motivation

  • Support decode PP for PD disaggregation; the decode PP size must either equal the prefill PP size or be 1
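As an illustration of the constraint above, a startup check along these lines would reject unsupported combinations. This is a hedged sketch: the function and argument names are hypothetical, not the actual sglang server arguments.

```python
def check_pp_sizes(prefill_pp_size: int, decode_pp_size: int) -> None:
    """Enforce the constraint from this PR: the decode PP size must be 1
    or exactly equal to the prefill PP size."""
    if decode_pp_size != 1 and decode_pp_size != prefill_pp_size:
        raise ValueError(
            f"Unsupported config: decode pp size ({decode_pp_size}) must be "
            f"1 or equal to prefill pp size ({prefill_pp_size})"
        )

# e.g. check_pp_sizes(2, 2) and check_pp_sizes(2, 1) pass;
# check_pp_sizes(2, 4) raises ValueError.
```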

Signed-off-by: Shangming Cai <csmthu@gmail.com>
@ShangmingCai
Collaborator Author

/tag-and-rerun-ci

@github-actions bot added the run-ci label on Dec 2, 2025
@ShangmingCai
Collaborator Author

/tag-and-rerun-ci

@ShangmingCai
Collaborator Author

Disaggregation tests have passed:
[screenshot: disaggregation CI tests passing]

The remaining failed tests are unrelated to this change.

@ShangmingCai merged commit 93452a8 into main on Dec 3, 2025
151 of 164 checks passed
@ShangmingCai deleted the support_decode_pp branch on December 3, 2025 06:35
tom-jerr pushed a commit to tom-jerr/sglang that referenced this pull request Dec 4, 2025
Signed-off-by: Shangming Cai <csmthu@gmail.com>
yingluosanqian pushed a commit to yingluosanqian/sglang that referenced this pull request Dec 4, 2025
Signed-off-by: Shangming Cai <csmthu@gmail.com>
@nihao1997

nihao1997 commented Dec 5, 2025

LGTM, but I tried it with this command:
prefill

python -m sglang.launch_server \
  --model-path $model_path \
  --model-loader-extra-config '{"enable_multithread_load": true, "num_threads": 32}' \
  --served-model-name $model_name \
  --enable-metrics \
  --trust-remote-code \
  --host $local_ip \
  --port 30000 \
  --dist-init-addr ${master}:5757 \
  --watchdog-timeout 1800 \
  --disaggregation-mode prefill \
  --disaggregation-ib-device $ib_device \
  --load-balance-method round_robin \
  --nnodes $node_num \
  --node-rank $node_rank \
  --tp-size 8 \
  --pp-size 2 \
  --context-length $MML \
  --chunked-prefill-size 16384 \
  --max-prefill-tokens 16384 \
  --page-size 16 \
  --mem-fraction-static 0.80 \
  --max-running-requests 128 \
  --disable-custom-all-reduce \
  --tokenizer-worker-num 4 \
  --disable-cuda-graph \
  --disable-radix-cache \
  --tool-call-parser kimi_k2

decode

python -m sglang.launch_server \
  --model-path $model_path \
  --model-loader-extra-config '{"enable_multithread_load": true, "num_threads": 32}' \
  --served-model-name $model_name \
  --enable-metrics \
  --enable-metrics-for-all-schedulers \
  --collect-tokens-histogram \
  --trust-remote-code \
  --host $local_ip \
  --port 30000 \
  --dist-init-addr ${master}:5757 \
  --watchdog-timeout 1800 \
  --disaggregation-mode decode \
  --disaggregation-ib-device $ib_device \
  --prefill-round-robin-balance \
  --decode-log-interval 10 \
  --nnodes $node_num \
  --node-rank $node_rank \
  --pp-size 2 \
  --tp-size 8 \
  --load-balance-method shortest_queue \
  --context-length $MML \
  --page-size 16 \
  --mem-fraction-static 0.92 \
  --max-running-requests $((node_num * 8 * max_bs)) \
  --tokenizer-worker-num 4  \
  --cuda-graph-max-bs $max_bs \
  --moe-dense-tp-size 1 \
  --enable-dp-lm-head \
  --tool-call-parser kimi_k2 

and got the following errors:
[2025-12-05 03:44:56 PP0 TP0] Scheduler hit an exception: Traceback (most recent call last):
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2693, in run_scheduler_process
scheduler.event_loop_normal_disagg_decode()
File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/decode.py", line 803, in event_loop_normal_disagg_decode
self.process_batch_result(batch, result)
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2100, in process_batch_result
self.process_batch_result_decode(batch, result)
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler_output_processor_mixin.py", line 329, in process_batch_result_decode
next_token_ids = next_token_ids.tolist()
^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'tolist'

[2025-12-05 03:44:56 PP0 TP3] Scheduler hit an exception: Traceback (most recent call last):
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2693, in run_scheduler_process
scheduler.event_loop_normal_disagg_decode()
File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/decode.py", line 803, in event_loop_normal_disagg_decode
self.process_batch_result(batch, result)
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2100, in process_batch_result
self.process_batch_result_decode(batch, result)
File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler_output_processor_mixin.py", line 329, in process_batch_result_decode
next_token_ids = next_token_ids.tolist()
^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'tolist'

The same errors occur with PP2 DP8.
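The AttributeError above is consistent with a pipeline-parallel scheduler in which only the last PP stage actually samples tokens: on earlier PP ranks the batch result carries next_token_ids as None, so calling .tolist() on it unconditionally crashes. A minimal sketch of the guard such a loop would need (hypothetical names, not the actual sglang code):

```python
from typing import List, Optional

def postprocess_decode_tokens(
    next_token_ids: Optional[List[int]], is_last_pp_rank: bool
) -> List[int]:
    """Only the last PP rank holds sampled token ids; earlier ranks must
    skip token post-processing instead of dereferencing None."""
    if not is_last_pp_rank or next_token_ids is None:
        return []  # nothing to emit on non-last pipeline stages
    return list(next_token_ids)
```

The point of the sketch is only that a PP-aware decode event loop has to branch on the pipeline rank before touching the sampled ids, which matches the maintainer's reply below that the disaggregated-decode scheduler loop had not yet landed on main.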

tonyluj pushed a commit to openanolis/sglang that referenced this pull request Dec 5, 2025
Signed-off-by: Shangming Cai <csmthu@gmail.com>
tonyluj pushed a commit to openanolis/sglang that referenced this pull request Dec 5, 2025
Signed-off-by: Shangming Cai <csmthu@gmail.com>
@maoqiuli

maoqiuli commented Dec 5, 2025

(quoted nihao1997's prefill/decode commands and traceback above)

I also encountered the same problem with a PP2 TP4 decode instance.

@ShangmingCai
Collaborator Author

@maoqiuli @nihao1997 Sorry for the confusion: the scheduler loop for disaggregated decode hasn't been merged into main yet. You can check this branch for a preview: openanolis#13

@maoqiuli

maoqiuli commented Dec 5, 2025

@ShangmingCai Thank you very much! I will try it out.

yuchengz816-bot pushed a commit to yuchengz816-bot/sglang that referenced this pull request Dec 8, 2025
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Kevin-XiongC pushed a commit to novitalabs/sglang that referenced this pull request Dec 9, 2025
Signed-off-by: Shangming Cai <csmthu@gmail.com>
