[Feature] Support all DP load balance methods for PD-Disaggregation mode

### Checklist

- [x] 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- [x] 2. Please use English, otherwise it will be closed.

### Motivation
In the current PD-disaggregation implementation, the decode server uses bootstrap_room to determine which dp rank in the prefill server to communicate with, as #10174 describes. As a result, the prefill server is limited to the round-robin load balance method. The shortest_queue and minimum_tokens methods are not supported on the decode server either.

We first want to let the decode server know the correct dp rank in the prefill server. Possible solutions are also mentioned in #10174. Then we will try to support the shortest_queue and minimum_tokens load balance methods. Potential PRs are as follows:

- [ ] Use bootstrap_server to register each request's prefill dp rank. The prefill server uses the `PUT` method to register dp rank info, and the decode server uses the `GET` method to fetch it. (https://github.com/sgl-project/sglang/pull/14726)
- [ ] Support shortest_queue method for decode (https://github.com/sgl-project/sglang/pull/11469) and prefill server.
- [ ] Support minimum_tokens method for decode and prefill server.
- [ ] ~~Support multi-tokenizer for all methods.~~
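The register/fetch idea in the first item can be sketched as follows. This is only an illustration: an in-memory dict stands in for the bootstrap server's `PUT`/`GET` handlers, and the class and method names (`DpRankRegistry`, `put`, `get`) are hypothetical, not sglang's actual API.

```python
from typing import Dict, Optional


class DpRankRegistry:
    """Hypothetical stand-in for the bootstrap server's dp-rank table."""

    def __init__(self) -> None:
        # Maps a request's bootstrap_room to the prefill dp rank handling it.
        self._rank_by_room: Dict[int, int] = {}

    def put(self, bootstrap_room: int, dp_rank: int) -> None:
        # Prefill side: record which dp rank owns this request.
        self._rank_by_room[bootstrap_room] = dp_rank

    def get(self, bootstrap_room: int) -> Optional[int]:
        # Decode side: None means the prefill server has not registered yet.
        return self._rank_by_room.get(bootstrap_room)


registry = DpRankRegistry()
registry.put(bootstrap_room=1234, dp_rank=3)  # prefill registers
known_rank = registry.get(1234)               # decode fetches the rank
missing_rank = registry.get(999)              # not registered yet
```

Because the decode side looks the rank up instead of deriving it from bootstrap_room, the prefill server would be free to pick the dp rank with any load balance method.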

### Design: get prefill dp rank via bootstrap server 
I have implemented a POC version and tested its impact on performance.

<img width="2214" height="1208" alt="Image" src="https://github.com/user-attachments/assets/b2b35944-94d3-42ae-95e7-9510e5f7c432" />

In the POC version I added a check_bootstraped step to the for loop of pop_preallocated. Decode requests whose bootstrap info cannot yet be fetched via `GET` from the bootstrap server are held back: they do not allocate cache or handshake with the prefill server in `decode_req.kv_receiver.init()` until the info is found. As a result, TTFT increases for those delayed decode requests.
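A minimal sketch of that deferral, assuming a lookup function that returns `None` while the prefill dp rank is still unregistered. `DecodeReq`, `split_ready`, and the dict-backed lookup are hypothetical names for illustration and do not mirror the real pop_preallocated code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class DecodeReq:
    bootstrap_room: int
    prefill_dp_rank: Optional[int] = None


def split_ready(
    queue: List[DecodeReq],
    lookup: Callable[[int], Optional[int]],
) -> Tuple[List[DecodeReq], List[DecodeReq]]:
    """Split queued requests into those that can proceed and those that must wait."""
    ready: List[DecodeReq] = []
    waiting: List[DecodeReq] = []
    for req in queue:
        rank = lookup(req.bootstrap_room)
        if rank is None:
            # Rank unknown: skip cache allocation and handshake this round.
            waiting.append(req)
        else:
            req.prefill_dp_rank = rank
            ready.append(req)
    return ready, waiting


# Only room 1 has been registered by the prefill side so far.
registered = {1: 0}
ready, waiting = split_ready([DecodeReq(1), DecodeReq(2)], registered.get)
```

Requests left in `waiting` are retried on a later scheduling pass, which is where the extra TTFT for delayed requests would come from.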

I tested with Qwen3-235B at ISL/OSL 1500:50 and saw TTFT increase by roughly 150 to 200 ms (in the runs below, mean TTFT rises from 1595.98 ms to 1747.57 ms and median TTFT from 1479.86 ms to 1676.68 ms). I also checked that the added delay is the reason for the increase in TTFT, and no throughput impact is seen.

With my POC:
```
============ Serving Benchmark Result ============
Backend:                                 sglang
Traffic request rate:                    5.0
Max request concurrency:                 400
Successful requests:                     400
Benchmark duration (s):                  84.29
Total input tokens:                      600000
Total input text tokens:                 600000
Total input vision tokens:               0
Total generated tokens:                  20000
Total generated tokens (retokenized):    19988
Request throughput (req/s):              4.75
Input token throughput (tok/s):          7118.30
Output token throughput (tok/s):         237.28
Total token throughput (tok/s):          7355.58
Concurrency:                             12.81
----------------End-to-End Latency----------------
Mean E2E Latency (ms):                   2698.45
Median E2E Latency (ms):                 2655.88
---------------Time to First Token----------------
Mean TTFT (ms):                          1747.57
Median TTFT (ms):                        1676.68
P99 TTFT (ms):                           3026.98
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          19.41
Median TPOT (ms):                        19.69
P99 TPOT (ms):                           27.31
---------------Inter-Token Latency----------------
Mean ITL (ms):                           19.40
Median ITL (ms):                         24.25
P95 ITL (ms):                            29.78
P99 ITL (ms):                            48.40
Max ITL (ms):                            146.91
==================================================
```

Without my POC:
```
============ Serving Benchmark Result ============
Backend:                                 sglang
Traffic request rate:                    5.0
Max request concurrency:                 400
Successful requests:                     400
Benchmark duration (s):                  84.04
Total input tokens:                      600000
Total input text tokens:                 600000
Total input vision tokens:               0
Total generated tokens:                  20000
Total generated tokens (retokenized):    19988
Request throughput (req/s):              4.76
Input token throughput (tok/s):          7139.54
Output token throughput (tok/s):         237.98
Total token throughput (tok/s):          7377.53
Concurrency:                             12.39
----------------End-to-End Latency----------------
Mean E2E Latency (ms):                   2603.27
Median E2E Latency (ms):                 2510.23
---------------Time to First Token----------------
Mean TTFT (ms):                          1595.98
Median TTFT (ms):                        1479.86
P99 TTFT (ms):                           3063.69
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          20.56
Median TPOT (ms):                        21.28
P99 TPOT (ms):                           29.45
---------------Inter-Token Latency----------------
Mean ITL (ms):                           20.56
Median ITL (ms):                         25.01
P95 ITL (ms):                            31.10
P99 ITL (ms):                            49.55
Max ITL (ms):                            97.39
==================================================
```

Client command:
```
python -m sglang.bench_serving --backend sglang --model /models/Qwen3-235B-A22B-Instruct-2507-FP8 --pd-separated --host localhost --port 8000 --dataset-name random --dataset-path /mnt/ShareGPT_V3_unfiltered_cleaned_split.json --random-input-len 1500 --random-output-len 50 --random-range-ratio 1 --request-rate 5 --num-prompts 400 --max-concurrency 400
```
