[Bug] Pipeline-Parallelism bugs with chunked prefill #13084

@hnyls2002

Description

Describe the bug

When chunked prefill is enabled together with pipeline parallelism, the prefill event loop appears to handle the chunked-prefill logic incorrectly. With --chunked-prefill-size 16, a single prefill request with input length 19 should be split into two chunks (16 + 3). In the logs below, however, the 16-token chunk is scheduled twice on each PP rank before the final 3-token chunk runs, and the scheduler then fails the KV-cache memory-leak check.
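For reference, a minimal sketch of the expected chunk schedule (illustrative only; the chunk sizes follow from the --chunked-prefill-size argument, this is not SGLang's actual scheduler code):

```python
def expected_chunks(input_len: int, chunk_size: int) -> list[int]:
    """Split a prefill of input_len tokens into chunks of at most chunk_size."""
    chunks = []
    remaining = input_len
    while remaining > 0:
        step = min(chunk_size, remaining)
        chunks.append(step)
        remaining -= step
    return chunks

# For the request in the logs below:
print(expected_chunks(19, 16))  # [16, 3] -- but the logs show 16, 16, 3 per rank
```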

[2025-11-11 15:01:30 PP0] Prefill batch, #new-seq: 1, #new-token: 16, #cached-token: 0, token usage: 0.00, #running-req: 0, #queue-req: 0, #prealloc-req: 0, #inflight-req: 0, input throughput (token/s): 0.05,
[Get New Batch Prefill]
[2025-11-11 15:01:30 PP1] Prefill batch, #new-seq: 1, #new-token: 16, #cached-token: 0, token usage: 0.00, #running-req: 0, #queue-req: 0, #prealloc-req: 0, #inflight-req: 0, input throughput (token/s): 0.05,
[Get New Batch Prefill]
[2025-11-11 15:01:30 PP0] Prefill batch, #new-seq: 1, #new-token: 16, #cached-token: 0, token usage: 0.00, #running-req: 0, #queue-req: 0, #prealloc-req: 0, #inflight-req: 0, input throughput (token/s): 53.46,
[Get New Batch Prefill]
[2025-11-11 15:01:30 PP1] Prefill batch, #new-seq: 1, #new-token: 16, #cached-token: 0, token usage: 0.00, #running-req: 0, #queue-req: 0, #prealloc-req: 0, #inflight-req: 0, input throughput (token/s): 917.19,
[Get New Batch Prefill]
[2025-11-11 15:01:30 PP0] Prefill batch, #new-seq: 1, #new-token: 3, #cached-token: 0, token usage: 0.00, #running-req: 0, #queue-req: 0, #prealloc-req: 0, #inflight-req: 0, input throughput (token/s): 425.50,
[Get New Batch Prefill]
[2025-11-11 15:01:30 PP1] Prefill batch, #new-seq: 1, #new-token: 3, #cached-token: 0, token usage: 0.00, #running-req: 0, #queue-req: 0, #prealloc-req: 0, #inflight-req: 0, input throughput (token/s): 167.84,
[Release] req.kv_committed_len=19, req.kv_allocated_len=19
[Cache finished]: committed_kv_len=19
[2025-11-11 15:01:30] INFO:     127.0.0.1:52690 - "POST /generate HTTP/1.1" 200 OK
[2025-11-11 15:01:30 PP0] Scheduler hit an exception: Traceback (most recent call last):
  File "/host_home/common_sync/sglang/python/sglang/srt/managers/scheduler.py", line 2711, in run_scheduler_process
    scheduler.event_loop_pp_disagg_prefill()
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/host_home/common_sync/sglang/python/sglang/srt/managers/scheduler_pp_mixin.py", line 347, in event_loop_pp_disagg_prefill
    self.check_memory()
  File "/host_home/common_sync/sglang/python/sglang/srt/managers/scheduler_runtime_checker_mixin.py", line 153, in check_memory
    raise ValueError(msg)
ValueError: token_to_kv_pool_allocator memory leak detected! self.max_total_num_tokens=1965607, available_size=1965568, evictable_size=23, protected_size=0
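Assuming check_memory verifies that available_size + evictable_size + protected_size adds back up to max_total_num_tokens (the field names below are taken from the error message, not from the source), the deficit is exactly one chunk's worth of KV slots, which is consistent with the duplicated 16-token chunk above:

```python
# Numbers copied from the ValueError above; the invariant itself is an
# assumption reconstructed from the error message.
max_total_num_tokens = 1965607
available_size = 1965568
evictable_size = 23
protected_size = 0

leaked = max_total_num_tokens - (available_size + evictable_size + protected_size)
print(leaked)  # 16 == --chunked-prefill-size, i.e. one leaked chunk
```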

Reproduction

Prefill

python3 -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --trust-remote-code --disaggregation-mode prefill --pp-size 2 --disable-overlap-schedule --chunked-prefill-size 16 --disaggregation-transfer-backend nixl --host 127.0.0.1 --port 21100

Decode

python3 -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --trust-remote-code --disaggregation-mode decode --tp 2 --base-gpu-id 4 --disaggregation-transfer-backend nixl --host 127.0.0.1 --port 21200

Router

python3 -m sglang_router.launch_router --pd-disaggregation --mini-lb --prefill http://127.0.0.1:21100 --decode http://127.0.0.1:21200 --host 127.0.0.1 --port 21000

Client

python -m sglang.test.send_one --port 21000

The bug is not deterministic; run the client command several times to reproduce it.
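Since the failure is intermittent, a small driver loop makes reproduction easier (an illustrative sketch that just re-runs the client above; the crash surfaces in the prefill server's log, not in the client output):

```python
import subprocess
import sys

# Re-run the one-shot client repeatedly; watch the prefill server log
# for the "memory leak detected" ValueError.
for attempt in range(20):
    subprocess.run(
        [sys.executable, "-m", "sglang.test.send_one", "--port", "21000"],
        check=False,
    )
```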
