[Bug] Mooncake Disaggregation + PP: Prefill Bootstrap Timeout causes AssertionError crash in pop_bootstrapped due to Decode KV Cache Saturation #20485

### Checklist

- [x] I searched related issues but found no solution.
- [x] The bug persists in the latest version.
- [x] Please use English.

---

### Describe the bug

In a **PD Disaggregation + Pipeline Parallel (PP)** cluster using the **Mooncake** KV transfer backend (DeepSeek-R1 model), when Decode nodes reach high KV cache and pre-allocated memory occupancy, the system experiences:

1. Prefill nodes stuck in `KVPoll.Bootstrapping` for 600s → bootstrap timeout → `KVTransferError`
2. A **crash** triggered by an `AssertionError` in `pop_bootstrapped` of `PrefillBootstrapQueue`

The root cause is a **race condition between PP ranks** when bootstrap times out:
- PP rank N (e.g., DP3/TP3) reaches its 600s timeout → marks the request as `KVPoll.Failed`
- PP rank N+1 (e.g., DP5/TP5) receives the rid in `consensus_bootstrapped_rids` from rank N, but its own local poll still returns `KVPoll.Bootstrapping` (due to clock skew / slightly different init_time)
- When `rids_to_check` is set, `pop_bootstrapped` does not reach the `if poll == KVPoll.Bootstrapping: continue` guard for that rid, so the `KVPoll.Bootstrapping` state goes straight to the assertion and fails it

---

### Error Logs

```
[DP3 TP3 EP3 PP1] Some requests timed out when bootstrapping...
  If a greater mean TTFT is acceptable, you can 'export SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT=600'
[DP3 TP3 EP3 PP1] Prefill bootstrap failed for request rank=3 req.rid='28a38646...' 
  with exception KVTransferError(bootstrap_room=...): Request ... timed out after 600.0s in KVPoll.Bootstrapping

[DP5 TP5 EP5 PP1] Scheduler hit an exception:
  File "sglang/srt/managers/scheduler.py", line 2668, in run_scheduler_process
    scheduler.event_loop_pp_disagg_prefill()
  File "sglang/srt/disaggregation/prefill.py", line 230, in pop_bootstrapped
    assert poll == KVPoll.WaitingForInput or poll == KVPoll.Failed
AssertionError
```

**Decode node metrics at time of failure (KV saturation evidence):**
```
[DP0] token usage: 0.84, pre-allocated usage: 0.42
[DP3] token usage: 0.81, pre-allocated usage: 0.52
[DP6] token usage: 0.87, pre-allocated usage: 0.72  ← highest saturation
```

---

### Root Cause Analysis

**Step 1**: The Decode node KV cache becomes saturated (token usage 0.81–0.87 and pre-allocated usage 0.42–0.72 in the metrics above). The Decode node cannot allocate new KV indices for incoming requests, which delays the call to `MooncakeKVReceiver.init()`.

**Step 2**: Without `init()` being called, `TransferInfo` is never sent to the Prefill Bootstrap Server. The Prefill `MooncakeKVSender.poll()` stays in `KVPoll.Bootstrapping` indefinitely.

**Step 3**: After 600s, `MooncakeKVSender.poll()` times out and returns `KVPoll.Failed`. PP rank 0 marks the request as failed and includes the rid in `consensus_bootstrapped_rids` sent to rank 1.

**Step 4**: PP rank 1 receives the rid in `rids_to_check`, but its own local `poll()` still returns `KVPoll.Bootstrapping` (init_time is slightly later). In `pop_bootstrapped`, because the rid is in `rids_to_check`, execution does **not** hit the `if poll == KVPoll.Bootstrapping: continue` guard, and reaches the assertion:

```python
# python/sglang/srt/disaggregation/prefill.py
assert poll == KVPoll.WaitingForInput or poll == KVPoll.Failed  # poll is actually KVPoll.Bootstrapping → CRASH
```
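The four steps can be reproduced in isolation with a toy model. This is a minimal sketch, not sglang code: `ToySender`, `toy_pop_bootstrapped`, the 50 ms offset, and the trimmed `KVPoll` enum are hypothetical stand-ins, and the control flow assumes (as described above) that the assertion runs before the `Bootstrapping` guard whenever `rids_to_check` is set.

```python
from enum import Enum, auto


class KVPoll(Enum):
    # Only the states mentioned in this report; the real enum has more.
    Bootstrapping = auto()
    WaitingForInput = auto()
    Failed = auto()


class ToySender:
    """Hypothetical stand-in for a sender whose poll() times out locally."""

    def __init__(self, init_time: float, timeout: float = 600.0):
        self.init_time = init_time
        self.timeout = timeout

    def poll(self, now: float) -> KVPoll:
        # No TransferInfo ever arrives in this scenario, so the only way
        # out of Bootstrapping is the local timeout.
        if now - self.init_time >= self.timeout:
            return KVPoll.Failed
        return KVPoll.Bootstrapping


def toy_pop_bootstrapped(poll: KVPoll, rid: str, rids_to_check=None) -> str:
    """Assumed control flow: the consensus path asserts before the guard."""
    if rids_to_check is not None:
        if rid not in rids_to_check:
            return "skipped"
        assert poll == KVPoll.WaitingForInput or poll == KVPoll.Failed
    if poll == KVPoll.Bootstrapping:
        return "waiting"
    if poll == KVPoll.Failed:
        return "failed"
    return "bootstrapped"


rid = "req-0"
rank0 = ToySender(init_time=0.0)    # sender created first
rank1 = ToySender(init_time=0.05)   # sender created 50 ms later
now = 600.0                         # rank 0 has just crossed its timeout

# Rank 0 sees Failed and forwards the rid to the next rank.
rank0_result = toy_pop_bootstrapped(rank0.poll(now), rid)
consensus_rids = {rid} if rank0_result == "failed" else set()

# Rank 1 is still 50 ms short of its own timeout.
try:
    toy_pop_bootstrapped(rank1.poll(now), rid, rids_to_check=consensus_rids)
    crashed = False
except AssertionError:
    crashed = True

print(rank0_result, rank1.poll(now).name, crashed)
```

In this model the crash window is exactly the gap between the two `init_time` values, so any request that times out on one rank can hit it on the next.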

---

### Expected Behavior

- PP ranks should tolerate small state-propagation delays between each other. A `KVPoll.Bootstrapping` state on a rank that's in `rids_to_check` should be treated as "still waiting", not a crash.
- The system should degrade gracefully (abort the request with a soft error) rather than crashing the entire Scheduler process.

---

### Proposed Fix

**Defensive fix in `pop_bootstrapped`** (avoids crash while preserving correctness):

```python
# python/sglang/srt/disaggregation/prefill.py, in pop_bootstrapped()

for i, (req, poll) in enumerate(zip(self.queue, polls)):
    # On the consensus path, only look at rids received from the previous PP rank.
    if rids_to_check is not None and req.rid not in rids_to_check:
        continue

    if poll == KVPoll.Bootstrapping:
        # This rank's local state lags behind the previous PP rank; skip this
        # round instead of asserting (which crashes the Scheduler).
        continue
    elif poll == KVPoll.Failed:
        ...
        continue

    # Bootstrapping and Failed were handled above, so only WaitingForInput remains.
    assert poll == KVPoll.WaitingForInput
```
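To check the intended behavior outside the scheduler, the same branching can be exercised on plain data. This is a sketch under stated assumptions: `tolerant_pop` is a hypothetical helper that works on rid strings rather than real request objects, and the trimmed `KVPoll` enum only carries the three states discussed here.

```python
from enum import Enum, auto


class KVPoll(Enum):
    Bootstrapping = auto()
    WaitingForInput = auto()
    Failed = auto()


def tolerant_pop(queue, polls, rids_to_check=None):
    """Hypothetical helper: split queued rids into ready / failed / deferred."""
    ready, failed, deferred = [], [], []
    for rid, poll in zip(queue, polls):
        if rids_to_check is not None and rid not in rids_to_check:
            continue
        if poll == KVPoll.Bootstrapping:
            # Local state lags behind the upstream rank: retry next round.
            deferred.append(rid)
        elif poll == KVPoll.Failed:
            failed.append(rid)
        else:
            assert poll == KVPoll.WaitingForInput, f"unexpected state {poll}"
            ready.append(rid)
    return ready, failed, deferred


# Round 1: "b" was reported upstream but is still Bootstrapping on this rank.
round1 = tolerant_pop(
    ["a", "b", "c"],
    [KVPoll.WaitingForInput, KVPoll.Bootstrapping, KVPoll.Bootstrapping],
    rids_to_check={"a", "b"},
)
# Round 2: the local timeout has fired for "b", so it resolves as failed.
round2 = tolerant_pop(
    ["b", "c"],
    [KVPoll.Failed, KVPoll.Bootstrapping],
    rids_to_check={"b"},
)
print(round1)
print(round2)
```

One open point the sketch does not settle: a deferred rid has to be presented to this rank again in a later round, otherwise the PP ranks could disagree about which requests were popped. Whether the existing consensus exchange already guarantees that would need to be confirmed against the PP event loop.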

**Longer-term fix**: Add backpressure on Decode pre-alloc. When `pre_allocated_usage` exceeds a threshold, stop accepting new bootstrap requests rather than silently stalling for up to 600s.
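One possible shape for that backpressure is a small admission gate with hysteresis, so the Decode side does not flap between accepting and rejecting near the threshold. Everything here is illustrative: `PreallocGate`, its watermark values, and the idea of calling `update()` once per scheduling round are assumptions, not existing sglang APIs or settings.

```python
from dataclasses import dataclass


@dataclass
class PreallocGate:
    """Hypothetical admission gate keyed on pre-allocated usage."""

    high_watermark: float = 0.6  # stop admitting at or above this (assumed value)
    low_watermark: float = 0.4   # resume admitting at or below this (assumed value)
    accepting: bool = True

    def update(self, pre_allocated_usage: float) -> bool:
        """Record the latest usage sample and return whether to admit new requests."""
        if self.accepting and pre_allocated_usage >= self.high_watermark:
            self.accepting = False
        elif not self.accepting and pre_allocated_usage <= self.low_watermark:
            self.accepting = True
        return self.accepting


gate = PreallocGate()
# The first three samples are the pre-allocated usage values from the logs
# above; the last two are made up to show the gate reopening.
samples = [0.42, 0.52, 0.72, 0.55, 0.38]
decisions = [gate.update(u) for u in samples]
print(decisions)
```

With a gate like this, a rejected bootstrap request could fail fast or be routed elsewhere instead of waiting for the full bootstrap timeout.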

---

### Environment

- sglang commit: `e29305c120a9830538e52dac9faf3e584b675be8`
- Transfer backend: Mooncake
- Model: DeepSeek-R1-0528
- Parallelism: DP=8, TP=8, EP=8, PP=2
- `SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT=600`

---

### Related Code

- [`python/sglang/srt/disaggregation/prefill.py#L230`](https://github.com/sgl-project/sglang/blob/e29305c/python/sglang/srt/disaggregation/prefill.py#L230) — assertion crash site
- [`python/sglang/srt/disaggregation/mooncake/conn.py#L1178`](https://github.com/sgl-project/sglang/blob/e29305c/python/sglang/srt/disaggregation/mooncake/conn.py#L1178) — `MooncakeKVSender.poll()` timeout logic
- [`python/sglang/srt/managers/scheduler_pp_mixin.py#L145`](https://github.com/sgl-project/sglang/blob/e29305c/python/sglang/srt/managers/scheduler_pp_mixin.py#L145) — `event_loop_pp_disagg_prefill`
