[Bug Fix] Ensure prefill_info_table is populated before honoring disagg_prefill_dp_rank (#22990)

Merged
ShangmingCai merged 1 commit into sgl-project:main from ByronHsu:byron/fix-resolve-prefill-dp-rank-order on Apr 17, 2026

ByronHsu (Collaborator) commented on Apr 16, 2026

Motivation

If the first request to the PD engine carries disagg_prefill_dp_rank, the request fails with:

Prefill server with bootstrap_addr: {self.bootstrap_addr} is healthy before

However, if the first request does not contain disagg_prefill_dp_rank but a later request does, the later request succeeds, because the first request has already triggered the prefill info query and cached the result.

Root cause

In DecodePreallocQueue._resolve_prefill_dp_rank, the req.disagg_prefill_dp_rank early-return is checked before self.kv_manager.prefill_info_table.get(_bootstrap_addr(req)). When the client explicitly supplies disagg_prefill_dp_rank, we short-circuit and never trigger the slow path that queries and caches the prefill info. The subsequent prefill-server health check then has no cached info to validate against, producing the "healthy before" error.

Modifications

Move prefill_info = self.kv_manager.prefill_info_table.get(_bootstrap_addr(req)) to the top of _resolve_prefill_dp_rank. If the lookup returns None, return None so the request falls through to the slow path (_ensure_prefill_info), which queries and caches the prefill info. Only after prefill info is available do we honor the client-provided req.disagg_prefill_dp_rank.
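
The reordering can be illustrated with a minimal, self-contained sketch. It is not the actual sglang code: `Req`, `KVManager`, `ensure_prefill_info`, the `dp_size` entry, and the bootstrap address are hypothetical stand-ins, and everything the real `_resolve_prefill_dp_rank` does beyond the two checks discussed above is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Req:
    # Hypothetical stand-in for a decode request; only the fields used here.
    bootstrap_addr: str
    disagg_prefill_dp_rank: Optional[int] = None


@dataclass
class KVManager:
    # Hypothetical stand-in: bootstrap address -> cached prefill info.
    prefill_info_table: Dict[str, dict] = field(default_factory=dict)


def _bootstrap_addr(req: Req) -> str:
    return req.bootstrap_addr


def resolve_before_fix(kv_manager: KVManager, req: Req) -> Optional[int]:
    # Old order: an explicit rank short-circuits before the table is consulted.
    if req.disagg_prefill_dp_rank is not None:
        return req.disagg_prefill_dp_rank
    prefill_info = kv_manager.prefill_info_table.get(_bootstrap_addr(req))
    if prefill_info is None:
        return None
    return None  # other resolution strategies omitted in this sketch


def resolve_after_fix(kv_manager: KVManager, req: Req) -> Optional[int]:
    # New order: look up the cached prefill info first.
    prefill_info = kv_manager.prefill_info_table.get(_bootstrap_addr(req))
    if prefill_info is None:
        return None  # caller falls through to the slow path
    if req.disagg_prefill_dp_rank is not None:
        return req.disagg_prefill_dp_rank
    return None  # other resolution strategies omitted in this sketch


def ensure_prefill_info(kv_manager: KVManager, req: Req) -> None:
    # Hypothetical slow path: the real code queries the prefill side.
    # The cached value here is a placeholder.
    kv_manager.prefill_info_table[_bootstrap_addr(req)] = {"dp_size": 4}


def first_request_caches_info(
    resolve: Callable[[KVManager, Req], Optional[int]]
) -> bool:
    # Simulate the very first request carrying an explicit rank.
    kv = KVManager()
    req = Req("127.0.0.1:8998", disagg_prefill_dp_rank=0)  # placeholder address
    rank = resolve(kv, req)
    if rank is None:
        ensure_prefill_info(kv, req)
        rank = resolve(kv, req)
    return _bootstrap_addr(req) in kv.prefill_info_table and rank == 0
```

In this sketch, `first_request_caches_info(resolve_before_fix)` leaves the table empty because the slow path never runs, while `first_request_caches_info(resolve_after_fix)` populates it and still returns the client-provided rank.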

Reproduction

# Prefill
python3 -m sglang.launch_server --model-path Qwen/Qwen3-30B-A3B \
    --tp 4 --dp 4 --enable-dp-attention --disaggregation-mode prefill

# Decode
python3 -m sglang.launch_server --model-path Qwen/Qwen3-30B-A3B \
    --tp 4 --dp 4 --enable-dp-attention --disaggregation-mode decode \
    --base-gpu-id 4 --port 30010

# Load balancer
python -m sglang_router.launch_router --mini-lb --pd-disaggregation \
    --prefill http://127.0.0.1:30000 --decode http://127.0.0.1:30010 \
    --host 0.0.0.0 --port 8000

Send the very first request with disagg_prefill_dp_rank set. Before the fix, this fails with the "healthy before" error; after the fix, it succeeds:

curl -X POST http://127.0.0.1:8000/generate \
    -H 'Content-Type: application/json' \
    -d '{
        "text": "Hello World How are you?",
        "sampling_params": {"max_new_tokens": 128, "temperature": 0.0},
        "stream": false,
        "routed_dp_rank": 0,
        "disagg_prefill_dp_rank": 0
    }'
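
For scripting the reproduction, the same request can be built in Python. This is a rough equivalent of the curl command above using only the standard library; the helper names `build_generate_request` and `send` are hypothetical, and the payload fields are copied from the curl body. Omitting `prefill_dp_rank` produces the variant without `disagg_prefill_dp_rank`, which is useful for comparing the two orderings described in the Motivation.

```python
import json
import urllib.request
from typing import Optional


def build_generate_request(
    base_url: str, prefill_dp_rank: Optional[int] = None
) -> urllib.request.Request:
    # Payload mirrors the curl example; disagg_prefill_dp_rank is optional.
    payload = {
        "text": "Hello World How are you?",
        "sampling_params": {"max_new_tokens": 128, "temperature": 0.0},
        "stream": False,
        "routed_dp_rank": 0,
    }
    if prefill_dp_rank is not None:
        payload["disagg_prefill_dp_rank"] = prefill_dp_rank
    return urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> str:
    # Sends the request; requires the servers from the steps above to be running.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Against a freshly started deployment, calling `send(build_generate_request("http://127.0.0.1:8000", prefill_dp_rank=0))` as the first request exercises the failing path described above.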

If the first request to the PD engine carries `disagg_prefill_dp_rank`,
`_resolve_prefill_dp_rank` returns it immediately without ever populating
`kv_manager.prefill_info_table`. This causes the prefill server health
check to fail with "Prefill server with bootstrap_addr: ... is healthy
before" because the prefill info was never queried/cached.

Move the `prefill_info_table.get(...)` lookup to the top so that the
slow path runs (and caches the prefill info) on the first request, even
when the client supplies an explicit `disagg_prefill_dp_rank`.

Made-with: Cursor

ByronHsu force-pushed the byron/fix-resolve-prefill-dp-rank-order branch from 407822c to 8f024b7 on April 16, 2026 18:06
continue
nd = self.device_pool.kv_buffer[layer_id][naive_locs[b, i].long()]
kd = self.device_pool.kv_buffer[layer_id][kernel_locs[b, i].long()]
naive_data = self.device_pool.kv_buffer[layer_id][

Review comment from ByronHsu (Collaborator, Author):

fix lint error on main

@ShangmingCai ShangmingCai left a comment

LGTM

@ShangmingCai ShangmingCai merged commit cf9845f into sgl-project:main Apr 17, 2026
61 of 69 checks passed
whybeyoung pushed a commit to whybeyoung/sglang that referenced this pull request Apr 17, 2026
…gg_prefill_dp_rank (sgl-project#22990)

Co-authored-by: Byron Hsu <byron+per@periodiclabs.ai>
ByronHsu added a commit that referenced this pull request Apr 17, 2026
…gg_prefill_dp_rank (#22990)

Co-authored-by: Byron Hsu <byron+per@periodiclabs.ai>
jmamou pushed a commit to jmamou/sglang that referenced this pull request Apr 20, 2026
…gg_prefill_dp_rank (sgl-project#22990)

Co-authored-by: Byron Hsu <byron+per@periodiclabs.ai>
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026
…gg_prefill_dp_rank (sgl-project#22990)

Co-authored-by: Byron Hsu <byron+per@periodiclabs.ai>
zhangying098 pushed a commit to zhangying098/sglang that referenced this pull request Apr 23, 2026
…gg_prefill_dp_rank (sgl-project#22990)

Co-authored-by: Byron Hsu <byron+per@periodiclabs.ai>
kyx1999 pushed a commit to KMSorSMS/sglang that referenced this pull request Apr 27, 2026
…gg_prefill_dp_rank (sgl-project#22990)

Co-authored-by: Byron Hsu <byron+per@periodiclabs.ai>