[Bug Fix] Ensure prefill_info_table is populated before honoring disagg_prefill_dp_rank #22990
Merged
ShangmingCai merged 1 commit into sgl-project:main on Apr 17, 2026
If the first request to the PD engine carries `disagg_prefill_dp_rank`, `_resolve_prefill_dp_rank` returns it immediately without ever populating `kv_manager.prefill_info_table`. This causes the prefill server health check to fail with "Prefill server with bootstrap_addr: ... is healthy before" because the prefill info was never queried or cached.

Move the `prefill_info_table.get(...)` lookup to the top so that the slow path runs (and caches the prefill info) on the first request, even when the client supplies an explicit `disagg_prefill_dp_rank`.

Made-with: Cursor
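The ordering change described above can be illustrated with a small, self-contained sketch. This is a toy model, not sglang's actual code: the `Req` and `KVManager` classes, the `handle` driver, the dict-shaped prefill info, and the `default_dp_rank` key are all hypothetical stand-ins, and `ensure_prefill_info` only mimics the caching side effect of the real slow path (`_ensure_prefill_info`), which queries the prefill server.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Req:
    # Toy request; only the fields needed for this illustration.
    bootstrap_addr: str
    disagg_prefill_dp_rank: Optional[int] = None


@dataclass
class KVManager:
    # bootstrap address -> cached prefill info (a plain dict in this sketch).
    prefill_info_table: Dict[str, dict] = field(default_factory=dict)


def _bootstrap_addr(req: Req) -> str:
    return req.bootstrap_addr


def resolve_before_fix(kv: KVManager, req: Req) -> Optional[int]:
    # Old ordering: an explicit rank returns before the table is consulted,
    # so the slow path never runs and nothing is cached.
    if req.disagg_prefill_dp_rank is not None:
        return req.disagg_prefill_dp_rank
    info = kv.prefill_info_table.get(_bootstrap_addr(req))
    if info is None:
        return None
    return info["default_dp_rank"]  # hypothetical fallback


def resolve_after_fix(kv: KVManager, req: Req) -> Optional[int]:
    # New ordering: look up the cached info first.
    info = kv.prefill_info_table.get(_bootstrap_addr(req))
    if info is None:
        return None  # caller falls through to the slow path
    if req.disagg_prefill_dp_rank is not None:
        return req.disagg_prefill_dp_rank
    return info["default_dp_rank"]  # hypothetical fallback


def ensure_prefill_info(kv: KVManager, req: Req) -> None:
    # Stand-in for the slow path: here it only fills the cache.
    kv.prefill_info_table[_bootstrap_addr(req)] = {"default_dp_rank": 0}


def handle(
    kv: KVManager,
    req: Req,
    resolve: Callable[[KVManager, Req], Optional[int]],
) -> Optional[int]:
    # Hypothetical driver: try the fast path, then the slow path on a miss.
    rank = resolve(kv, req)
    if rank is None:
        ensure_prefill_info(kv, req)
        rank = resolve(kv, req)
    return rank


if __name__ == "__main__":
    first = Req("127.0.0.1:8998", disagg_prefill_dp_rank=0)
    old_kv, new_kv = KVManager(), KVManager()
    handle(old_kv, first, resolve_before_fix)
    handle(new_kv, first, resolve_after_fix)
    print("cached before fix:", bool(old_kv.prefill_info_table))
    print("cached after fix:", bool(new_kv.prefill_info_table))
```

In this model, both orderings return the client-supplied rank, but only the reordered version leaves an entry in `prefill_info_table` after the first request, which is the state the later health check depends on.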
407822c to 8f024b7
ByronHsu commented Apr 16, 2026

    continue
    nd = self.device_pool.kv_buffer[layer_id][naive_locs[b, i].long()]
    kd = self.device_pool.kv_buffer[layer_id][kernel_locs[b, i].long()]
    naive_data = self.device_pool.kv_buffer[layer_id][

fix lint error on main
whybeyoung pushed a commit to whybeyoung/sglang that referenced this pull request, Apr 17, 2026
…gg_prefill_dp_rank (sgl-project#22990) Co-authored-by: Byron Hsu <byron+per@periodiclabs.ai>

ByronHsu added a commit that referenced this pull request, Apr 17, 2026
…gg_prefill_dp_rank (#22990) Co-authored-by: Byron Hsu <byron+per@periodiclabs.ai>

jmamou pushed a commit to jmamou/sglang that referenced this pull request, Apr 20, 2026
…gg_prefill_dp_rank (sgl-project#22990) Co-authored-by: Byron Hsu <byron+per@periodiclabs.ai>

yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request, Apr 22, 2026
…gg_prefill_dp_rank (sgl-project#22990) Co-authored-by: Byron Hsu <byron+per@periodiclabs.ai>

zhangying098 pushed a commit to zhangying098/sglang that referenced this pull request, Apr 23, 2026
…gg_prefill_dp_rank (sgl-project#22990) Co-authored-by: Byron Hsu <byron+per@periodiclabs.ai>

kyx1999 pushed a commit to KMSorSMS/sglang that referenced this pull request, Apr 27, 2026
…gg_prefill_dp_rank (sgl-project#22990) Co-authored-by: Byron Hsu <byron+per@periodiclabs.ai>
Motivation

If the first request to the PD engine carries `disagg_prefill_dp_rank`, the request fails with the "Prefill server with bootstrap_addr: ... is healthy before" error.

However, if the first request does not contain `disagg_prefill_dp_rank` but a later request does, the later requests succeed, because the first request triggers the prefill info query and caches the result.

Root cause

In `DecodePreallocQueue._resolve_prefill_dp_rank`, the `req.disagg_prefill_dp_rank` early return is checked before `self.kv_manager.prefill_info_table.get(_bootstrap_addr(req))`. When the client explicitly supplies `disagg_prefill_dp_rank`, we short-circuit and never trigger the slow path that queries and caches the prefill info. The subsequent prefill-server health check then has no cached info to validate against, producing the "healthy before" error.

Modifications

Move `prefill_info = self.kv_manager.prefill_info_table.get(_bootstrap_addr(req))` to the top of `_resolve_prefill_dp_rank`. If the lookup returns `None`, return `None` so the request falls through to the slow path (`_ensure_prefill_info`), which queries and caches the prefill info. Only after prefill info is available do we honor the client-provided `req.disagg_prefill_dp_rank`.

Reproduction

Send the very first request with `disagg_prefill_dp_rank` set. Before the fix, this fails with the "healthy before" error; after the fix, it succeeds:

    curl -X POST http://127.0.0.1:8000/generate \
      -H 'Content-Type: application/json' \
      -d '{
        "text": "Hello World How are you?",
        "sampling_params": {"max_new_tokens": 128, "temperature": 0.0},
        "stream": false,
        "routed_dp_rank": 0,
        "disagg_prefill_dp_rank": 0
      }'

Checklist