Conversation
Signed-off-by: Shangming Cai <csmthu@gmail.com>
/rerun-stage stage-c-test-8-gpu-h20
Summary of Changes (Gemini Code Assist): This pull request refactors the handling of parallel information and data parallel (DP) rank resolution within the disaggregation system. The changes remove redundant checks and centralize DP rank resolution for pending requests, leading to a cleaner and potentially more efficient request processing flow.
Activity
✅ Triggered
Code Review
This pull request refactors the logic for resolving data parallelism (DP) ranks and ensuring parallel information in the disaggregated prefill/decode system. The changes centralize the DP rank resolution logic in the `_resolve_pending_reqs` method to avoid redundant calls, and move the `ensure_parallel_info` call to an earlier stage. These changes improve code clarity and maintainability. The implementation appears correct and aligns with the stated goals.
`add()` now does a cache-only lookup (fast path). All network I/O (`try_ensure_parallel_info` + `query_prefill_dp_ranks`) happens in `_resolve_pending_reqs()` during the scheduling cycle, with retry tracking and abort after max attempts.
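The split described above can be sketched as follows. This is an illustrative stand-in, not the actual sglang implementation: the class, field names, and request shape are assumptions; only the control flow (cache-only `add()`, deferred resolution via a pending queue) mirrors the PR description.

```python
# Minimal sketch (assumed, simplified) of the cache-only add() fast path:
# add() never performs network I/O; on a cache miss the request is parked
# in pending_reqs and resolved later during the scheduling cycle.

class DecodeQueueSketch:
    """Hypothetical decode-side queue; names mirror the PR description
    (add, pending_reqs) but the bodies are purely illustrative."""

    def __init__(self, prefill_info_cache):
        self.prefill_info_cache = prefill_info_cache  # bootstrap_addr -> info
        self.pending_reqs = []   # requests awaiting ensure/resolve
        self.ready_reqs = []     # requests with prefill info and a dp_rank

    def add(self, req):
        # Fast path: pure cache lookup -- no blocking, no HTTP.
        info = self.prefill_info_cache.get(req["bootstrap_addr"])
        if info is not None and req.get("dp_rank") is not None:
            self.ready_reqs.append(req)    # cache hit: enqueue immediately
        else:
            self.pending_reqs.append(req)  # cache miss: resolve next cycle

q = DecodeQueueSketch(prefill_info_cache={"10.0.0.1:8998": {"dp_size": 2}})
q.add({"bootstrap_addr": "10.0.0.1:8998", "dp_rank": 0})  # hit -> ready
q.add({"bootstrap_addr": "10.0.0.2:8998"})                # miss -> pending
```

The point of the design is that `add()` stays O(1) and non-blocking regardless of prefill-server availability; all failure handling lives in one place.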
Use `try_ensure_parallel_info` in pending queue, consolidate rank mapping into `PrefillServerInfo`
add(req)
│
┌───────────────┴───────────────┐
│ fake transfer? │ cache miss
│ prefill info cached │
│ + dp_rank known? │
▼ ▼
fast path pending_reqs ◄───────────────┐
(cache hit) │ │
│ │ every scheduling │
│ │ cycle │
│ ▼ │
│ ┌──── Pass 1: ensure prefill info ─────┐ │
│ │ (per bootstrap_addr) │ │
│ │ │ │
│ ▼ ▼ │
│ try_ensure_parallel_info fetch failed │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ cached? │── yes ─────► │ count++ │ │
│ │ │ │ │ │
│ │ fetch │── ok ───────► │ rank mapping │ │
│ │ │ │ + ready │ │
│ │ │ │ │ │
│ │ │ │ < 30 ? │ │
│ │ │ │ yes ───────┼──► remaining
│ │ │ │ no ───────┼──► ABORT
│ └──────────────┘ └──────────────┘ │
│ │ │
│ ▼ │
│ Pass 2: resolve dp rank │
│ (per req in ready_addrs) │
│ │ │
│ ├─ dp_rank preset? ─────► resolved │
│ ├─ dp_size == 1? ─────► resolved │
│ ├─ follow_bootstrap? ─────► resolved │
│ │ │
│ ▼ (none of above) │
│ batch query dp_ranks (HTTP) │
│ │ │
│ ├─ rank found ─────► resolved │
│ └─ not found ─────► remaining ───────┘
│ (next cycle)
│
└──────────────┬──────────────┘
│ resolved
▼
create receiver & enqueue
│
▼
handshake → prealloc → transfer → decode

(charts generated with Claude)
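The two-pass cycle in the diagram above can be expressed as a small sketch. All names here (`resolve_pending`, `fetch_parallel_info`, `query_dp_ranks`, `MAX_ENSURE_ATTEMPTS`) are hypothetical; only the control flow (single non-blocking fetch per address, retry counter with abort at 30, dp-rank resolution with an HTTP fallback) follows the PR description.

```python
# Illustrative sketch of the two-pass _resolve_pending_reqs() cycle.

MAX_ENSURE_ATTEMPTS = 30  # abort a bootstrap addr after 30 failed cycles

def resolve_pending(pending, cache, attempts, fetch_parallel_info, query_dp_ranks):
    aborted, resolved, remaining = [], [], []

    # Pass 1: ensure prefill parallel info, once per bootstrap addr.
    for addr in {r["addr"] for r in pending}:
        if addr in cache:
            continue  # already cached from an earlier cycle
        info = fetch_parallel_info(addr)  # single non-blocking attempt
        if info is not None:
            cache[addr] = info            # addr becomes "ready"
        else:
            attempts[addr] = attempts.get(addr, 0) + 1
            if attempts[addr] >= MAX_ENSURE_ATTEMPTS:
                aborted.extend(r for r in pending if r["addr"] == addr)

    # Pass 2: resolve dp rank for requests whose addr is ready.
    for req in pending:
        if req in aborted or req["addr"] not in cache:
            if req not in aborted:
                remaining.append(req)     # retry next scheduling cycle
            continue
        if req.get("dp_rank") is not None or cache[req["addr"]]["dp_size"] == 1:
            resolved.append(req)          # preset rank or trivial dp_size
        else:
            rank = query_dp_ranks(req)    # batched HTTP fallback in the PR
            (resolved if rank is not None else remaining).append(req)
    return resolved, remaining, aborted

cache, attempts = {}, {}
pending = [{"addr": "a", "dp_rank": 0}, {"addr": "b"}]
resolved, remaining, aborted = resolve_pending(
    pending, cache, attempts,
    fetch_parallel_info=lambda addr: {"dp_size": 2} if addr == "a" else None,
    query_dp_ranks=lambda req: None,
)
```

In the driver above, address `a` is fetched and its request resolves via the preset `dp_rank`, while address `b` fails its fetch and stays in `remaining` for the next cycle, with its retry counter incremented.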
…nsolidate rank mapping into `PrefillServerInfo` (sgl-project#20785) Signed-off-by: Shangming Cai <csmthu@gmail.com> Co-authored-by: hnyls2002 <lsyincs@gmail.com>
Summary

- `add()` becomes a pure cache lookup: no network calls, no blocking
- `try_ensure_parallel_info()` is only called in `_resolve_pending_reqs()` during the scheduling cycle, with retry tracking and abort after max attempts
- Rank mapping is computed once by `CommonKVManager._resolve_rank_mapping()` and stored on `PrefillServerInfo`, instead of recomputed per-receiver

Background

- `add()` previously called `ensure_parallel_info()`, which blocks up to 30s (5 retries × 1s sleep + 5s HTTP timeout)
- Rank mapping was computed in both the `ensure_parallel_info` path and `CommonKVReceiver.__init__()`, with identical results

Changes

Non-blocking ensure (`conn.py`)

- Replaced `ensure_parallel_info()` (retry loop + `time.sleep`) with `try_ensure_parallel_info()`: single attempt, no sleep

Pending queue as the single ensure site (`decode.py`)

- `add()` fast path: cache-only `_resolve_prefill_dp_rank()`, then enqueue or append to `pending_reqs`
- `_resolve_pending_reqs()` is the only place that calls `try_ensure_parallel_info()`, split into two passes:
  - `_ensure_prefill_info()`: per-addr non-blocking fetch with a retry counter, abort after 30 cycles
  - dp-rank resolution with a `query_prefill_dp_ranks` fallback

Unified rank mapping (`conn.py`)

- `_resolve_rank_mapping()` on `CommonKVManager` computes the TP/CP/PP mapping once and stores it on `PrefillServerInfo`
- `CommonKVReceiver.__init__()` now reads the pre-computed mapping instead of recomputing it
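The "compute once, store on `PrefillServerInfo`" idea can be illustrated with a toy sketch. The dataclass fields, the mapping rule, and `resolve_rank_mapping` are assumptions for illustration only; the real `_resolve_rank_mapping()` computes a TP/CP/PP mapping whose details are not shown in this PR description.

```python
# Sketch: derive the rank mapping one time per prefill server and cache it
# on the info object, so each receiver reads it instead of recomputing.

from dataclasses import dataclass, field

@dataclass
class PrefillServerInfoSketch:
    """Hypothetical stand-in for PrefillServerInfo."""
    tp_size: int
    dp_size: int
    # decode tp rank -> prefill tp rank; filled in once by the manager
    rank_mapping: dict = field(default_factory=dict)

def resolve_rank_mapping(info, decode_tp_size):
    # Toy rule: decode rank d reads from prefill rank
    # d * prefill_tp // decode_tp (a simple contiguous split).
    info.rank_mapping = {
        d: d * info.tp_size // decode_tp_size for d in range(decode_tp_size)
    }
    return info

info = resolve_rank_mapping(
    PrefillServerInfoSketch(tp_size=4, dp_size=1), decode_tp_size=2
)
# Receivers now just read info.rank_mapping in __init__.
```

Since every receiver for the same prefill server would derive an identical mapping (as the Background notes), hoisting the computation into the manager removes duplicated work without changing behavior.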