
[Disagg] Non-blocking try_ensure_parallel_info in pending queue, consolidate rank mapping into PrefillServerInfo #20785

Merged: hnyls2002 merged 10 commits into main from clean_ensure_pdinfo on Mar 18, 2026

ShangmingCai (Collaborator) commented Mar 17, 2026

Summary

  • add() becomes a pure cache lookup — no network calls, no blocking
  • try_ensure_parallel_info() is only called in _resolve_pending_reqs() during the scheduling cycle, with retry tracking and abort after max attempts
  • TP/CP/PP rank mapping computed once per bootstrap addr in CommonKVManager._resolve_rank_mapping() and stored on PrefillServerInfo, instead of recomputed per-receiver
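To make the new add() contract concrete, here is a minimal sketch of a cache-only fast path. It assumes a simplified prefill_info_table that maps a bootstrap address to the prefill dp_size; the classes `Req` and `DecodeQueueSketch` and their fields are hypothetical and do not mirror sglang's real types. The point it illustrates is that add() never does network I/O: it either enqueues immediately or defers to pending_reqs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Req:
    # Hypothetical request record; field names are illustrative only.
    rid: str
    bootstrap_addr: str
    bootstrap_room: int
    data_parallel_rank: Optional[int] = None


@dataclass
class DecodeQueueSketch:
    # bootstrap_addr -> cached prefill dp_size (stand-in for prefill_info_table)
    prefill_info_table: Dict[str, int] = field(default_factory=dict)
    queue: List[Tuple[Req, int]] = field(default_factory=list)
    pending_reqs: List[Req] = field(default_factory=list)

    def _resolve_prefill_dp_rank(self, req: Req) -> Optional[int]:
        # Cache-only lookup: never touches the network.
        dp_size = self.prefill_info_table.get(req.bootstrap_addr)
        if dp_size is None:
            return None  # prefill info not fetched yet
        if req.data_parallel_rank is not None:
            return req.data_parallel_rank
        if dp_size == 1:
            return 0
        return None  # needs a batch query later in the scheduling cycle

    def add(self, req: Req) -> None:
        dp_rank = self._resolve_prefill_dp_rank(req)
        if dp_rank is not None:
            self.queue.append((req, dp_rank))  # fast path (cache hit)
        else:
            self.pending_reqs.append(req)  # resolved by the pending-queue pass
```

A cache miss costs only a list append, so a slow or not-yet-started prefill server can no longer stall the caller of add().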

Background

  • add() previously called ensure_parallel_info(), which could block for up to 30s (5 retries × (1s sleep + 5s HTTP timeout))
  • Rank mapping was duplicated: computed in both the ensure_parallel_info path and CommonKVReceiver.__init__(), with identical results
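The 30s worst case follows from each retry paying both the HTTP timeout and the sleep. A tiny worked check, assuming every attempt times out and is followed by a sleep (the exact accounting of the final sleep in the old loop may differ); `worst_case_block_seconds` is a hypothetical helper for illustration:

```python
def worst_case_block_seconds(retries: int, sleep_s: float, http_timeout_s: float) -> float:
    """Upper bound on how long a retry loop stalls the caller when every
    attempt hits the HTTP timeout and is followed by a sleep."""
    return retries * (sleep_s + http_timeout_s)


# Old blocking ensure_parallel_info(): 5 retries x (1 s sleep + 5 s timeout)
print(worst_case_block_seconds(5, 1.0, 5.0))  # 30.0
```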

Changes

Non-blocking ensure (conn.py)

  • Replace blocking ensure_parallel_info() (retry loop + time.sleep) with try_ensure_parallel_info() — single attempt, no sleep
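A minimal sketch of the single-attempt pattern, assuming the real fetch is one HTTP request to the prefill bootstrap server. Here it is injected as a plain callable so the example runs offline; `ParallelInfoCacheSketch` and its constructor are hypothetical, and only the method name try_ensure_parallel_info comes from the PR.

```python
from typing import Callable, Dict


class ParallelInfoCacheSketch:
    """Hypothetical stand-in for the manager that owns prefill_info_table."""

    def __init__(self, fetch: Callable[[str], dict]):
        # `fetch` models a single request to the prefill bootstrap server.
        self._fetch = fetch
        self.prefill_info_table: Dict[str, dict] = {}

    def try_ensure_parallel_info(self, bootstrap_addr: str) -> bool:
        """Single attempt, no sleep, no retry loop. Returns True if the info
        is cached (already, or after this attempt), False otherwise."""
        if bootstrap_addr in self.prefill_info_table:
            return True
        try:
            info = self._fetch(bootstrap_addr)
        except Exception:
            # Any fetch failure is reported to the caller, which decides
            # whether to try again on a later scheduling cycle.
            return False
        self.prefill_info_table[bootstrap_addr] = info
        return True
```

The retry policy moves out of this function entirely: the caller counts failures across scheduling cycles instead of sleeping in place.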

Pending queue as the single ensure site (decode.py)

  • add() fast path: cache-only _resolve_prefill_dp_rank() → enqueue or append to pending_reqs
  • _resolve_pending_reqs() is the only place that calls try_ensure_parallel_info(), split into two passes:
    1. _ensure_prefill_info(): per-addr non-blocking fetch with retry counter, abort after 30 cycles
    2. dp rank resolution: local resolve → batch query_prefill_dp_ranks fallback
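The two-pass flow can be sketched as one function run per scheduling cycle. This is a minimal sketch under stated assumptions: `resolve_pending_reqs`, `PendingReq`, and the injected callables `try_ensure`, `local_resolve`, and `batch_query` are hypothetical stand-ins rather than sglang's actual signatures; only the 30-cycle abort threshold comes from the PR.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set

MAX_ENSURE_ATTEMPTS = 30  # abort threshold stated in the PR


@dataclass
class PendingReq:
    rid: str
    bootstrap_addr: str
    dp_rank: Optional[int] = None


def resolve_pending_reqs(
    pending: List[PendingReq],
    try_ensure: Callable[[str], bool],
    local_resolve: Callable[[PendingReq], Optional[int]],
    batch_query: Callable[[List[PendingReq]], Dict[str, int]],
    attempts: Dict[str, int],
):
    """One scheduling cycle. Returns (resolved, remaining, aborted)."""
    # Pass 1: one non-blocking attempt per distinct bootstrap_addr.
    ready: Set[str] = set()
    failed: Set[str] = set()
    for addr in {r.bootstrap_addr for r in pending}:
        if try_ensure(addr):
            attempts.pop(addr, None)
            ready.add(addr)
        else:
            attempts[addr] = attempts.get(addr, 0) + 1
            if attempts[addr] >= MAX_ENSURE_ATTEMPTS:
                failed.add(addr)

    resolved, remaining, aborted, needs_query = [], [], [], []
    for r in pending:
        if r.bootstrap_addr in failed:
            aborted.append(r)
        elif r.bootstrap_addr not in ready:
            remaining.append(r)  # try again next cycle
        else:
            rank = local_resolve(r)
            if rank is None:
                needs_query.append(r)
            else:
                r.dp_rank = rank
                resolved.append(r)

    # Pass 2 fallback: a single batched query for whatever local resolution missed.
    if needs_query:
        ranks = batch_query(needs_query)
        for r in needs_query:
            rank = ranks.get(r.rid)
            if rank is None:
                remaining.append(r)
            else:
                r.dp_rank = rank
                resolved.append(r)
    return resolved, remaining, aborted
```

Keying the retry counter by bootstrap_addr rather than by request means a burst of requests to one unreachable prefill server costs one probe per cycle, not one per request.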

Unified rank mapping (conn.py)

  • Add _resolve_rank_mapping() on CommonKVManager — computes TP/CP/PP mapping once and stores on PrefillServerInfo
  • CommonKVReceiver.__init__() now reads pre-computed mapping instead of recomputing
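A sketch of "compute once per bootstrap addr, read many times". It covers only a TP mapping, and the mapping rule itself is an illustrative assumption, not sglang's exact formula (the real code also handles CP and PP). `PrefillServerInfoSketch`, `ManagerSketch`, `ReceiverSketch`, and `target_tp_ranks` are hypothetical names; the counter exists only to show the mapping is computed once.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PrefillServerInfoSketch:
    # Hypothetical subset of fields; the real PrefillServerInfo differs.
    tp_size: int
    # Filled once by _resolve_rank_mapping(), then only read.
    target_tp_ranks: Optional[List[int]] = None


class ManagerSketch:
    def __init__(self, decode_tp_size: int, decode_tp_rank: int):
        self.decode_tp_size = decode_tp_size
        self.decode_tp_rank = decode_tp_rank
        self.prefill_info_table: Dict[str, PrefillServerInfoSketch] = {}
        self.mapping_computations = 0  # instrumentation for the example

    def _resolve_rank_mapping(self, info: PrefillServerInfoSketch) -> None:
        # Illustrative TP-only rule (assumed): which prefill TP ranks this
        # decode TP rank reads KV from.
        self.mapping_computations += 1
        if info.tp_size >= self.decode_tp_size:
            ratio = info.tp_size // self.decode_tp_size
            start = self.decode_tp_rank * ratio
            info.target_tp_ranks = list(range(start, start + ratio))
        else:
            ratio = self.decode_tp_size // info.tp_size
            info.target_tp_ranks = [self.decode_tp_rank // ratio]

    def register_prefill(self, addr: str, info: PrefillServerInfoSketch) -> None:
        # Compute the mapping once, when the addr is first cached.
        if addr not in self.prefill_info_table:
            self._resolve_rank_mapping(info)
            self.prefill_info_table[addr] = info


class ReceiverSketch:
    def __init__(self, mgr: ManagerSketch, addr: str):
        # Reads the precomputed mapping instead of recomputing it per receiver.
        self.target_tp_ranks = mgr.prefill_info_table[addr].target_tp_ranks
```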

Signed-off-by: Shangming Cai <csmthu@gmail.com>
ShangmingCai (Collaborator, Author) commented:

/rerun-stage stage-c-test-8-gpu-h20

gemini-code-assist (Contributor) commented:

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the handling of parallel information and data parallel (DP) rank resolution within the disaggregation system. The changes aim to remove redundant checks and centralize the process of determining DP ranks for pending requests, leading to a cleaner and potentially more efficient request processing flow.

Highlights

  • Redundant ensure_parallel_info removal: The ensure_parallel_info function call has been removed, streamlining the check for prefill parallel information by directly querying the prefill_info_table.
  • DP rank resolution optimization: The logic for resolving data parallel (DP) ranks has been centralized within _resolve_pending_reqs to prevent redundant calls to _resolve_prefill_dp_rank and improve efficiency.

Changelog
  • python/sglang/srt/disaggregation/common/conn.py
    • Replaced the ensure_parallel_info function call with a direct check against prefill_info_table for existing bootstrap addresses.
    • Updated the failure message to reflect the new check logic.
  • python/sglang/srt/disaggregation/decode.py
    • Modified the add method to immediately enqueue "fake transfer" requests with a DP rank of 0, bypassing further resolution.
    • Removed the _is_fake_transfer check from _resolve_prefill_dp_rank, as fake transfers are now handled earlier.
    • Updated _resolve_pending_reqs to use room_to_rank.get() for safer rank retrieval and ensure None checks when resolving pending requests.
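The two behaviors in this changelog (fake transfers short-circuit to DP rank 0, and room_to_rank.get() returning None instead of raising) can be sketched as follows. `pick_dp_rank`, `partition`, and the tuple layout are hypothetical; only the DP rank 0 choice and the use of .get() come from the summary above.

```python
from typing import Dict, List, Optional, Tuple

FAKE_TRANSFER_DP_RANK = 0  # per the changelog, fake transfers go to DP rank 0


def pick_dp_rank(
    room: int, is_fake_transfer: bool, room_to_rank: Dict[int, int]
) -> Optional[int]:
    """Return the DP rank for a request, or None if it must stay pending."""
    if is_fake_transfer:
        # No real KV transfer happens, so skip rank resolution entirely.
        return FAKE_TRANSFER_DP_RANK
    # .get() yields None for rooms the query did not answer,
    # where room_to_rank[room] would raise KeyError.
    return room_to_rank.get(room)


def partition(
    rooms_and_flags: List[Tuple[int, bool]], room_to_rank: Dict[int, int]
) -> Tuple[List[Tuple[int, int]], List[int]]:
    ready, still_pending = [], []
    for room, fake in rooms_and_flags:
        rank = pick_dp_rank(room, fake, room_to_rank)
        if rank is None:
            still_pending.append(room)
        else:
            ready.append((room, rank))
    return ready, still_pending
```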

github-actions (Contributor):

✅ Triggered stage-c-test-8-gpu-h20 to run independently (skipping dependencies).

gemini-code-assist (bot) left a comment

Code Review

This pull request refactors the logic for resolving data parallelism (DP) ranks and ensuring parallel information in the disaggregated prefill/decode system. The changes centralize the DP rank resolution logic into the _resolve_pending_reqs method to avoid redundant calls, and move the call to ensure_parallel_info to an earlier stage. These changes aim to improve code clarity and maintainability. The implementation appears correct and aligns with the stated goals.

ShangmingCai and others added 4 commits March 17, 2026 21:29
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Signed-off-by: Shangming Cai <csmthu@gmail.com>
add() now does cache-only lookup (fast path). All network I/O
(try_ensure_parallel_info + query_prefill_dp_ranks) happens in
_resolve_pending_reqs() during the scheduling cycle, with retry
tracking and abort after max attempts.
@hnyls2002 hnyls2002 changed the title [PD] Cleanup ensure parallel info and dp rank resolving logic [Disagg] Non-blocking ensure_parallel_info and consolidate rank mapping into PrefillServerInfo Mar 17, 2026
@hnyls2002 hnyls2002 changed the title [Disagg] Non-blocking ensure_parallel_info and consolidate rank mapping into PrefillServerInfo [Disagg] Non-blocking try_ensure_parallel_info in pending queue, consolidate rank mapping into PrefillServerInfo Mar 17, 2026
hnyls2002 (Collaborator) commented:

                              add(req)
    
                  ┌───────────────┴───────────────┐
                  │ fake transfer?                │ cache miss
                  │ prefill info cached           │
                  │ + dp_rank known?              │
                  ▼                               ▼
            fast path                      pending_reqs ◄───────────────┐
            (cache hit)                         │                       │
                  │                             │ every scheduling      │
                  │                             │ cycle                 │
                  │                             ▼                       │
                  │        ┌──── Pass 1: ensure prefill info ─────┐     │
                  │        │       (per bootstrap_addr)           │     │
                  │        │                                      │     │
                  │        ▼                                      ▼     │
                  │   try_ensure_parallel_info         fetch failed     │
                  │   ┌──────────────┐               ┌──────────────┐   │
                  │   │ cached?      │── yes ─────►  │ count++      │   │
                  │   │              │               │              │   │
                  │   │ fetch        │── ok ───────► │ rank mapping │   │
                  │   │              │               │ + ready      │   │
                  │   │              │               │              │   │
                  │   │              │               │ < 30 ?       │   │
                  │   │              │               │   yes ───────┼──► remaining
                  │   │              │               │   no  ───────┼──► ABORT
                  │   └──────────────┘               └──────────────┘   │
                  │           │                                         │
                  │           ▼                                         │
                  │     Pass 2: resolve dp rank                         │
                  │     (per req in ready_addrs)                        │
                  │           │                                         │
                  │           ├─ dp_rank preset?    ─────► resolved     │
                  │           ├─ dp_size == 1?      ─────► resolved     │
                  │           ├─ follow_bootstrap?  ─────► resolved     │
                  │           │                                         │
                  │           ▼ (none of above)                         │
                  │     batch query dp_ranks (HTTP)                     │
                  │           │                                         │
                  │           ├─ rank found     ─────► resolved         │
                  │           └─ not found      ─────► remaining ───────┘
                  │                                          (next cycle)
    
                  └──────────────┬──────────────┘
                                 │ resolved
    
                      create receiver & enqueue
    
    
                      handshake → prealloc → transfer → decode

Charts generated with Claude.
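The local checks in Pass 2 of the chart run in a fixed order before falling back to the batched HTTP query. A minimal sketch of that ordering, with one loud assumption: "follow_bootstrap" is modeled here as bootstrap_room % dp_size, which is our guess at the rule, not something the chart states. `resolve_dp_rank_locally` and its parameters are hypothetical names.

```python
from typing import Optional


def resolve_dp_rank_locally(
    preset_rank: Optional[int],
    prefill_dp_size: int,
    follow_bootstrap_room: bool,
    bootstrap_room: int,
) -> Optional[int]:
    """Local checks from the chart, in order. None means the request
    falls through to the batched dp_rank query."""
    if preset_rank is not None:  # dp_rank preset?
        return preset_rank
    if prefill_dp_size == 1:  # dp_size == 1?
        return 0
    if follow_bootstrap_room:  # follow_bootstrap? (assumed: room % dp_size)
        return bootstrap_room % prefill_dp_size
    return None
```

Ordering the cheap, deterministic checks first means the HTTP round trip is paid only by requests that genuinely need the prefill server's answer.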

@hnyls2002 hnyls2002 merged commit 2acb20f into main Mar 18, 2026
181 of 203 checks passed
@hnyls2002 hnyls2002 deleted the clean_ensure_pdinfo branch March 18, 2026 00:26
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026
…nsolidate rank mapping into `PrefillServerInfo` (sgl-project#20785)

Signed-off-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
0-693 pushed a commit to 0-693/sglang that referenced this pull request Mar 25, 2026
…nsolidate rank mapping into `PrefillServerInfo` (sgl-project#20785)

Signed-off-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
dutsc pushed a commit to dutsc/sglang that referenced this pull request Mar 30, 2026
…nsolidate rank mapping into `PrefillServerInfo` (sgl-project#20785)

Signed-off-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
…nsolidate rank mapping into `PrefillServerInfo` (sgl-project#20785)

Signed-off-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: hnyls2002 <lsyincs@gmail.com>
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026
…nsolidate rank mapping into `PrefillServerInfo` (sgl-project#20785)

Signed-off-by: Shangming Cai <csmthu@gmail.com>
Co-authored-by: hnyls2002 <lsyincs@gmail.com>