
[Disagg] Skip health check enqueue when PD disagg queues have backlog #20191

Merged
hnyls2002 merged 1 commit into sgl-project:main from whybeyoung:fix_health_check on Mar 9, 2026


@whybeyoung
Collaborator

@whybeyoung whybeyoung commented Mar 9, 2026

  • Check disagg_prefill_bootstrap_queue / disagg_prefill_inflight_queue backlog on prefill node
  • Check disagg_decode_prealloc_queue / disagg_decode_transfer_queue backlog on decode node
  • Reuse existing return_health_check_ct fast-path to defer health check signal to next batch completion
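
The deferral in the last bullet can be pictured with a toy model. This is a minimal sketch, not the scheduler's code: `ToyScheduler`, `busy`, `on_health_check`, and `on_batch_complete` are hypothetical names, and the idea that `return_health_check_ct` is a counter drained when a batch finishes is an assumption inferred from the bullet above.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScheduler:
    """Hypothetical stand-in for the scheduler; only the deferral path is modeled."""

    busy: bool = False               # True while requests are running or queued
    return_health_check_ct: int = 0  # health checks waiting for a batch to finish
    responses: list = field(default_factory=list)

    def on_health_check(self) -> None:
        if self.busy:
            # Do not enqueue a real generation request; just remember to answer.
            self.return_health_check_ct += 1
        else:
            # Idle: the health check runs as an ordinary (tiny) request.
            self.responses.append("health_ok(ran_request)")

    def on_batch_complete(self) -> None:
        # A finished batch shows the server is making progress, so every
        # deferred health check can be answered at once.
        self.responses.extend(["health_ok(piggyback)"] * self.return_health_check_ct)
        self.return_health_check_ct = 0


sched = ToyScheduler(busy=True)
sched.on_health_check()
sched.on_health_check()
assert sched.responses == []  # nothing answered while the backlog exists
sched.on_batch_complete()
assert sched.responses == ["health_ok(piggyback)"] * 2
```

The point of the counter is that a busy server never has to push a health check through queues that are already backed up; the next completed batch serves as the liveness signal.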

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request improves the robustness of the system's health check mechanism, particularly in disaggregated modes. It prevents health checks from being processed if they would be blocked by existing requests in specialized internal queues, thereby providing a more accurate reflection of the system's ability to handle new requests under current load conditions.

Highlights

  • Health Check Logic Enhancement: Expanded the conditions under which a health check generation request is ignored to include the state of disaggregation-specific queues, ensuring more accurate health reporting in disaggregated environments.


Changelog
  • python/sglang/srt/managers/scheduler.py
    • Refactored the existing health check condition into a clearer has_running_requests variable.
    • Added a new condition will_block_in_pd_queue to check for backlogs in disaggregation-specific queues (bootstrap, prealloc, inflight, transfer).
    • Ensured health check requests are not enqueued (their result is deferred to the next batch completion) if either running requests exist or disaggregation queues have a backlog, preventing spurious health check failures during high load or queue congestion.
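
The combined condition described in the changelog can be sketched as a pure function. This is an illustration under stated assumptions, not the code in scheduler.py: the `DisaggregationMode` enum here is a stand-in (its `NULL` member is assumed), and `PD_QUEUES`, `should_skip_health_check`, and the `queue_sizes` dictionary are hypothetical names introduced for the example.

```python
from enum import Enum, auto


class DisaggregationMode(Enum):
    """Stand-in enum; the real class lives in the scheduler's codebase."""

    NULL = auto()
    PREFILL = auto()
    DECODE = auto()


# Which queues matter in each mode, following the changelog above.
PD_QUEUES = {
    DisaggregationMode.PREFILL: ("bootstrap", "inflight"),
    DisaggregationMode.DECODE: ("prealloc", "transfer"),
}


def should_skip_health_check(mode, has_running_requests, queue_sizes):
    """Return True when a health check request should not be enqueued."""
    names = PD_QUEUES.get(mode, ())
    will_block_in_pd_queue = any(queue_sizes.get(name, 0) > 0 for name in names)
    return has_running_requests or will_block_in_pd_queue


# A prefill node with a bootstrap backlog skips the health check request.
assert should_skip_health_check(DisaggregationMode.PREFILL, False, {"bootstrap": 1})
# The same backlog is irrelevant on a decode node, which watches other queues.
assert not should_skip_health_check(DisaggregationMode.DECODE, False, {"bootstrap": 1})
# Running requests trigger the skip regardless of mode.
assert should_skip_health_check(DisaggregationMode.NULL, True, {})
```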
Activity
  • No specific activity has been recorded for this pull request yet.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request enhances the health check logic for prefill/decode (PD) disaggregation mode. The changes ensure that health checks are not blocked by backlogs in PD-specific queues, which correctly reflects a busy but healthy server state. The logic appears sound. I have one suggestion to make the code more idiomatic.

Note: Security Review did not run due to the size of the PR.

Comment on lines +1427 to +1439
    will_block_in_pd_queue = False
    if self.disaggregation_mode == DisaggregationMode.PREFILL:
        # If bootstrap queue has backlog, health check will also be blocked there
        will_block_in_pd_queue = (
            len(self.disagg_prefill_bootstrap_queue.queue) > 0
            or len(self.disagg_prefill_inflight_queue) > 0
        )
    elif self.disaggregation_mode == DisaggregationMode.DECODE:
        # If prealloc/transfer queue has backlog, health check will also be blocked there
        will_block_in_pd_queue = (
            len(self.disagg_decode_prealloc_queue.queue) > 0
            or len(self.disagg_decode_transfer_queue.queue) > 0
        )

medium

For checking if a queue has a backlog, it's more idiomatic in Python to check the truthiness of the collection directly (e.g., if my_list:) or use bool(my_list) rather than checking len(my_list) > 0. This can make the code slightly more concise and Pythonic. You can refactor this block to use bool() for clarity.

Suggested change
-   will_block_in_pd_queue = False
-   if self.disaggregation_mode == DisaggregationMode.PREFILL:
-       # If bootstrap queue has backlog, health check will also be blocked there
-       will_block_in_pd_queue = (
-           len(self.disagg_prefill_bootstrap_queue.queue) > 0
-           or len(self.disagg_prefill_inflight_queue) > 0
-       )
-   elif self.disaggregation_mode == DisaggregationMode.DECODE:
-       # If prealloc/transfer queue has backlog, health check will also be blocked there
-       will_block_in_pd_queue = (
-           len(self.disagg_decode_prealloc_queue.queue) > 0
-           or len(self.disagg_decode_transfer_queue.queue) > 0
-       )
+   will_block_in_pd_queue = False
+   if self.disaggregation_mode == DisaggregationMode.PREFILL:
+       # If bootstrap queue has backlog, health check will also be blocked there
+       will_block_in_pd_queue = (
+           bool(self.disagg_prefill_bootstrap_queue.queue) or
+           bool(self.disagg_prefill_inflight_queue)
+       )
+   elif self.disaggregation_mode == DisaggregationMode.DECODE:
+       # If prealloc/transfer queue has backlog, health check will also be blocked there
+       will_block_in_pd_queue = (
+           bool(self.disagg_decode_prealloc_queue.queue) or
+           bool(self.disagg_decode_transfer_queue.queue)
+       )
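
For built-in containers the two spellings in this suggestion agree, which the short check below illustrates. It also shows one caveat worth keeping in mind: an object that defines neither `__len__` nor `__bool__` is always truthy, so the test has to target the inner collection. `OpaqueQueue` is a hypothetical wrapper written for this example; the thread does not show how the scheduler's queue classes are defined.

```python
from collections import deque

# For lists and deques, truthiness and a length check give the same answer.
for backlog in ([], ["req"], deque(), deque(["req"])):
    assert bool(backlog) == (len(backlog) > 0)


class OpaqueQueue:
    """Hypothetical wrapper that defines neither __len__ nor __bool__."""

    def __init__(self):
        self.queue = []


wrapper = OpaqueQueue()
assert bool(wrapper) is True         # always truthy, even though it holds nothing
assert bool(wrapper.queue) is False  # the inner list is what must be tested
```

Either form is therefore safe only as long as it is applied to the underlying list-like attribute rather than to a wrapper object.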

@hnyls2002 hnyls2002 changed the title from "enhance pd health check" to "[Disagg] Skip health check enqueue when PD disagg queues have backlog" on Mar 9, 2026
@hnyls2002
Collaborator

/rerun-stage stage-c-test-8-gpu-h20

@github-actions
Contributor

github-actions Bot commented Mar 9, 2026

✅ Triggered stage-c-test-8-gpu-h20 to run independently (skipping dependencies).

@github-actions
Contributor

github-actions Bot commented Mar 9, 2026

🔗 View workflow run

@hnyls2002 hnyls2002 merged commit 3e8abc7 into sgl-project:main Mar 9, 2026
55 of 64 checks passed
hnyls2002 added a commit that referenced this pull request Mar 17, 2026
Move disagg queue checks (bootstrap/prealloc/transfer) from the
health-check idle path to the true-idle-only path. These queues
may have items without any request running on GPU, so they cannot
piggyback health check results.

Related: #20296, #20191
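
The reasoning in this follow-up commit message can be illustrated with a toy event loop. This is a sketch under stated assumptions rather than the scheduler's behavior: `deferred_check_answered` and its parameters are hypothetical, and `peer_ready` is an invented stand-in for whatever condition keeps queued items from reaching the GPU.

```python
def deferred_check_answered(running, pd_backlog, peer_ready, ticks):
    """Return True if a deferred health check is answered within `ticks` iterations.

    Toy model: a deferred check is answered only when a running batch
    completes; queued items start running only if `peer_ready` is True.
    """
    deferred = True
    for _ in range(ticks):
        if running > 0:
            running -= 1
            deferred = False  # batch completion carries the health check result
        elif pd_backlog > 0 and peer_ready:
            pd_backlog -= 1
            running += 1      # a queued item finally reaches the GPU
    return not deferred


# A running batch completes, so the deferred check piggybacks on it.
assert deferred_check_answered(running=1, pd_backlog=0, peer_ready=True, ticks=3)
# Items sit in a disagg queue but nothing runs: the deferred check never returns.
assert not deferred_check_answered(running=0, pd_backlog=2, peer_ready=False, ticks=5)
```

In the second case the queues are non-empty yet no batch ever completes, which is the situation the commit message describes as unable to piggyback health check results.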
