bugfix[schedule]: Refactor sort method and add related UT#13576
bugfix[schedule]: Refactor sort method and add related UT#13576hnyls2002 merged 28 commits intosgl-project:mainfrom
Conversation
Summary of ChangesHello @SeanWeiSean, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request focuses on improving the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
The pull request introduces several refactorings to the scheduler policy, including reusing RadixKey, simplifying priority sorting logic with a priority_sign attribute, and extracting duplicated code into a helper method. It also includes a critical bug fix in preempt_to_schedule where the wrong request was being used to calculate token offsets.
My review identifies two critical bugs in the refactored sorting methods where self was incorrectly added to @staticmethod signatures, which would cause TypeErrors at runtime. I've also pointed out some minor style issues for improved readability. The rest of the changes, including the bug fix and other refactorings, look good.
| def _sort_by_longest_output( | ||
| self, |
There was a problem hiding this comment.
This method is decorated with @staticmethod, but self has been added as the first parameter. A static method does not receive the instance as its first argument. This will cause a TypeError at runtime because the call at line 134 provides 3 arguments, but the method now expects 4. Please remove the self parameter.
| def _sort_by_longest_output( | |
| self, | |
| def _sort_by_longest_output(waiting_queue: List[Req], |
| def _sort_by_priority_and_fcfs( | ||
| waiting_queue: List[Req], schedule_low_priority_values_first: bool | ||
| self, waiting_queue: List[Req], priority_sign: int |
There was a problem hiding this comment.
Similar to _sort_by_longest_output, this @staticmethod has self as its first parameter, which is incorrect. The call at line 110 passes 2 arguments, but the method now expects 3, which will lead to a TypeError. Please remove the self parameter.
def _sort_by_priority_and_fcfs(waiting_queue: List[Req], priority_sign: int| ) | ||
| sorted_running_reqs = sorted( | ||
| self.running_batch.reqs, | ||
| key=lambda x: (x.priority * (- self.priority_sign), -x.time_stats.wait_queue_entry_time), |
| priority_diff = req.priority - running_req.priority | ||
| if server_args.schedule_low_priority_values_first: | ||
| priority_diff *= -1 | ||
| priority_diff = (req.priority - running_req.priority) * (- self.priority_sign) |
| for i, running_req in enumerate(self.running_batch.reqs): | ||
| if running_req in preemptible_reqs: | ||
| self.rem_total_token_offset -= ( | ||
| self._get_running_request_total_token_offset(req) |
|
test_over_preempt_success_low_priority_values_first will fail due to it aginst a bug fixed by #12494 |
Merge branch 'main' into yuxwei/improveschedularpolicy
|
LGTM but would like to have rem_total_tokens() refactoring under hnyls2002@'s radar! Would you be able to resolve the merge conflict to run the CI beforehand? |
|
Thanks, to be more clear, i remove the token calculation refactor part and only keep the UT and a simple refactor to make this pr more clean and trackable. And now merged latest main. |
…anWeiSean/sglang into yuxwei/improveschedularpolicy
|
LGTM! Thank you! |
|
/tag-and-rerun-ci |
|
/rerun-failed-ci |
|
/rerun-failed-ci |
…n_eagle3_dp * 'main' of https://github.com/sgl-project/sglang: (208 commits) MoE: Skip SiLU/GELU activation for masked experts (sgl-project#15539) [GLM-ASR] GLM-ASR Support (sgl-project#15570) Improve engine customization interface (sgl-project#15635) chore: bump sgl-kernel version to 0.3.20 (sgl-project#15590) bugfix[schedule]: Refactor sort method and add related UT (sgl-project#13576) Adjust wrong `mtp` meaning introduce by mimo (sgl-project#15632) Tiny add back missing router per attempt response metric (sgl-project#15621) Fix router gRPC mode launch error caused by async loading (sgl-project#15368) [model-gateway] return 503 when all workers are circuit-broken (sgl-project#15611) [Diffusion] Support peak memory record in offline generate and serving (sgl-project#15610) [VLM] Tiny: Unify VLM environment variables (sgl-project#15572) [diffusion] chore: remove default post-denoising dit offload in local mode (sgl-project#15573) Tiny enable soft watchdog in CI for stuck without logs (sgl-project#15616) Tiny add stuck simulation (sgl-project#15613) Support soft watchdog for tokenizer/detokenizer/dp-controller processes (sgl-project#15607) Tiny avoid EnvField misuse (sgl-project#15612) add decode round robin policy (sgl-project#15164) Add glm-4.6-fp8 with/without mtp in nightly ci (sgl-project#15566) Adapt fixture-kit to gsm8k mixin (sgl-project#15599) [model-gateway] add retry support to OpenAI router chat endpoint (sgl-project#15589) ...
…t#13576) Co-authored-by: Yuxuan Wei <w1300012920@pku.edu.cn> Co-authored-by: Yuxuan Wei <w1300012920@gmail.com> Co-authored-by: Yuxuan Wei🚚 <yuxwei@microsoft.com>
…t#13576) Co-authored-by: Yuxuan Wei <w1300012920@pku.edu.cn> Co-authored-by: Yuxuan Wei <w1300012920@gmail.com> Co-authored-by: Yuxuan Wei🚚 <yuxwei@microsoft.com>

Motivation
Reuse Calc Code and try to fix a token counting issue, build test cases against PrefillAdder Class
Modifications
1. Fix scheduling bugsCorrects errors in the preemption logic where the wrong request object was used to compute token offsets. This ensures that high-priority requests can preempt low-priority ones accurately, improving scheduler correctness and preventing resource misallocation.Refactor scheduler logic for clarity and maintainability
The PR reorganizes scheduling functions, including priority sorting and prefix matching logic. A cleaner structure makes it easier to extend the scheduler in the future for more complex strategies (e.g., multi-level queues, mixed-priority handling).
Strengthen test coverage
Adds unit tests for edge cases in prefill, preemption, and priority handling, improving code reliability and reducing the risk of regressions in future updates.
Detail the changes made in this pull request.
2. Reuse RadixKey3. Reuse _calc_available_and_evictable_tokensAccuracy Tests
Benchmarking and Profiling
Checklist