Fix spec info's filter when reqs are finished right after prefill #14742

Merged
hnyls2002 merged 12 commits into main from lsyin/fix-spec-info-filter
Dec 13, 2025

@hnyls2002
Collaborator

@hnyls2002 hnyls2002 commented Dec 9, 2025

This PR fixes #14368

Before the fix, the scheduler's filter_batch call failed with the following exception:

[2025-12-11 04:05:55] Scheduler hit an exception: Traceback (most recent call last):
  File "/host_home/common_sync/sglang/python/sglang/srt/managers/scheduler.py", line 2706, in run_scheduler_process
    scheduler.event_loop_normal()
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/host_home/common_sync/sglang/python/sglang/srt/managers/scheduler.py", line 989, in event_loop_normal
    batch = self.get_next_batch_to_run()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/host_home/common_sync/sglang/python/sglang/srt/managers/scheduler.py", line 1676, in get_next_batch_to_run
    self.last_batch.filter_batch(
  File "/host_home/common_sync/sglang/python/sglang/srt/managers/schedule_batch.py", line 1857, in filter_batch
    self.spec_info.filter_batch(
  File "/host_home/common_sync/sglang/python/sglang/srt/speculative/eagle_info.py", line 746, in filter_batch
    raise ValueError(error_msg)
ValueError: length of new_indices: 6 != length of topk_p: 7, this should not happen

[2025-12-11 04:05:55] SIGQUIT received. signum=None, frame=None. It usually means one child failed.
[1]    1715904 killed     python test_eagle_infer_a.py

Background of the issue

  • spec_info contains per-batch information and should be filtered together with the scheduler batch, which is currently NOT the case.
    • DECODE: In spec v1's decoding stage, we filter the spec_info right after verification, so its filtering is not aligned with the filtering of the scheduler batch.
    • PREFILL: forward_draft_extend never encounters finished requests, so in the prefill/extend stage the spec_info is filtered together with the scheduler batch, which differs from the decoding stage.
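
The two paths above can be sketched with a toy model. This is only an illustration under stated assumptions, not the sglang implementation: `ToySpecInfo` is a hypothetical class, plain lists stand in for tensors, and only the names `topk_p`, `new_indices`, `has_been_filtered`, and the error text are taken from this PR.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToySpecInfo:
    """Hypothetical stand-in for the draft state: one topk_p entry per request."""

    topk_p: List[float]

    def filter_batch(self, new_indices: List[int], has_been_filtered: bool) -> None:
        if has_been_filtered:
            # DECODE path: finished requests were already dropped at
            # verification, so this call only checks that sizes still agree.
            if len(new_indices) != len(self.topk_p):
                raise ValueError(
                    f"length of new_indices: {len(new_indices)} != "
                    f"length of topk_p: {len(self.topk_p)}, this should not happen"
                )
        else:
            # PREFILL/EXTEND path: nothing was dropped earlier, so the
            # surviving entries are selected here, with the scheduler batch.
            self.topk_p = [self.topk_p[i] for i in new_indices]


# DECODE: request 1 finishes; the spec info is filtered at verification,
# then the scheduler filters its own batch of 3 down to indices [0, 2].
decode_info = ToySpecInfo(topk_p=[0.9, 0.8, 0.7])
decode_info.topk_p = [decode_info.topk_p[i] for i in (0, 2)]  # done at verification
decode_info.filter_batch(new_indices=[0, 2], has_been_filtered=True)

# PREFILL: no earlier filtering, so the scheduler filter does the selection.
prefill_info = ToySpecInfo(topk_p=[0.9, 0.8, 0.7])
prefill_info.filter_batch(new_indices=[0, 2], has_been_filtered=False)
```

In both cases the toy spec info ends up with the entries for requests 0 and 2; the paths differ only in where the selection happens.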

We introduced has_been_filtered to indicate whether the scheduler batch's filtering should also be applied to the spec_info, but this flag was previously assigned incorrectly.

When requests finish immediately after the prefill/extend stage and there is no chunked prefill request, the previous implementation treated the filter as a DECODE filter and did not filter the spec_info at all.
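
The failing case can be reproduced in a small, self-contained sketch. Everything here is an assumption for illustration: `flag_before_fix`, `flag_after_fix`, and `toy_spec_filter` are hypothetical helpers, the old rule is inferred from the description above (no chunked prefill request means DECODE filter), and the new rule assumes has_been_filtered is simply the negation of is_extend_filter. The actual sglang code may differ.

```python
from typing import List


def flag_before_fix(has_chunked_req: bool) -> bool:
    # Assumed old rule: only a chunked prefill request marked the filter as
    # an extend filter; everything else was treated as a DECODE filter.
    return not has_chunked_req


def flag_after_fix(is_extend_filter: bool) -> bool:
    # Assumed new rule: the flag follows the stage the batch just ran.
    return not is_extend_filter


def toy_spec_filter(
    topk_p: List[float], new_indices: List[int], has_been_filtered: bool
) -> List[float]:
    """Toy spec_info filter over a plain list; returns the filtered list."""
    if has_been_filtered:
        if len(new_indices) != len(topk_p):
            raise ValueError(
                f"length of new_indices: {len(new_indices)} != "
                f"length of topk_p: {len(topk_p)}, this should not happen"
            )
        return topk_p
    return [topk_p[i] for i in new_indices]


# Scenario matching the traceback: 7 prefilled requests, the last one
# finishes right after prefill, and there is no chunked prefill request.
topk_p = [0.1 * i for i in range(7)]
new_indices = [0, 1, 2, 3, 4, 5]

try:
    toy_spec_filter(topk_p, new_indices, flag_before_fix(has_chunked_req=False))
    old_error = None
except ValueError as exc:
    old_error = str(exc)

fixed = toy_spec_filter(topk_p, new_indices, flag_after_fix(is_extend_filter=True))
```

Under these assumptions the old flag sends the prefill-stage filter down the already-filtered branch, which trips the length check, while the new flag selects the six surviving entries.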

@gemini-code-assist
Contributor

Summary of Changes

Hello @hnyls2002, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue within the schedule_batch manager where the spec_info filtering mechanism was not behaving as expected, particularly when requests finished their prefill stage. The change refactors the logic for a flag that indicates whether filtering has occurred, ensuring that spec_info is always processed with the correct state, thereby preventing potential errors in speculative decoding or related operations.

Highlights

  • Refactored Filtering Logic: Simplified the determination of the has_been_filtered flag passed to spec_info.filter_batch by introducing a new boolean variable is_extend_filter.
  • Corrected Spec Info Filtering: Addressed a bug where spec_info was incorrectly filtered when requests completed immediately after prefill, ensuring the filter_batch method receives the accurate filtering state.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors the logic for determining the has_been_filtered flag in filter_batch, which fixes a subtle bug related to speculative decoding when requests are finished right after prefill. The change is correct and simplifies the implementation. I've added a small suggestion to further improve readability by inlining the logic, making it more self-contained.

Comment thread python/sglang/srt/managers/schedule_batch.py
@jimmy-evo
Contributor

jimmy-evo commented Dec 10, 2025

works fine with me
LGTM

@hnyls2002
Collaborator Author

/tag-and-rerun-ci

@hnyls2002 hnyls2002 merged commit ed52d01 into main Dec 13, 2025
110 of 125 checks passed
@hnyls2002 hnyls2002 deleted the lsyin/fix-spec-info-filter branch December 13, 2025 16:32
Development

Successfully merging this pull request may close these issues.

[Bug] speculative report error when pd mode = normal

3 participants