[Perf] Restore torch.compile fusion for topk postprocessing#21771

Merged
Fridge003 merged 2 commits into sgl-project:main from
nvcastet:fix/restore-torch-compile-topk-postprocess
Apr 7, 2026

Conversation

@nvcastet
Collaborator

@nvcastet nvcastet commented Mar 31, 2026

Motivation

PR #16945 reorganized topk logic into _post_process_topk_ids but inlined topk_ids_logical_to_physical and _mask_topk_ids_padded_region instead of calling the existing @torch.compile-decorated _biased_grouped_topk_postprocess. This was flagged during review by @fzyzcjy (comment):

qq: does this mean this will launch a kernel while this should be fused in many cases

The regression causes these two operations to run as separate eager kernels on CUDA instead of being fused via torch.compile, impacting expert-parallel / EPLB paths.

Current ToT (profile screenshot omitted): the two operations launch as separate eager kernels.

This PR restores the fusion present before PR #16945 (profile screenshot omitted).

Modifications

Replace the two inlined calls in _post_process_topk_ids with a call to the existing compiled _biased_grouped_topk_postprocess, restoring kernel fusion. The function was already defined with @torch.compile(dynamic=True, backend=get_compiler_backend()) but had become dead code after #16945.

Checklist

  • Format: `pre-commit run --all-files`

PR sgl-project#16945 refactored topk postprocessing into `_post_process_topk_ids`
but inlined the `topk_ids_logical_to_physical` and
`_mask_topk_ids_padded_region` calls instead of delegating to the
existing `@torch.compile`-decorated `_biased_grouped_topk_postprocess`.

This caused those two operations to run as separate eager kernels
instead of being fused by torch.compile, a regression for CUDA paths
using expert-parallel / EPLB.

Fix: call `_biased_grouped_topk_postprocess` (which already carries
`@torch.compile(dynamic=True)`) from within `_post_process_topk_ids`,
restoring the compiled kernel fusion.

Ref: sgl-project#16945 (comment)
@nvcastet
Collaborator Author

/tag-and-rerun-ci

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request refactors the post-processing logic for top-k IDs in the MoE layer by replacing sequential calls to topk_ids_logical_to_physical and _mask_topk_ids_padded_region with a consolidated call to _biased_grouped_topk_postprocess when running on CUDA. I have no feedback to provide.

Collaborator

@trevor-m trevor-m left a comment


LGTM

@Fridge003 Fridge003 merged commit 490fa9f into sgl-project:main Apr 7, 2026
206 of 249 checks passed
carlosfundora pushed a commit to carlosfundora/sglang-1-bit-turbo that referenced this pull request Apr 8, 2026
…stprocessing (sgl-project#21771)

Upstream SHA: 490fa9f
Cherry-picked from sgl-project/sglang

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026