[CPU] Add apply_routed_scaling_factor_on_output support for biased_grouped_topk fusion #22413

Merged: mingfeima merged 7 commits into sgl-project:main from jianan-gu:jianan/bias_gtopk on Apr 10, 2026

@jianan-gu
Contributor

This PR:

  1. removes the restriction on apply_routed_scaling_factor_on_output in the CPU path of the fused biased_grouped_topk_cpu kernel
  2. adds fp32 dtype support for gating_output
  3. refines the top-k expert numbers
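
For readers unfamiliar with the flag, here is a minimal single-token reference sketch in plain Python of what a biased grouped top-k routing step with `apply_routed_scaling_factor_on_output` might compute. It is not the CPU kernel from this PR. The function name `biased_grouped_topk_ref`, the sigmoid scoring, and the "sum of the top-2 biased scores per group" rule are assumptions borrowed from DeepSeek-V3-style routing; the PR text itself does not spell them out.

```python
import math


def biased_grouped_topk_ref(
    gating_output,            # per-expert router logits for ONE token (list of floats)
    correction_bias,          # per-expert bias used only for expert selection
    topk,
    num_expert_group,
    topk_group,
    renormalize=True,
    routed_scaling_factor=1.0,
    apply_routed_scaling_factor_on_output=False,
):
    """Hypothetical reference, not the sgl-kernel implementation."""
    num_experts = len(gating_output)
    assert num_experts % num_expert_group == 0
    group_size = num_experts // num_expert_group

    # Assumption: sigmoid scoring, as in DeepSeek-V3-style routers.
    scores = [1.0 / (1.0 + math.exp(-x)) for x in gating_output]
    biased = [s + b for s, b in zip(scores, correction_bias)]

    # Assumption: a group's score is the sum of its two largest biased scores.
    group_scores = []
    for g in range(num_expert_group):
        members = sorted(biased[g * group_size:(g + 1) * group_size], reverse=True)
        group_scores.append(sum(members[:2]))

    # Keep the best `topk_group` groups; experts in other groups are masked out.
    kept = set(
        sorted(range(num_expert_group), key=lambda g: group_scores[g], reverse=True)[:topk_group]
    )
    candidates = [e for e in range(num_experts) if e // group_size in kept]

    # Select experts by biased score, but weight them by the unbiased score.
    topk_ids = sorted(candidates, key=lambda e: biased[e], reverse=True)[:topk]
    topk_weights = [scores[e] for e in topk_ids]

    if renormalize:
        total = sum(topk_weights)
        topk_weights = [w / total for w in topk_weights]

    # The flag folds the scaling into the routing weights themselves.
    if apply_routed_scaling_factor_on_output:
        topk_weights = [w * routed_scaling_factor for w in topk_weights]

    return topk_weights, topk_ids
```

In this sketch the flag changes only the returned weights, never the selected expert ids, so a caller that sets it would not need to multiply the MoE output by `routed_scaling_factor` again.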

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@jianan-gu changed the title from "[CPU] Add apply_routed_scaling_factor support for biased_grouped_topk" to "[CPU] Add apply_routed_scaling_factor_on_output support for biased_grouped_topk fusion" on Apr 9, 2026
Collaborator

@mingfeima mingfeima left a comment

minor changes needed to simplify the code.

Comment thread sgl-kernel/csrc/cpu/common.h Outdated
@jianan-gu jianan-gu requested a review from mingfeima April 9, 2026 05:32
@mingfeima
Collaborator

/tag-and-rerun-ci

@github-actions github-actions Bot added the run-ci label Apr 9, 2026
@Kangyan-Zhou
Collaborator

/rerun-failed-ci

@jianan-gu
Contributor Author

/rerun-failed-ci

@jianan-gu
Contributor Author

The latest Xeon CI breakage was introduced by PR #20796.

@mingfeima mingfeima merged commit 2ab1415 into sgl-project:main Apr 10, 2026
53 of 63 checks passed
Fridge003 pushed a commit that referenced this pull request Apr 11, 2026
…ouped_topk fusion (#22413)

Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
pyc96 pushed a commit to pyc96/sglang that referenced this pull request Apr 14, 2026
…ouped_topk fusion (sgl-project#22413)

Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026
…ouped_topk fusion (sgl-project#22413)

Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
blzheng added a commit to blzheng/sglang that referenced this pull request Apr 28, 2026
* fix topk softmax performance issue (sgl-project#14702)

* [CPU] Add apply_routed_scaling_factor_on_output support for biased_grouped_topk fusion (sgl-project#22413)

Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>

* add kernel biased_topk_cpu

* add kernel hash_topk_cpu

---------

Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
Co-authored-by: jianan-gu <jianan.gu@intel.com>