[sgl-kernel] add rotary embed kernel for trivial head_sizes#6530
Closed
mickqian wants to merge 11 commits into sgl-project:main from
Conversation
Collaborator
Others LGTM
Contributor
I wanted to test your kernel, but it seems there are conflicts with the current main. Are there any plans to update, or do I need to roll back?
Collaborator
Updated. |
This was referenced Nov 10, 2025
RubiaCx added a commit to RubiaCx/sglang that referenced this pull request on Nov 12, 2025
Collaborator
@mickqian May I know why this PR was closed?
Collaborator
Author
Because I don't have enough time to refine this kernel.
Motivation
Previously, for Attention with head_size not in [64, 128, 256, 512] (which is common for Multimodal Attention), sgl would adopt the rotary_embedding kernel from vLLM. This PR copies and adapts that kernel with minor improvements.
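For context, the operation such a kernel implements can be sketched with a CPU reference. This is a minimal NumPy sketch of rotary position embedding for an arbitrary head size such as 80, not the PR's CUDA kernel; the function name, the `is_neox` layout flag, and the default base are illustrative assumptions:

```python
import numpy as np

def rotary_embedding_ref(positions, query, rotary_dim, base=1e6, is_neox=False):
    """Reference rotary embedding (hypothetical helper, not the PR's kernel).

    positions: (num_tokens,) integer positions
    query:     (num_tokens, num_heads, head_size) with head_size >= rotary_dim
    Only the first `rotary_dim` dims of each head are rotated; the rest pass through.
    """
    # Per-pair inverse frequencies, as in standard RoPE.
    inv_freq = 1.0 / (base ** (np.arange(0, rotary_dim, 2) / rotary_dim))
    freqs = positions.astype(np.float64)[:, None] * inv_freq[None, :]
    cos = np.cos(freqs)[:, None, :]  # (num_tokens, 1, rotary_dim // 2)
    sin = np.sin(freqs)[:, None, :]

    out = query.astype(np.float64).copy()
    rot = out[..., :rotary_dim]
    if is_neox:
        # NeoX style: rotate first half against second half.
        x1, x2 = rot[..., : rotary_dim // 2], rot[..., rotary_dim // 2 :]
    else:
        # GPT-J style: rotate interleaved even/odd dims.
        x1, x2 = rot[..., 0::2], rot[..., 1::2]
    o1 = x1 * cos - x2 * sin
    o2 = x2 * cos + x1 * sin
    if is_neox:
        out[..., : rotary_dim // 2] = o1
        out[..., rotary_dim // 2 : rotary_dim] = o2
    else:
        out[..., 0:rotary_dim:2] = o1
        out[..., 1:rotary_dim:2] = o2
    return out
```

Because each pair of dims undergoes a pure 2D rotation, position 0 is the identity and vector norms are preserved, which gives a quick sanity check against a fused-kernel implementation (e.g. with `head_size = rotary_dim = 80` as in the benchmark cases above).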
Modifications
Benchmark
test_rotary_embedding_benchmark[80-80-1000000.0-1000000.0-False-dtype1-cuda-1-4000-16-16]
test_rotary_embedding_benchmark[80-80-1000000.0-1000000.0-False-dtype0-cuda-1-8840-16-16]
test_rotary_embedding_benchmark[80-80-1000000.0-1000000.0-True-dtype2-cuda-8-8840-16-16]
Checklist