
[sgl-kernel] add rotary embed kernel for trivial head_sizes #6530

Closed
mickqian wants to merge 11 commits into sgl-project:main from mickqian:rotary_emb_kernel


@mickqian
Collaborator

@mickqian mickqian commented May 22, 2025

Motivation

Previously, for attention with a head_size not in [64, 128, 256, 512] (which is common in multimodal attention), sgl fell back to the rotary_embedding kernel from vllm.

This PR copies and adapts that kernel, with minor improvements.

Modifications

  1. Add a rotary-embedding kernel for common head_sizes
  2. Add corresponding tests
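
For readers unfamiliar with the operation, here is a minimal pure-Python reference of rotary embedding applied to one head vector. It is only a sketch of the math, not the CUDA kernel added in this PR. The function name `rotary_embed`, its parameters, and the choice of head_size 80 with base 1e6 are illustrative assumptions (loosely based on the benchmark IDs below).

```python
import math

def rotary_embed(x, position, rotary_dim, base=1_000_000.0, is_neox=True):
    """Rotate the first `rotary_dim` entries of one head vector `x`.

    Hypothetical reference helper, not the API of sgl-kernel or vllm.
    """
    assert rotary_dim % 2 == 0 and rotary_dim <= len(x)
    half = rotary_dim // 2
    out = list(x)  # entries beyond rotary_dim pass through unchanged
    for i in range(half):
        # Rotation angle for frequency index i at this token position.
        theta = position * base ** (-2.0 * i / rotary_dim)
        c, s = math.cos(theta), math.sin(theta)
        # NeoX style pairs (i, i + half); interleaved style pairs (2i, 2i + 1).
        a, b = (i, i + half) if is_neox else (2 * i, 2 * i + 1)
        out[a] = x[a] * c - x[b] * s
        out[b] = x[b] * c + x[a] * s
    return out

# A head size outside [64, 128, 256, 512], chosen for illustration.
head_size = 80
q = [float(i % 7) - 3.0 for i in range(head_size)]
q_rot = rotary_embed(q, position=5, rotary_dim=head_size, is_neox=False)
```

Each pair of entries is rotated by a plain 2D rotation, so the vector norm is preserved and position 0 leaves the input unchanged. Nothing in the math depends on head_size being a power of two.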

Benchmark

test_rotary_embedding_benchmark[80-80-1000000.0-1000000.0-False-dtype1-cuda-1-4000-16-16]

| Implementation | Min | Max | Mean | StdDev | Median | IQR | Outliers | OPS | Rounds | Iterations |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| native | 274.0989 (1.00) | 561.6210 (1.00) | 278.5268 (1.00) | 2.5004 (1.00) | 278.5940 (1.00) | 1.7020 (1.00) | 892;200 | 3,590.3191 (1.00) | 20000 (1.00) | 1 (1.00) |
| original (vllm) | 117.0728 (0.43) | 195.9121 (0.35) | 119.9629 (0.43) | 1.1579 (0.46) | 119.9138 (0.43) | 0.8049 (0.47) | 1595;562 | 8,335.9090 (2.32) | 20000 (1.00) | 1 (1.00) |
| sgl | 29.7832 (0.11) | 731.3141 (1.30) | 32.2445 (0.12) | 5.2406 (2.10) | 33.1532 (0.12) | 3.0030 (1.76) | 25;21 | 31.0130 (0.01) | 20000 (1.00) | 1 (1.00) |

test_rotary_embedding_benchmark[80-80-1000000.0-1000000.0-False-dtype0-cuda-1-8840-16-16]

| Implementation | Min | Max | Mean | StdDev | Median | IQR | Outliers | OPS | Rounds | Iterations |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| native | 556.5220 (1.00) | 1,655.5311 (1.00) | 560.8802 (1.00) | 12.6544 (1.00) | 560.6054 (1.00) | 1.5302 (1.00) | 19;177 | 1,782.9120 (1.00) | 20000 (1.00) | 1 (1.00) |
| original (vllm) | 237.4239 (0.43) | 1,746.1870 (1.05) | 240.1866 (0.43) | 16.5433 (1.31) | 239.7769 (0.43) | 1.1339 (0.74) | 14;272 | 4,163.4303 (2.33) | 20000 (1.00) | 1 (1.00) |
| sgl | 92.4878 (0.17) | 874.3163 (0.53) | 94.0537 (0.17) | 5.6077 (0.44) | 93.9691 (0.17) | 0.5364 (0.35) | 19;445 | 10.6322 (0.01) | 20000 (1.00) | 1 (1.00) |

test_rotary_embedding_benchmark[80-80-1000000.0-1000000.0-True-dtype2-cuda-8-8840-16-16]

| Implementation | Min | Max | Mean | StdDev | Median | IQR | Outliers | OPS | Rounds | Iterations |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| native | 3,992.0881 (1.00) | 4,380.6531 (1.00) | 4,000.0806 (1.00) | 4.0606 (1.00) | 3,999.8861 (1.00) | 2.7446 (1.00) | 1060;247 | 249.9950 (1.00) | 20000 (1.00) | 1 (1.00) |
| original (vllm) | 1,640.7282 (0.41) | 3,510.4910 (0.80) | 1,645.7016 (0.41) | 16.5359 (4.07) | 1,645.3189 (0.41) | 1.7250 (0.63) | 31;195 | 607.6436 (2.43) | 20000 (1.00) | 1 (1.00) |
| sgl | 642.2000 (0.16) | 1,061.5052 (0.24) | 645.0266 (0.16) | 5.0471 (1.24) | 644.9120 (0.16) | 0.9649 (0.35) | 33;254 | 1.5503 (0.01) | 20000 (1.00) | 1 (1.00) |
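
To read the tables as speedups, the Mean column can be turned into ratios directly. The sketch below uses the three Mean values from the first benchmark above. It assumes the times are in microseconds, which is not stated in the output but matches the native row (its OPS value equals one million divided by its Mean). The OPS entries in the sgl rows do not follow that relation, so the throughput here is recomputed from Mean rather than read from the table. The helper names are illustrative.

```python
# Mean latencies copied from the first benchmark table.
# Assumption: values are in microseconds (unit is not shown in the output).
mean_us = {
    "native": 278.5268,
    "original (vllm)": 119.9629,
    "sgl": 32.2445,
}

def speedup(baseline, candidate):
    """How many times faster `candidate` is than `baseline`, by mean latency."""
    return mean_us[baseline] / mean_us[candidate]

def ops_per_second(name):
    """Throughput implied by the mean latency, under the microsecond assumption."""
    return 1e6 / mean_us[name]

vs_native = speedup("native", "sgl")
vs_vllm = speedup("original (vllm)", "sgl")
print(f"sgl vs native: {vs_native:.2f}x")
print(f"sgl vs vllm:   {vs_vllm:.2f}x")
print(f"sgl ops/s:     {ops_per_second('sgl'):.1f}")
```

The same calculation applies to the other two benchmarks by swapping in their Mean values.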

Checklist

@mickqian mickqian changed the title add rotary embed kernel for common head_sizes [sgl-kernel] add rotary embed kernel for common head_sizes May 22, 2025
@FlamingoPg
Collaborator

Others LGTM

@mickqian mickqian changed the title [sgl-kernel] add rotary embed kernel for common head_sizes [sgl-kernel] add rotary embed kernel for trivial head_sizes May 22, 2025
@Swipe4057
Contributor

I wanted to test your kernel but it seems there are conflicts with the current version main. Are there any plans to update or do I need to roll back?

@JustinTong0323
Collaborator

> I wanted to test your kernel but it seems there are conflicts with the current version main. Are there any plans to update or do I need to roll back?

Updated.

RubiaCx added a commit to RubiaCx/sglang that referenced this pull request Nov 12, 2025
@mickqian mickqian closed this Nov 13, 2025
@yuan-luo
Collaborator

@mickqian May I know why this PR was closed?

@mickqian
Collaborator Author

> @mickqian May I know why this PR was closed?

Because I don't have enough time to refine this kernel

5 participants