[AMD] Support fast_topk kernels in sgl-kernel by hubertlu-tw · Pull Request #15172 · sgl-project/sglang

hubertlu-tw · 2025-12-15T08:56:51Z

Motivation

This PR adds ROCm support for SGLang’s fast top-k kernels by wiring the existing topk.cu implementation into the ROCm build and registering the operators in the ROCm extension.

Modifications

Register ops on ROCm: adds sgl_kernel::fast_topk, sgl_kernel::fast_topk_transform_fused, and sgl_kernel::fast_topk_transform_ragged_fused to csrc/common_extension_rocm.cc.
Build topk on ROCm: includes csrc/elementwise/topk.cu in the setup_rocm.py sources so that it is hipified and compiled.
ROCm-only compatibility fix: in csrc/elementwise/topk.cu, adds a cast guarded by #ifdef USE_ROCM to the cudaFuncSetAttribute(...) call so the hipified code compiles (the CUDA path remains unchanged).
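The compatibility fix in the last item can be pictured with a host-only mock. This is a sketch under stated assumptions, not the code from the PR: `mock_set_attr_untyped`, `mock_set_attr_typed`, `demo_topk_kernel`, and `request_dynamic_smem` are hypothetical names, and the idea that the cast targets `const void*` is an assumption based on the untyped function parameter that `hipFuncSetAttribute` declares.

```cpp
#include <cstddef>

// Hypothetical stand-in for a HIP-style call that receives the kernel as an
// untyped pointer. It only validates its arguments; returns 0 on success.
inline int mock_set_attr_untyped(const void* func, int value) {
  return (func != nullptr && value > 0) ? 0 : 1;
}

// Hypothetical stand-in for a CUDA-style templated overload that accepts the
// typed kernel pointer directly. Returns 0 on success.
template <typename Kernel>
inline int mock_set_attr_typed(Kernel* entry, int value) {
  (void)entry;  // the mock does not inspect the kernel itself
  return value > 0 ? 0 : 1;
}

// Hypothetical kernel entry point; a plain host function so the sketch
// builds without a GPU toolchain.
inline void demo_topk_kernel(const float* /*scores*/, int* /*indices*/, int /*k*/) {}

// Requests `smem_bytes` of dynamic shared memory for demo_topk_kernel.
inline int request_dynamic_smem(std::size_t smem_bytes) {
#ifdef USE_ROCM
  // A function pointer does not convert implicitly to const void*,
  // so the untyped signature needs an explicit cast.
  return mock_set_attr_untyped(reinterpret_cast<const void*>(&demo_topk_kernel),
                               static_cast<int>(smem_bytes));
#else
  // Non-ROCm path: pass the typed pointer through unchanged.
  return mock_set_attr_typed(&demo_topk_kernel, static_cast<int>(smem_bytes));
#endif
}
```

Both branches return the same result, so the guard only changes how the kernel pointer is passed, which is consistent with the note that the CUDA path is unchanged.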

Tests

pytest -q tests/test_topk.py (112 passed)

Benchmarking and Profiling

Checklist

Format your code according to "Format code with pre-commit".
Add unit tests according to "Run and add unit tests".
Update documentation according to "Write documentations".
Provide accuracy and speed benchmark results according to "Test the accuracy" and "Benchmark the speed".
Follow the SGLang code style guidance.
Work with maintainers to merge your PR. See the PR Merge Process.

CC: @HaiShaw

HaiShaw · 2025-12-17T23:52:23Z

+#ifdef SGL_TOPK_DYNAMIC_SMEM_BYTES
+constexpr size_t kSmem = static_cast<size_t>(SGL_TOPK_DYNAMIC_SMEM_BYTES);
+#else
+constexpr size_t kSmem = 48 * 1024;  // bytes


Is 48K a tuned number for MI308, MI300, and MI35x?

hubertlu-tw · Dec 18, 2025

@HaiShaw
48K is for MI30X and 128K is for MI35X:
https://github.com/sgl-project/sglang/pull/15172/files#diff-aeadd0ae863bbaca9f19de4930782b9b62fdde59b2b1dc1e74d9249ad13f02feR87
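The override pattern in the reviewed hunk can be sketched on the host as follows. This is a minimal illustration under assumptions: the closing `#endif` and the helper `fits_in_lds` are additions for the example, and the 64 KiB budget in the usage note is an illustrative figure rather than a value taken from this thread.

```cpp
#include <cstddef>

// Build-time override: pass -DSGL_TOPK_DYNAMIC_SMEM_BYTES=<bytes> to the
// compiler to replace the default (for example 131072 for a 128K target).
#ifdef SGL_TOPK_DYNAMIC_SMEM_BYTES
constexpr std::size_t kSmem = static_cast<std::size_t>(SGL_TOPK_DYNAMIC_SMEM_BYTES);
#else
constexpr std::size_t kSmem = 48 * 1024;  // bytes; default when no override is given
#endif

// Hypothetical helper: true when a dynamic shared-memory request fits within
// the given per-workgroup LDS budget (both sizes in bytes).
constexpr bool fits_in_lds(std::size_t requested_bytes, std::size_t lds_budget_bytes) {
  return requested_bytes <= lds_budget_bytes;
}

// Catches an accidental zero-sized override at compile time.
static_assert(kSmem > 0, "dynamic shared-memory size must be positive");
```

With no override, `kSmem` is 49152 bytes, and `fits_in_lds(kSmem, 64 * 1024)` holds while `fits_in_lds(128 * 1024, 64 * 1024)` does not. A check of this kind would reject an oversized request before launch, which is the sort of failure the commit "Fix the runtime error for gfx942 due to smaller LDS size" refers to.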

HaiShaw · 2025-12-18T01:59:37Z

@zhyncs @ispobock @BBuf please review.
The changes are mostly ROCm-relevant.

This patch aligns the wheel build helper to setup_rocm.py according to the two recent changes: (1) deterministic allreduce from sgl-project#15340 and (2) fast topk from sgl-project#15172.

hubertlu-tw added 2 commits December 15, 2025 08:50

[ROCm] support fast_topk kernels in sgl-kernel

a6bc55c

Add test_topk.py to sgl-kernel tests in AMD CI

2db2229

hubertlu-tw requested a review from HaiShaw December 15, 2025 08:56

hubertlu-tw requested review from BBuf, FlamingoPg, ispobock, merrymercy, yizhang2077 and zhyncs as code owners December 15, 2025 08:56

hubertlu-tw added the amd label Dec 15, 2025

hubertlu-tw requested review from Fridge003 and Kangyan-Zhou as code owners December 15, 2025 08:56

hubertlu-tw added sgl-kernel run-ci labels Dec 15, 2025

hubertlu-tw and others added 3 commits December 16, 2025 02:19

Fix the runtime error for gfx942 due to smaller LDS size

5d3e1c4

Merge branch 'main' into fast_topk

835ba5c

Merge branch 'main' into fast_topk

2362032

HaiShaw reviewed Dec 17, 2025

HaiShaw approved these changes Dec 18, 2025

HaiShaw merged commit 51e2eaa into sgl-project:main Dec 20, 2025
83 of 85 checks passed

Prozac614 pushed a commit to Prozac614/sglang that referenced this pull request Dec 23, 2025

[AMD] Support fast_topk kernels in sgl-kernel (sgl-project#15172)

db32263

jiaming1130 pushed a commit to zhuyijie88/sglang that referenced this pull request Dec 25, 2025

[AMD] Support fast_topk kernels in sgl-kernel (sgl-project#15172)

f856ff9

YChange01 pushed a commit to YChange01/sglang that referenced this pull request Jan 13, 2026

[AMD] Support fast_topk kernels in sgl-kernel (sgl-project#15172)

8fccf08

HaiShaw merged 5 commits into sgl-project:main from hubertlu-tw:fast_topk