[AMD] Support fast_topk kernels in sgl-kernel #15172

Merged
HaiShaw merged 5 commits into sgl-project:main from hubertlu-tw:fast_topk
Dec 20, 2025

@hubertlu-tw
Collaborator

Motivation

This PR adds ROCm support for SGLang’s fast top-k kernels by wiring the existing topk.cu implementation into the ROCm build and registering the operators in the ROCm extension.

Modifications

  • Register ops on ROCm: adds sgl_kernel::fast_topk, sgl_kernel::fast_topk_transform_fused, sgl_kernel::fast_topk_transform_ragged_fused to csrc/common_extension_rocm.cc.

  • Build topk on ROCm: includes csrc/elementwise/topk.cu in setup_rocm.py sources so it is hipified/compiled.

  • ROCm-only compatibility fix: in csrc/elementwise/topk.cu, adds an #ifdef USE_ROCM cast for cudaFuncSetAttribute(...) so the hipified code compiles (the CUDA path remains unchanged).
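
The third item can be illustrated with a minimal, GPU-free sketch. It assumes the cast exists because the HIP entry point takes the kernel as an untyped const void*, while CUDA's C++ headers also accept a typed function pointer; the PR text does not spell out the exact cast, so treat this as a guess at the pattern rather than the code that was merged. Every name below (set_func_attribute, topk_kernel_stub, kAttrMaxDynamicSmem, configure_topk_smem) is a hypothetical stand-in, not part of sgl-kernel, CUDA, or HIP.

```cpp
#include <cassert>

// Records which overload the last call resolved to (illustration only).
enum class Path { kNone, kUntyped, kTyped };
static Path g_last_path = Path::kNone;
static int g_last_value = 0;

// Hypothetical stand-in for a runtime entry point that accepts only an
// untyped kernel handle.
int set_func_attribute(const void* /*kernel*/, int /*attr*/, int value) {
  g_last_path = Path::kUntyped;
  g_last_value = value;
  return 0;  // 0 stands for success
}

#ifndef USE_ROCM
// Hypothetical stand-in for a typed convenience overload that takes the
// kernel's function pointer directly.
template <typename Kernel>
int set_func_attribute(Kernel* /*kernel*/, int /*attr*/, int value) {
  g_last_path = Path::kTyped;
  g_last_value = value;
  return 0;
}
#endif

// Placeholder with a kernel-like signature; not the real top-k kernel.
void topk_kernel_stub(const float* /*scores*/, int* /*indices*/, int /*k*/) {}

constexpr int kAttrMaxDynamicSmem = 0;  // hypothetical attribute id

// Requests `bytes` of dynamic shared memory for the stub kernel.
int configure_topk_smem(int bytes) {
  assert(bytes > 0);
#ifdef USE_ROCM
  // ROCm path: only the untyped overload exists, so cast explicitly.
  return set_func_attribute(reinterpret_cast<const void*>(&topk_kernel_stub),
                            kAttrMaxDynamicSmem, bytes);
#else
  // CUDA path: unchanged, the typed pointer is passed as-is.
  return set_func_attribute(&topk_kernel_stub, kAttrMaxDynamicSmem, bytes);
#endif
}
```

Calling configure_topk_smem(48 * 1024) compiles with or without -DUSE_ROCM; only the argument expression differs between the two builds, which is the property the bullet describes.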

Tests

pytest -q tests/test_topk.py (112 passed)

Benchmarking and Profiling

Checklist

CC: @HaiShaw

#ifdef SGL_TOPK_DYNAMIC_SMEM_BYTES
constexpr size_t kSmem = static_cast<size_t>(SGL_TOPK_DYNAMIC_SMEM_BYTES);
#else
constexpr size_t kSmem = 48 * 1024; // bytes
#endif
Collaborator

Is 48K a tuned number for MI308, MI300, and MI35x?
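
One way to make the question above checkable is to validate the requested size against a per-device limit instead of relying on a single constant. The sketch below reuses the override macro from the snippet under review; the helper fits_shared_memory is hypothetical, and the limits passed to it are placeholders that a real build would obtain from the runtime's device-attribute query, not figures for any MI-series part.

```cpp
#include <cassert>
#include <cstddef>

// Same override pattern as the snippet under review: a build-time macro
// wins, otherwise fall back to 48 KiB.
#ifdef SGL_TOPK_DYNAMIC_SMEM_BYTES
constexpr std::size_t kSmem =
    static_cast<std::size_t>(SGL_TOPK_DYNAMIC_SMEM_BYTES);
#else
constexpr std::size_t kSmem = 48 * 1024;  // bytes
#endif

// Hypothetical helper: true when the requested dynamic shared memory is
// non-zero and does not exceed the limit reported for the device.
constexpr bool fits_shared_memory(std::size_t requested_bytes,
                                  std::size_t device_limit_bytes) {
  return requested_bytes > 0 && requested_bytes <= device_limit_bytes;
}

// Compile-time sanity check on the selected value.
static_assert(kSmem > 0, "dynamic shared memory size must be positive");
```

For example, fits_shared_memory(kSmem, 64 * 1024) holds for the default value, while fits_shared_memory(kSmem, 32 * 1024) does not; both limits are illustrative only.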

Collaborator Author

@HaiShaw
Collaborator

HaiShaw commented Dec 18, 2025

@zhyncs @ispobock @BBuf please have a review.
Changes are mostly ROCm-relevant.

@HaiShaw merged commit 51e2eaa into sgl-project:main on Dec 20, 2025
83 of 85 checks passed
Prozac614 pushed a commit to Prozac614/sglang that referenced this pull request Dec 23, 2025
akao-amd added a commit to akao-amd/sglang that referenced this pull request Dec 23, 2025
This patch aligns the wheel build helper to setup_rocm.py according
to the two recent changes: (1) deterministic allreduce from sgl-project#15340
and (2) fast topk from sgl-project#15172.
jiaming1130 pushed a commit to zhuyijie88/sglang that referenced this pull request Dec 25, 2025
akao-amd added a commit to akao-amd/sglang that referenced this pull request Dec 29, 2025
This patch aligns the wheel build helper to setup_rocm.py according
to the two recent changes: (1) deterministic allreduce from sgl-project#15340
and (2) fast topk from sgl-project#15172.
akao-amd added a commit to akao-amd/sglang that referenced this pull request Jan 5, 2026
This patch aligns the wheel build helper to setup_rocm.py according
to the two recent changes: (1) deterministic allreduce from sgl-project#15340
and (2) fast topk from sgl-project#15172.
akao-amd added a commit to akao-amd/sglang that referenced this pull request Jan 6, 2026
This patch aligns the wheel build helper to setup_rocm.py according
to the two recent changes: (1) deterministic allreduce from sgl-project#15340
and (2) fast topk from sgl-project#15172.
akao-amd added a commit to akao-amd/sglang that referenced this pull request Jan 7, 2026
This patch aligns the wheel build helper to setup_rocm.py according
to the two recent changes: (1) deterministic allreduce from sgl-project#15340
and (2) fast topk from sgl-project#15172.
YChange01 pushed a commit to YChange01/sglang that referenced this pull request Jan 13, 2026
2 participants