[AMD] Add pip install / wheel build support for ROCm sgl-kernel#15627

Merged
HaiShaw merged 4 commits into sgl-project:main from akao-amd:rocm-pip-install-dev on Jan 8, 2026

Conversation

@akao-amd
Contributor

Motivation

This PR adds ROCm wheel build/release support for sgl-kernel, which is a prerequisite for building the sglang wheel for ROCm. The contents are derived directly from #14684 but avoid triggering unrelated non-ROCm CI jobs (due to existing workflow path filters).

Modifications

Beyond what is already explained in #14684, the ROCm-only wheel build helper files are placed under 3rdparty/amd/sgl-kernel/. The ROCm wheel workflow copies these helpers into sgl-kernel/ only for the ROCm build job.
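
The copy step described here can be sketched as follows. This is a minimal illustration and not the actual workflow step: the function name `stage_rocm_helpers` is hypothetical, and only the two directory names come from the description above.

```python
import shutil
from pathlib import Path


def stage_rocm_helpers(repo_root: Path) -> list:
    """Overlay the ROCm-only helper files onto sgl-kernel/ (hypothetical helper).

    Returns the sorted names of the top-level entries that were staged.
    """
    src = repo_root / "3rdparty" / "amd" / "sgl-kernel"
    dst = repo_root / "sgl-kernel"
    # dirs_exist_ok=True (Python 3.8+) lets the copy merge into an existing tree.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sorted(p.name for p in src.iterdir())
```

Because the helpers are copied only inside the ROCm job, the checked-in sgl-kernel/ tree stays unchanged, which is what keeps the existing path filters from firing.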

Accuracy Tests

N/A.

Benchmarking and Profiling

As shown in https://github.com/user-attachments/files/24246371/PIP.Install.Test.Results.xlsx, there is no difference between the two installation methods: the old setup_rocm.py and the wheel introduced in this PR.

Checklist

@saienduri @HaiShaw

@gemini-code-assist
Contributor

Summary of Changes

Hello @akao-amd, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request establishes the foundational build and packaging infrastructure for the sgl-kernel library on ROCm platforms. By introducing dedicated CMake, shell, and Python scripts, it enables the creation of distributable Python wheels for AMD GPUs, which is a critical prerequisite for integrating SGLang with ROCm. The changes ensure that the kernel components can be easily installed via pip, streamlining development and deployment for ROCm users, while maintaining performance parity with existing build methods.

Highlights

  • ROCm Wheel Build Support: Introduces the capability to build Python wheels for the sgl-kernel library specifically for ROCm-enabled AMD GPUs.
  • Prerequisite for SGLang ROCm: This change is a necessary step to enable the full SGLang framework to be built and run on ROCm platforms.
  • Dedicated Build Files: New ROCm-specific CMake configuration (CMakeLists_rocm.txt), build script (build_rocm.sh), wheel renaming script (rename_wheels_rocm.sh), and a Python script for defining the ROCm extension (rocm_hipify.py) have been added.
  • Wheel Index Update: The update_kernel_whl_index.py script has been modified to properly generate and manage the wheel index for ROCm builds.
  • Performance Parity: Benchmarking confirms that the new wheel installation method yields identical performance to the previous setup_rocm.py approach.

Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/release-whl-kernel.yml

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces support for building ROCm wheels for sgl-kernel, which is a great addition for AMD GPU users. The changes primarily involve adding new build scripts and a dedicated CMake file for ROCm. My review focuses on improving the robustness and maintainability of these new build components. I've identified a potential runtime issue with an incorrect rpath, along with some opportunities to make the build scripts more robust by removing hardcoded paths and fragile environment detection. I have also pointed out some code that appears to be unused, which could be removed to improve clarity. Overall, the changes are well-structured, and with a few adjustments, this will be a solid contribution.

)

target_link_options(common_ops PRIVATE
"SHELL:-Wl,-rpath,'\$ORIGIN/../../torch/lib'"

high

The rpath for the PyTorch libraries appears to be incorrect. The compiled library will be installed in .../site-packages/sgl_kernel/, while PyTorch's libraries are typically in .../site-packages/torch/lib/. The relative path from the sgl_kernel directory should be ../torch/lib.

The current path, ../../torch/lib, would resolve to a location outside the site-packages directory, which is likely to cause library loading errors at runtime.

  "SHELL:-Wl,-rpath,'\$ORIGIN/../torch/lib'"
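
To see why the number of `..` segments matters, the two candidate rpaths can be expanded by hand. The sketch below assumes, as the comment above does, that the compiled extension sits directly in `site-packages/sgl_kernel/`. The `/venv/...` prefix and the helper name `resolve_rpath` are made up for illustration, and `posixpath.normpath` only approximates the dynamic loader because it ignores symlinks.

```python
import posixpath


def resolve_rpath(rpath: str, origin_dir: str) -> str:
    """Expand $ORIGIN and collapse '..' segments (hypothetical helper)."""
    return posixpath.normpath(rpath.replace("$ORIGIN", origin_dir))


# Hypothetical directory that contains the compiled extension.
origin = "/venv/lib/python3.10/site-packages/sgl_kernel"

print(resolve_rpath("$ORIGIN/../torch/lib", origin))
# /venv/lib/python3.10/site-packages/torch/lib
print(resolve_rpath("$ORIGIN/../../torch/lib", origin))
# /venv/lib/python3.10/torch/lib
```

If the extension were instead installed one directory deeper, the extra `..` would be the correct choice, so the right value depends on the final install layout.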

Comment on lines +82 to +83
set(PLAT_LIB_DIR "/usr/lib/x86_64-linux-gnu")
link_directories(${PLAT_LIB_DIR})

medium

Hardcoding the library path /usr/lib/x86_64-linux-gnu and using link_directories is not a portable practice and can lead to picking up incorrect libraries. While it might work in the controlled CI Docker environment, it makes the build script brittle.

Modern CMake discourages the use of link_directories. It is better to use find_library to locate libraries and link them using their full paths in target_link_libraries. Based on the target_link_libraries call for common_ops, this link_directories call might be redundant as system libraries are often found by the linker automatically. Please consider removing it unless it's strictly necessary.

@@ -0,0 +1,131 @@
#!/bin/bash
set -euo pipefail
ROCM_VERSION=$1

medium

The ROCM_VERSION variable is assigned from the first script argument but is not used anywhere in the script. This constitutes dead code and could be confusing for future maintenance. Please either utilize this variable or remove it.

fi

# Detect ROCm version and add appropriate suffix
if ls /opt | grep -q "7.0"; then

medium

Detecting the ROCm version by checking for a directory name containing 7.0 in /opt is fragile and tightly couples the script to a specific build environment's directory structure. A more robust approach would be to pass the ROCm version as an argument to this script from the parent build_rocm.sh script. The build_rocm.sh script already accepts a ROCM_VERSION argument which could be passed down.
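
A short sketch of why the substring check is fragile, and of what deriving the suffix from an explicit version argument might look like. The directory names and the helper `rocm_suffix` are hypothetical; the `rocm700` form is taken from the wheel name mentioned elsewhere in this PR, but the mapping rule for other versions is an assumption.

```python
import re

# The same unanchored pattern the shell check hands to grep.
# The "." matches any character, not only a literal dot.
PATTERN = re.compile("7.0")

# Hypothetical /opt entries, for illustration only.
ENTRIES = ["rocm-7.0.0", "rocm-6.7.0", "tool-1700"]
for entry in ENTRIES:
    # Every one of these entries matches, including the two unrelated ones.
    print(entry, bool(PATTERN.search(entry)))


def rocm_suffix(version: str) -> str:
    """Map '7.0' or '7.0.0' to 'rocm700' (hypothetical helper, assumed scheme)."""
    parts = version.split(".")
    if not 2 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unexpected ROCm version: {version!r}")
    patch = parts[2] if len(parts) == 3 else "0"
    return f"rocm{parts[0]}{parts[1]}{patch}"
```

Passing the version explicitly also gives the otherwise unused `ROCM_VERSION` argument of build_rocm.sh a purpose.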

Comment on lines +1 to +38
from pathlib import Path

import torch
from torch.utils.cpp_extension import CUDAExtension

root = Path(__file__).parent.resolve()

include_dirs = [
    root / "include",
    root / "include" / "impl",
    root / "csrc",
]

sources = [
    "csrc/allreduce/custom_all_reduce.hip",
    "csrc/allreduce/quick_all_reduce.cu",
    "csrc/common_extension_rocm.cc",
    "csrc/elementwise/activation.cu",
    "csrc/grammar/apply_token_bitmask_inplace_cuda.cu",
    "csrc/moe/moe_align_kernel.cu",
    "csrc/moe/moe_topk_softmax_kernels.cu",
    "csrc/moe/moe_topk_sigmoid_kernels.cu",
    "csrc/speculative/eagle_utils.cu",
    "csrc/kvcacheio/transfer.cu",
    "csrc/elementwise/pos_enc.cu",
]

libraries = ["hiprtc", "amdhip64", "c10", "torch", "torch_python"]

ext_modules = [
    CUDAExtension(
        name="sgl_kernel.common_ops",
        sources=sources,
        include_dirs=include_dirs,
        libraries=libraries,
        py_limited_api=False,
    ),
]

medium

This Python script is executed by build_rocm.sh but it only defines some variables and does not appear to perform any actions. The build process is driven by scikit-build-core and CMakeLists_rocm.txt, which defines its own list of source files. This makes rocm_hipify.py seem like unused code.

If this file is indeed not used by the build process, it should be removed to avoid confusion. If it serves a purpose, please add comments to clarify its role.

@akao-amd akao-amd force-pushed the rocm-pip-install-dev branch from ddde3f0 to 57b4b88 Compare December 23, 2025 12:24
@akao-amd
Contributor Author

Change log:

  1. Add fixes for recently added kernels.
  2. Confirm that the latest wheel build provides identical results.
    PIP Install Test Results.xlsx

@akao-amd akao-amd force-pushed the rocm-pip-install-dev branch 2 times, most recently from 1f49a31 to 137522c Compare January 5, 2026 00:29
@akao-amd akao-amd force-pushed the rocm-pip-install-dev branch from 137522c to 521a589 Compare January 5, 2026 08:08
@akao-amd
Contributor Author

akao-amd commented Jan 5, 2026

Change Log:

@akao-amd akao-amd force-pushed the rocm-pip-install-dev branch from 521a589 to 5e72766 Compare January 6, 2026 05:22
@akao-amd akao-amd force-pushed the rocm-pip-install-dev branch 2 times, most recently from 145bd85 to 86f0248 Compare January 6, 2026 11:40
@akao-amd
Contributor Author

akao-amd commented Jan 6, 2026

Change log:

@akao-amd akao-amd force-pushed the rocm-pip-install-dev branch from 86f0248 to 6277917 Compare January 7, 2026 01:02
RohitNagraj and others added 4 commits January 7, 2026 09:03
These ROCm wheel-build helper files would normally live under
`sgl-kernel/`, but changes there can trigger unrelated non-ROCm CI
jobs due to existing path filters.

To keep the change isolated for now, place them under
`3rdparty/amd/sgl-kernel/` instead.

Co-authored-by: Alan Kao <akao@amd.com>
Co-authored-by: Alan Kao <akao@amd.com>
This patch aligns the wheel build helper to setup_rocm.py according
to the two recent changes: (1) deterministic allreduce from sgl-project#15340
and (2) fast topk from sgl-project#15172.
The newly added `sgl_kernel...+rocm700...` wheel file can bypass
`check_wheel_cuda_version`, causing the subsequent regex match to fail.

For simplicity, split the update function into two functions.
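
The failure mode in the last commit message can be illustrated with a small sketch. Everything here is hypothetical: the two patterns, the helper names, and the wheel file names are invented for illustration and are not taken from update_kernel_whl_index.py.

```python
import re

CUDA_TAG = re.compile(r"\+cu(\d+)")    # hypothetical CUDA local-version tag
ROCM_TAG = re.compile(r"\+rocm(\d+)")  # hypothetical ROCm local-version tag


def cuda_tag(wheel_name):
    """Return the CUDA digits (e.g. '124'), or None if the tag is absent."""
    match = CUDA_TAG.search(wheel_name)
    return match.group(1) if match else None


def rocm_tag(wheel_name):
    """Return the ROCm digits (e.g. '700'), or None if the tag is absent."""
    match = ROCM_TAG.search(wheel_name)
    return match.group(1) if match else None


# Made-up file names, for illustration only.
cuda_wheel = "sgl_kernel-0.0.0+cu124-cp310-abi3-manylinux2014_x86_64.whl"
rocm_wheel = "sgl_kernel-0.0.0+rocm700-cp310-abi3-manylinux2014_x86_64.whl"

print(cuda_tag(cuda_wheel))  # 124
print(cuda_tag(rocm_wheel))  # None
print(rocm_tag(rocm_wheel))  # 700
```

Code that calls `.group()` on the CUDA match without checking for `None` would raise on the ROCm file name, which is one reason separate update paths per platform are simpler than a single shared function.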
@akao-amd akao-amd force-pushed the rocm-pip-install-dev branch from 6277917 to 3f75461 Compare January 7, 2026 01:03
@akao-amd
Contributor Author

akao-amd commented Jan 7, 2026

As https://github.com/sgl-project/sglang/actions/runs/20767927351 shows, the release workflow completes successfully. @HaiShaw, I suggest merging.

@HaiShaw
Collaborator

HaiShaw commented Jan 7, 2026

@merrymercy @ispobock please have a look.

@HaiShaw HaiShaw merged commit ab7d582 into sgl-project:main Jan 8, 2026
59 checks passed
akao-amd added a commit to akao-amd/sglang that referenced this pull request Jan 15, 2026
As mentioned in sgl-project#15627, two sgl-kernel practices for ROCm co-exist
for now.  Until they are merged, manual alignment is required.

This patch aligns the ROCm wheel with the new timestep diffusion
kernel merged in sgl-project#16766.
3 participants