[AMD] Add pip install / wheel build support for ROCm sgl-kernel#15627
[AMD] Add pip install / wheel build support for ROCm sgl-kernel#15627HaiShaw merged 4 commits intosgl-project:mainfrom
Conversation
Summary of ChangesHello @akao-amd, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request establishes the foundational build and packaging infrastructure for the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Ignored Files
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces support for building ROCm wheels for sgl-kernel, which is a great addition for AMD GPU users. The changes primarily involve adding new build scripts and a dedicated CMake file for ROCm. My review focuses on improving the robustness and maintainability of these new build components. I've identified a potential runtime issue with an incorrect rpath, along with some opportunities to make the build scripts more robust by removing hardcoded paths and fragile environment detection. I have also pointed out some code that appears to be unused, which could be removed to improve clarity. Overall, the changes are well-structured, and with a few adjustments, this will be a solid contribution.
| ) | ||
|
|
||
| target_link_options(common_ops PRIVATE | ||
| "SHELL:-Wl,-rpath,'\$ORIGIN/../../torch/lib'" |
There was a problem hiding this comment.
The rpath for the PyTorch libraries appears to be incorrect. The compiled library will be installed in .../site-packages/sgl_kernel/, while PyTorch's libraries are typically in .../site-packages/torch/lib/. The relative path from the sgl_kernel directory should be ../torch/lib.
The current path, ../../torch/lib, would resolve to a location outside the site-packages directory, which is likely to cause library loading errors at runtime.
"SHELL:-Wl,-rpath,'\$ORIGIN/../torch/lib'"
| set(PLAT_LIB_DIR "/usr/lib/x86_64-linux-gnu") | ||
| link_directories(${PLAT_LIB_DIR}) |
There was a problem hiding this comment.
Hardcoding the library path /usr/lib/x86_64-linux-gnu and using link_directories is not a portable practice and can lead to picking up incorrect libraries. While it might work in the controlled CI Docker environment, it makes the build script brittle.
Modern CMake discourages the use of link_directories. It is better to use find_library to locate libraries and link them using their full paths in target_link_libraries. Based on the target_link_libraries call for common_ops, this link_directories call might be redundant as system libraries are often found by the linker automatically. Please consider removing it unless it's strictly necessary.
| @@ -0,0 +1,131 @@ | |||
| #!/bin/bash | |||
| set -euo pipefail | |||
| ROCM_VERSION=$1 | |||
| fi | ||
|
|
||
| # Detect ROCm version and add appropriate suffix | ||
| if ls /opt | grep -q "7.0"; then |
There was a problem hiding this comment.
Detecting the ROCm version by checking for a directory name containing 7.0 in /opt is fragile and tightly couples the script to a specific build environment's directory structure. A more robust approach would be to pass the ROCm version as an argument to this script from the parent build_rocm.sh script. The build_rocm.sh script already accepts a ROCM_VERSION argument which could be passed down.
| from pathlib import Path | ||
|
|
||
| import torch | ||
| from torch.utils.cpp_extension import CUDAExtension | ||
|
|
||
| root = Path(__file__).parent.resolve() | ||
|
|
||
| include_dirs = [ | ||
| root / "include", | ||
| root / "include" / "impl", | ||
| root / "csrc", | ||
| ] | ||
|
|
||
| sources = [ | ||
| "csrc/allreduce/custom_all_reduce.hip", | ||
| "csrc/allreduce/quick_all_reduce.cu", | ||
| "csrc/common_extension_rocm.cc", | ||
| "csrc/elementwise/activation.cu", | ||
| "csrc/grammar/apply_token_bitmask_inplace_cuda.cu", | ||
| "csrc/moe/moe_align_kernel.cu", | ||
| "csrc/moe/moe_topk_softmax_kernels.cu", | ||
| "csrc/moe/moe_topk_sigmoid_kernels.cu", | ||
| "csrc/speculative/eagle_utils.cu", | ||
| "csrc/kvcacheio/transfer.cu", | ||
| "csrc/elementwise/pos_enc.cu", | ||
| ] | ||
|
|
||
| libraries = ["hiprtc", "amdhip64", "c10", "torch", "torch_python"] | ||
|
|
||
| ext_modules = [ | ||
| CUDAExtension( | ||
| name="sgl_kernel.common_ops", | ||
| sources=sources, | ||
| include_dirs=include_dirs, | ||
| libraries=libraries, | ||
| py_limited_api=False, | ||
| ), | ||
| ] |
There was a problem hiding this comment.
This Python script is executed by build_rocm.sh but it only defines some variables and does not appear to perform any actions. The build process is driven by scikit-build-core and CMakeLists_rocm.txt, which defines its own list of source files. This makes rocm_hipify.py seem like unused code.
If this file is indeed not used by the build process, it should be removed to avoid confusion. If it serves a purpose, please add comments to clarify its role.
ddde3f0 to
57b4b88
Compare
|
Change log:
|
1f49a31 to
137522c
Compare
137522c to
521a589
Compare
|
Change Log:
|
521a589 to
5e72766
Compare
145bd85 to
86f0248
Compare
|
Change log:
|
86f0248 to
6277917
Compare
These ROCm wheel-build helper files would normally live under `sgl-kernel/`, but changes there can trigger unrelated non-ROCm CI jobs due to existing path filters. To keep the change isolated for now, place them under `3rdparty/amd/sgl-kernel/` instead. Co-authored-by: Alan Kao <akao@amd.com>
Co-authored-by: Alan Kao <akao@amd.com>
This patch aligns the wheel build helper to setup_rocm.py according to the two recent changes: (1) deterministic allreduce from sgl-project#15340 and (2) fast topk from sgl-project#15172.
The newly added `sgl_kernel...+rocm700...` wheel file can bypass `check_wheel_cuda_version`, causing the subsequent regex match to fail. For simplicity, split the update function into two functions.
6277917 to
3f75461
Compare
|
As https://github.com/sgl-project/sglang/actions/runs/20767927351 shows release workflow works successfully, @HaiShaw I suggest merging. |
|
@merrymercy @ispobock please have a look. |
As mentioned in sgl-project#15627, two sgl-kernel practices for ROCm co-exist for now. Until they are merged, manual alignment is required. This patch aligns the ROCm wheel with the new timestep difussion kernel merged in sgl-project#16766.
Motivation
This PR adds ROCm wheel build/release support for sgl-kernel, which is a pre-requisite to build sglang wheel for ROCm. The contents are directly derived from #14684 but avoid triggering unrelated non-ROCm CI jobs (due to existing workflow path filters).
Modifications
Other than the rationales mentioned in #14684, ROCm-only wheel build helper files are placed under 3rdparty/amd/sgl-kernel/. The ROCm wheel workflow copies these helpers into sgl-kernel/ only for the ROCm build job.
Accuracy Tests
N/A.
Benchmarking and Profiling
As in https://github.com/user-attachments/files/24246371/PIP.Install.Test.Results.xlsx, there is no difference between the two installation method: the old
setup_rocm.pyor the wheel introduced in this PR.Checklist
@saienduri @HaiShaw