[Feature] Adding pip install Support for sgl-kernel for ROCm #14684

Closed
RohitNagraj wants to merge 19 commits into sgl-project:main from RohitNagraj:rocm-pip-install-dev

Conversation

@RohitNagraj RohitNagraj commented Dec 9, 2025

Motivation

This PR aims to build and host the sgl-kernel wheel for ROCm, which is a prerequisite for building the sglang wheel for ROCm.

Modifications

  1. Added sgl-kernel/CMakeLists_rocm.txt, which CMake uses to build the ROCm wheel, similar to NVIDIA's CMakeLists.txt. We use CMake rather than setup_rocm.py because a wheel built with CMake is Python version agnostic, whereas a wheel built with setup_rocm.py only works with the Python version used to build it.
  2. Added sgl-kernel/build_rocm.sh, similar to NVIDIA's sgl-kernel/build.sh, which builds the ROCm wheel inside a Docker container (used by GitHub workflows).
  3. Added sgl-kernel/rename_wheels_rocm.sh, similar to the existing NVIDIA script sgl-kernel/rename_wheels.sh, to rename wheels to the standard format. This script detects the ROCm version used to build the wheel and renames the wheel using the following template: sgl_kernel-<sglang_version>+rocm<rocm_version>-cp310-abi3-manylinux2014_x86_64.whl.
  4. Added sgl-kernel/rocm_hipify.py, which hipifies the sources using PyTorch's built-in hipify module. The CMake build requires this step because it expects HIP files to compile.
  5. Updated .github/workflows/release-whl-kernel.yml to build and push ROCm 7.0 wheels to SGLang's kernel index.
  6. Added ROCm support to scripts/ci/update_kernel_whl_index.py to update sgl-kernel wheel index.
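
The naming template in item 3 can be sketched as follows. This is only an illustration, not the contents of rename_wheels_rocm.sh: the helper name `rocm_wheel_name`, the example version `0.3.20`, the input platform tag `linux_x86_64`, and the compact ROCm string `700` are all assumptions made for the example.

```python
def rocm_wheel_name(original: str, rocm_version: str) -> str:
    """Rewrite a wheel filename to the template described in this PR.

    Hypothetical helper: assumes a five-part wheel filename
    (name-version-pytag-abitag-platform.whl) with no build tag.
    """
    if not original.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {original}")
    parts = original[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel filename layout: {original}")
    name, version, py_tag, abi_tag, _platform = parts
    # Drop any existing local version label before appending the ROCm one.
    version = version.split("+", 1)[0]
    return (
        f"{name}-{version}+rocm{rocm_version}"
        f"-{py_tag}-{abi_tag}-manylinux2014_x86_64.whl"
    )


# Example with made-up version numbers.
print(rocm_wheel_name("sgl_kernel-0.3.20-cp310-abi3-linux_x86_64.whl", "700"))
```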

Dependencies

This PR builds and releases sgl-kernel for ROCm 7.0, ensuring the package is available in the Releases section of github.com/sgl-project/whl.

However, for the wheel to fully show up on the SGLang kernel wheel index, this small PR that adds the index file needs to be merged. Until then, functionality should not be disrupted, since we only consume the package URL from the Releases section of github.com/sgl-project/whl.

Usage Instructions

This PR adds support for building and releasing sgl-kernel to the SGLang Kernel Index.

Users are not expected to install sgl-kernel themselves, since it would be installed as a dependency by the SGLang wheel built with #14802. Users are only expected to pip install SGLang using the wheel built with #14802 and that should automatically install sgl-kernel as part of it.

However, if the user chooses to, they can install sgl-kernel on ROCm by choosing the specific ROCm version from SGLang kernel index and running the install command:

pip install sgl-kernel --index-url https://docs.sglang.io/whl/rocm700

Note: The sgl-kernel package is built with Torch 2.10.0 Nightly for ROCm 7.0, so users who install sgl-kernel from the pre-built wheel must install the same torch version. Refer to #14802 for the specific torch version.
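
A quick way to check the note above before installing the pre-built wheel is to compare the local torch build against the versions the wheel was built with. The sketch below is an assumption-laden illustration: the function name `matches_wheel_build` and the example version strings are made up, and the exact nightly version string should be taken from #14802.

```python
from typing import Optional


def matches_wheel_build(
    torch_version: str,
    hip_version: Optional[str],
    torch_prefix: str = "2.10.0",
    rocm_major_minor: str = "7.0",
) -> bool:
    """Return True if the local torch looks like the build the wheel targets.

    hip_version is expected to come from torch.version.hip, which is None
    on non-ROCm builds of torch.
    """
    if hip_version is None:
        return False
    # Compare only the major.minor part of the HIP/ROCm version.
    hip_major_minor = ".".join(hip_version.split(".")[:2])
    return torch_version.startswith(torch_prefix) and hip_major_minor == rocm_major_minor


# Hypothetical version strings, for illustration only.
print(matches_wheel_build("2.10.0.dev20251208+rocm7.0", "7.0.51831-abcdef"))
print(matches_wheel_build("2.9.1+cu128", None))
```

With torch installed, the call would be `matches_wheel_build(torch.__version__, torch.version.hip)`.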

Maintenance

  • New kernel: When a new kernel is added, similar to how we currently update sources in sgl-kernel/setup_rocm.py, the sources will need to be updated in sgl-kernel/rocm_hipify.py and sgl-kernel/CMakeLists_rocm.txt.
  • New Torch Version: If we choose to update the Torch version used, the following changes are required:
    1. Update build_rocm.sh: This file determines the torch version used to build sgl-kernel.
    2. Update pyproject_rocm.toml from [Feature] Add pyproject_rocm.toml for end-to-end ROCm pip installation support #14802: The torch version specified in pyproject_rocm.toml is the version installed when a user installs SGLang from the wheel. The versions of torchvision and pytorch-triton-rocm also need to be updated. To determine them, manually install the desired torch version, which also installs compatible versions of torchvision and pytorch-triton-rocm, and make a note of those versions. Then replace the versions of torch, torchvision, and pytorch-triton-rocm with the new ones.
  • New ROCm version: If we want to update the Torch version and the new Torch version is built for a new ROCm version, a new wheel must be built by modifying .github/workflows/release-whl-kernel.yml, sgl-kernel/rename_wheels_rocm.sh, and sgl-kernel/build_rocm.sh.
  • New Architecture: To add support for a new architecture in the future, the AMDGPU_TARGET variable needs to be updated in sgl-kernel/build_rocm.sh, and any compiler flags specific to the architecture can be set in sgl-kernel/CMakeLists_rocm.txt.
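
For the "New Torch Version" step, the versions that ended up installed alongside torch can be read back with the standard library instead of being copied by hand. A minimal sketch, assuming the three distribution names mentioned above; the function name `installed_versions` is illustrative:

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the maintenance notes above.
PACKAGES = ("torch", "torchvision", "pytorch-triton-rocm")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

Run in the environment where the desired torch build was installed, this prints the pins to carry over into pyproject_rocm.toml.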

Testing

Process

To test sgl-kernel, we build the sgl-kernel wheel with the desired ROCm and Torch versions, install sglang with the same ROCm and Torch versions, install sgl-kernel from the built wheel, and then run the following suite of tests (taken directly from the pr-test-amd.yml GitHub workflow):

python3 -m pytest test_moe_align.py test_moe_topk_softmax.py speculative/test_eagle_utils.py test_apply_token_bitmask_inplace.py test_activation.py test_kvcacheio.py

Environments

The above tests were run on the following matrix of environments:
ROCm Versions: [7.0]
Python Versions: [3.10, 3.11, 3.12]
Hardware: [MI300x, MI350x]
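
For reference, the matrix above expands to six environments. A small sketch of the expansion (the variable and function names are illustrative and do not come from the CI configuration):

```python
from itertools import product

# Values copied from the environment matrix above.
ROCM_VERSIONS = ["7.0"]
PYTHON_VERSIONS = ["3.10", "3.11", "3.12"]
HARDWARE = ["MI300x", "MI350x"]


def environment_matrix():
    """Return every (ROCm, Python, hardware) combination as a dict."""
    return [
        {"rocm": rocm, "python": py, "hardware": hw}
        for rocm, py, hw in product(ROCM_VERSIONS, PYTHON_VERSIONS, HARDWARE)
    ]


for env in environment_matrix():
    print(env)
```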

Results

✅ All unit tests pass.

Checklist

  • Format your code according to the Format code with pre-commit.
  • Add code support to build ROCm wheels for sgl-kernel
  • Test pip install functionality for ROCm 6.3 and ROCm 7.0 on MI300x and MI350x.
  • Add wheel release as part of the Github Workflow.
  • Test functionality on Python 3.10, 3.11, 3.12 with install from TestPyPI.
  • Update documentation according to Write documentations.

@github-actions github-actions Bot added documentation Improvements or additions to documentation amd sgl-kernel labels Dec 9, 2025
@HaiShaw
Collaborator

HaiShaw commented Dec 9, 2025

@RohitNagraj any readme on howto instructions?

@HaiShaw
Collaborator

HaiShaw commented Dec 9, 2025

/tag-and-rerun-ci 12/10

@github-actions github-actions Bot added the run-ci label Dec 9, 2025
Comment thread sgl-kernel/build_rocm.sh Outdated
@RohitNagraj
Author

> @RohitNagraj any readme on howto instructions?

This PR builds only the sgl-kernel subpackage. Users are not expected to install sgl-kernel as it is a requirement of sglang itself.

On the other hand, for building sgl-kernel, the current method we have, using setup.py is easier for users.

@yichiche
Collaborator

yichiche commented Dec 9, 2025

> > @RohitNagraj any readme on howto instructions?
>
> This PR builds only the sgl-kernel subpackage. Users are not expected to install sgl-kernel as it is a requirement of sglang itself.
>
> On the other hand, for building sgl-kernel, the current method we have, using setup.py is easier for users.

@RohitNagraj In what cases would we build the sgl-kernel subpackage on its own instead of using setup.py? If we modify any CUDA code or compiled scripts inside sgl-kernel, would running this Makefile be faster than performing a usual setup.py install?

And after upgrading the image in the future, under what circumstances would we need to update this Makefile?

@RohitNagraj
Author

> > > @RohitNagraj any readme on howto instructions?
> >
> > This PR builds only the sgl-kernel subpackage. Users are not expected to install sgl-kernel as it is a requirement of sglang itself.
> > On the other hand, for building sgl-kernel, the current method we have, using setup.py is easier for users.
>
> @RohitNagraj In what cases would we build the sgl-kernel subpackage on its own instead of using setup.py? If we modify any CUDA code or compiled scripts inside sgl-kernel, would running this Makefile be faster than performing a usual setup.py install?

This is a great question. Building sgl-kernel subpackage and hosting it here would enable us to install sgl-kernel as a dependency inside pyproject.

setup.py can also be used to build the wheel and host the same. However, this would require building one wheel for every python version (which is a valid approach used by many packages). On the other hand, using a CMake based build creates a python version agnostic wheel (which is also what NVIDIA uses for SGLang).
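
To make the "Python version agnostic" point concrete: the wheel name template in this PR carries the cp310-abi3 tag, and a wheel with that tag targets CPython's stable ABI, so it is expected to install on CPython 3.10 and newer 3.x releases. A minimal sketch of that rule follows. The function name is made up for illustration, and pip's real tag matching is more involved than this.

```python
import re
from typing import Tuple


def abi3_wheel_supports(python_tag: str, interpreter: Tuple[int, int]) -> bool:
    """Check whether an abi3 wheel with the given python tag fits an interpreter.

    Simplified rule: a cpXY-abi3 wheel works on CPython X.Y and later
    releases with the same major version.
    """
    match = re.fullmatch(r"cp(\d)(\d+)", python_tag)
    if match is None:
        raise ValueError(f"unsupported python tag: {python_tag}")
    minimum = (int(match.group(1)), int(match.group(2)))
    return interpreter[0] == minimum[0] and interpreter >= minimum


# The Python versions tested in this PR, plus one older release.
for minor in (9, 10, 11, 12):
    print(f"3.{minor}:", abi3_wheel_supports("cp310", (3, minor)))
```

By contrast, a wheel built per interpreter (for example one tagged cp311-cp311) would need a separate build for each Python version.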

> And after upgrading the image in the future, under what circumstances would we need to update this Makefile?

  1. The sources in the CMakeLists_rocm.txt and in rocm_hipify.py will have to be updated the same way we currently do it in setup_rocm.py.
  2. CMakeLists_rocm.txt will need to be updated if we add a new architecture support or add new compilation flags.

@akao-amd
Contributor

Hi @RohitNagraj

  1. (Together with [Feature] Add pyproject_rocm.toml for end-to-end ROCm pip installation support #14802) How does this differ from manually running pip uninstall sglang sgl-kernel && cd sgl-kernel && python setup_rocm.py install && cd .. && pip install -e "python[all_hip]" --no-deps? That is how most internal developers work, so I wonder about the impact and benefit of these changes.
  2. Do you expect to host the wheel packages on https://download.pytorch.org/whl/nightly/rocm7.0?

@hubertlu-tw
Collaborator

You probably also need to modify the following scripts used in https://github.com/sgl-project/sglang/blob/main/.github/workflows/pr-test-amd.yml

  • scripts/ci/amd_ci_start_container.sh
  • scripts/ci/amd_ci_install_dependency.sh

Otherwise, our upstream CI will not be able to test your changes in the PR.

CC: @saienduri

@RohitNagraj
Author

> Hi @RohitNagraj
>
>   1. (Together with [Feature] Add pyproject_rocm.toml for end-to-end ROCm pip installation support #14802) How does this differ from manually running pip uninstall sglang sgl-kernel && cd sgl-kernel && python setup_rocm.py install && cd .. && pip install -e "python[all_hip]" --no-deps? That is how most internal developers work, so I wonder about the impact and benefit of these changes.
>   2. Do you expect to host the wheel packages on https://download.pytorch.org/whl/nightly/rocm7.0?

  1. The setup you mentioned builds sgl-kernel from source and then installs it, and for sglang it does not install Torch. With this PR and [Feature] Add pyproject_rocm.toml for end-to-end ROCm pip installation support #14802, we'd instead have a pre-built wheel that installs all the required dependencies.

  2. https://download.pytorch.org/whl/nightly/rocm7.0 is PyTorch's index. We don't have permission to host on PyTorch's index. sgl-kernel will be hosted on the SGLang Kernel Index. We have yet to decide where to host SGLang itself.

@akao-amd akao-amd force-pushed the rocm-pip-install-dev branch 2 times, most recently from 8106ed7 to c50f1c2 Compare December 18, 2025 22:19
@github-actions github-actions Bot added the dependencies Pull requests that update a dependency file label Dec 18, 2025
@akao-amd
Contributor

akao-amd commented Dec 18, 2025

I attached aggregated test summaries from MI300 and MI355 runs. PIP Install Test Results.xlsx

Result: this change does not introduce new errors or regressions. The existing sgl-kernel test failures appear unrelated to this PR, but the CI errors should be fixed first.

Follow-ups (separate work items):

@akao-amd akao-amd force-pushed the rocm-pip-install-dev branch from 8fc198a to 3607d66 Compare December 19, 2025 07:14
@akao-amd
Contributor

@RohitNagraj Would you help to drop this PR? #15627 is meant to replace this one.

@akao-amd
Contributor

akao-amd commented Jan 9, 2026

As #15627 has been merged, I suggest closing this PR.

@HaiShaw HaiShaw closed this Jan 9, 2026