[Feature] Adding pip install Support for sgl-kernel for ROCm #1
RohitNagraj wants to merge 171 commits into hubertlu-tw:main from
#!/bin/bash
set -ex

DOCKER_IMAGE=lmsysorg/sglang:v0.5.3rc0-rocm630-mi30x
This will require regular updates. Any better way to handle it?
True. Ideally there would be a sglang:rocm-latest tag on Docker Hub, but since none is available, I followed Dockerfile.rocm and hardcoded the latest version. Any suggestions for a better approach?
Is this script for building Docker images for CI tests? If so, please check how NVIDIA's CI tests sgl_kernel: https://github.com/sgl-project/sglang/blob/main/.github/workflows/pr-test.yml
Yes. This is for the CI build. NVIDIA's build.sh uses a PyTorch image as the base. For AMD, however, we need dependencies such as AITER installed, so the ROCm/PyTorch image would not work out of the box as a base image. Let me change the base image to PyTorch and install the dependencies within the build script. That way, we avoid hardcoding the image to a specific version.
Did you look into how we run SGLang CI, as described on the Confluence page or in the SGLang scripts, specifically these two:
- https://github.com/sgl-project/sglang/blob/main/.github/workflows/pr-test-amd.yml#L317-L345
- https://github.com/sgl-project/sglang/blob/main/scripts/ci/amd_ci_install_dependency.sh
For the sgl-kernel tests alone, there is no AITER dependency. However, we still need to find a way to handle it for
pip install --upgrade pip
pip install uv
uv pip install "sglang[all_hip]>=0.5.3rc0"
My bad: build_rocm.sh is not for CI but for the release-whl-kernel.yml GitHub workflow. For CI, the install happens in amd_ci_install_dependency.sh, and once the wheel is on PyPI, changing a small part of that script to install it with pip is enough. build_rocm.sh, by contrast, serves release-whl-kernel.yml, which needs a Docker image to compile and build the wheel.
NVIDIA's equivalent is build.sh, which is called from release-whl-kernel.yml.
It turns out we can use rocm/pytorch as the base image for it, so I'll switch the base image to that.
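After build_rocm.sh builds the wheel inside the container, the release flow renames it (the PR adds sgl-kernel/rename_wheels_rocm.sh for that step). A minimal Python sketch of such a rename, assuming that "standard format" means replacing the generic linux_<arch> platform tag of a local build with a manylinux2014 tag; that interpretation, the default tag and the function names are assumptions, not the contents of rename_wheels_rocm.sh. Renaming only changes the filename tags: it does not make the binary manylinux-compliant, which is what a tool such as auditwheel checks.

```python
from __future__ import annotations

from pathlib import Path


def rename_platform_tag(filename: str, manylinux: str = "manylinux2014") -> str:
    """Swap a generic linux_<arch> platform tag for a manylinux tag.

    Wheel names follow {dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    (PEP 427); only the platform component is rewritten.
    Assumption: this is what "standard format" means for the ROCm wheels.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel name: {filename!r}")
    # The platform tag may be a compressed set joined by dots,
    # e.g. "linux_x86_64.manylinux1_x86_64"; rewrite each linux_* member.
    tags = [
        manylinux + tag[len("linux"):] if tag.startswith("linux_") else tag
        for tag in parts[-1].split(".")
    ]
    parts[-1] = ".".join(tags)
    return "-".join(parts) + ".whl"


def rename_wheels(dist_dir: Path, manylinux: str = "manylinux2014") -> list[Path]:
    """Rename every *.whl in dist_dir in place and return the resulting paths."""
    results = []
    # sorted() materializes the listing before any file is renamed.
    for wheel in sorted(dist_dir.glob("*.whl")):
        target = wheel.with_name(rename_platform_tag(wheel.name, manylinux))
        if target != wheel:
            wheel.rename(target)
        results.append(target)
    return results
```

Because only linux_* members are rewritten, a wheel that already carries a manylinux tag passes through unchanged, so re-running the step is harmless.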
For
pip install --upgrade pip
pip install uv
uv pip install "sglang[all_hip]>=0.5.3rc0"
once the sgl-kernel wheel is on PyPI, it should be pretty simple.
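Once the wheels are published, the install side mostly comes down to picking the right package for the container's ROCm version. A minimal sketch, assuming the package names keep the sgl-kernel-rocm<version> pattern of the two TestPyPI test packages (sgl-kernel-rocm630 for ROCm 6.3, sgl-kernel-rocm700 for ROCm 7.0); the helper names and the version-to-suffix rule are hypothetical, inferred only from those two names.

```python
from __future__ import annotations

import shlex

TEST_PYPI = "https://test.pypi.org/simple/"
PYPI = "https://pypi.org/simple/"


def rocm_kernel_package(rocm_version: str) -> str:
    """Map a ROCm version such as "6.3" or "7.0.2" to a package name.

    Assumption: the suffix is <major><minor>0, inferred from the TestPyPI
    packages sgl-kernel-rocm630 (ROCm 6.3) and sgl-kernel-rocm700 (ROCm 7.0).
    """
    parts = rocm_version.split(".")
    if len(parts) < 2 or not (parts[0].isdigit() and parts[1].isdigit()):
        raise ValueError(f"unexpected ROCm version: {rocm_version!r}")
    return f"sgl-kernel-rocm{parts[0]}{parts[1]}0"


def pip_install_cmd(rocm_version: str, use_test_pypi: bool = True) -> list[str]:
    """Build (but do not run) the pip command as an argument list."""
    cmd = ["pip", "install", rocm_kernel_package(rocm_version)]
    if use_test_pypi:
        # Take the kernel from TestPyPI; fall back to PyPI for dependencies,
        # which TestPyPI generally does not mirror.
        cmd += ["--index-url", TEST_PYPI, "--extra-index-url", PYPI]
    return cmd


if __name__ == "__main__":
    # Prints the command for a ROCm 6.3 container.
    print(shlex.join(pip_install_cmd("6.3")))
```

pip treats --index-url and --extra-index-url as equally valid sources, so a same-named package with a higher version on PyPI would win; once the wheels live on PyPI itself, calling with use_test_pypi=False sidesteps that.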
The overall flow, as I currently see it:
- This PR (sets up scripts to build the wheel)
- Set up an official sgl-kernel-rocm repo on PyPI
- Set up a GitHub workflow similar to release-whl-kernel.yml for ROCm.
- Update the CI script amd_ci_install_dependency.sh and sglang/python/pyproject.toml to use these wheels from PyPI with pip install.
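For the workflow step, the PR's release workflow pushes wheels to SGLang's index (https://docs.sglang.ai/whl/) and refreshes it with scripts/ci/update_kernel_whl_index.py. A minimal sketch of what regenerating such an index can involve, assuming a PEP 503-style "simple" project page in which each link carries a #sha256= fragment so pip can verify downloads; the function names and the base_url/<filename> layout are hypothetical and not taken from update_kernel_whl_index.py.

```python
from __future__ import annotations

import hashlib
import html
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect_wheels(dist_dir: Path) -> dict[str, str]:
    """Map each *.whl filename in dist_dir to its SHA-256 digest."""
    return {p.name: sha256_of(p) for p in sorted(dist_dir.glob("*.whl"))}


def render_project_page(package: str, wheels: dict[str, str], base_url: str) -> str:
    """Render a PEP 503 project page with one anchor per wheel.

    Hypothetical layout: each wheel is served at base_url/<filename>.
    The #sha256= fragment lets pip verify the downloaded file.
    """
    lines = ["<!DOCTYPE html>", "<html>", "<body>",
             f"<h1>Links for {html.escape(package)}</h1>"]
    for name in sorted(wheels):
        href = f"{base_url.rstrip('/')}/{name}#sha256={wheels[name]}"
        lines.append(f'<a href="{html.escape(href)}">{html.escape(name)}</a><br/>')
    lines += ["</body>", "</html>", ""]
    return "\n".join(lines)
```

Sorting the filenames keeps the generated page stable between runs, so regenerating it after a release only changes the lines for newly added wheels.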
2553eaf to 3ec449a
@RohitNagraj please mention sgl-kernel pip install in your PR's title.
68de54b to 6e94500
Tagged the torch versions used to build the kernel.
5da632b to 622c2bd
Remove python/pyproject_rocm.toml and adjust docs/platforms/amd_gpu.md. These files were accidentally included from draft sgl-project#14802 and cause unnecessary cross-platform CI runs.
8fc198a to 3607d66
…#15340) Co-authored-by: Thomas Wang <1am9trash@gmail.com>
…loading on MI325 (sgl-project#13760) Co-authored-by: Sabre Shao <sabre.shao@amd.com> Co-authored-by: Yusheng (Ethan) Su <yushengsu.thu@gmail.com> Co-authored-by: Hubert Lu <Hubert.Lu@amd.com> Co-authored-by: xsun <sunxiao04@gmail.com>
Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>
Co-authored-by: bingxche <Bingxu.Chen@amd.com>
Co-authored-by: Sai Enduri <saimanas.enduri@amd.com>
Motivation
To enable pip install sgl-kernel support for ROCm. Test packages are uploaded on TestPyPI: sgl-kernel-rocm630 and sgl-kernel-rocm700.
Modifications
- sgl-kernel/CMakeLists_rocm.txt, used by CMake to build the ROCm wheel, similar to NVIDIA's CMakeLists.txt.
- sgl-kernel/build_rocm.sh, similar to NVIDIA's sgl-kernel/build.sh, which builds the ROCm wheel inside a Docker image (used by GitHub workflows).
- sgl-kernel/rename_wheels_rocm.sh, similar to NVIDIA's existing sgl-kernel/rename_wheels.sh, to rename wheels to the standard format.
- sgl-kernel/rocm_hipify.py, which hipifies the sources using PyTorch's built-in hipify module; the CMake build requires this step. hipify-clang was not used inside CMakeLists_rocm.txt because it requires CUDA to be available.
- sgl-kernel-rocm<version> hosted on TestPyPI for CI.
- .github/workflows/release-whl-kernel.yml to build and push ROCm 6.3 and ROCm 7.0 wheels to SGLang's index (https://docs.sglang.ai/whl/).
- scripts/ci/update_kernel_whl_index.py to update the sgl-kernel wheel index.
Testing
Procedure
- scripts/ci/amd_ci_start_container.sh (modifying the image name when required)
- scripts/ci/amd_ci_install_dependency.sh (using sgl-kernel from TestPyPI, as the changes in this PR show)
- sgl-kernel unit tests from .github/workflows/pr-test-amd.yml, using the command docker exec -w /sglang-checkout/sgl-kernel/tests ci_sglang python3 -m pytest test_moe_align.py test_moe_topk_softmax.py test_apply_token_bitmask_inplace.py test_activation.py test_kvcacheio.py speculative/test_eagle_utils.py
- test/srt/test_mla.py, using the command docker exec -w /sglang-checkout/test/srt ci_sglang bash -c "SGLANG_AMD_CI=1 SGLANG_IS_IN_CI=1 SGLANG_USE_AITER=1 python3 run_suite.py --suite per-commit-amd --range-begin 46 --range-end 47". The range picks only test_mla.py to run.
Checklist
sgl-kernel test_mla.py.
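The two test steps of the Procedure can be scripted so both run against the same container. A minimal sketch that assembles the docker exec invocations from those commands as argument lists (container name ci_sglang, working directories, test files and environment variables are copied from the commands) and only executes them when dry_run is False; the helper names are hypothetical.

```python
from __future__ import annotations

import shlex
import subprocess

CONTAINER = "ci_sglang"
KERNEL_TESTS = [
    "test_moe_align.py",
    "test_moe_topk_softmax.py",
    "test_apply_token_bitmask_inplace.py",
    "test_activation.py",
    "test_kvcacheio.py",
    "speculative/test_eagle_utils.py",
]


def kernel_test_cmd() -> list[str]:
    """sgl-kernel unit tests, as listed in .github/workflows/pr-test-amd.yml."""
    return ["docker", "exec", "-w", "/sglang-checkout/sgl-kernel/tests", CONTAINER,
            "python3", "-m", "pytest", *KERNEL_TESTS]


def mla_test_cmd(begin: int = 46, end: int = 47) -> list[str]:
    """per-commit-amd suite restricted to a range.

    The PR notes that 46..47 selects only test_mla.py; the indices depend on
    the suite's ordering, so they may drift as tests are added.
    """
    env = "SGLANG_AMD_CI=1 SGLANG_IS_IN_CI=1 SGLANG_USE_AITER=1"
    inner = (f"{env} python3 run_suite.py --suite per-commit-amd "
             f"--range-begin {begin} --range-end {end}")
    return ["docker", "exec", "-w", "/sglang-checkout/test/srt", CONTAINER,
            "bash", "-c", inner]


def run_all(dry_run: bool = True) -> list[str]:
    """Print each command (shell-quoted) and, unless dry_run, run it."""
    shown = []
    for cmd in (kernel_test_cmd(), mla_test_cmd()):
        shown.append(shlex.join(cmd))
        print(shown[-1])
        if not dry_run:
            # check=True raises CalledProcessError, stopping at the first failure.
            subprocess.run(cmd, check=True)
    return shown


if __name__ == "__main__":
    run_all(dry_run=True)
```

Building argument lists instead of shell strings keeps the quoting of the bash -c payload explicit, and the dry run prints exactly what would be executed.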