[Feature] Add pyproject_rocm.toml for end-to-end ROCm pip installation support #14802
RohitNagraj wants to merge 2 commits into sgl-project:main
git checkout v0.1.7.post5 # Or v0.1.4 for ROCm 6.x
git submodule update --init --recursive
GPU_ARCH_LIST="gfx950" # Or "gfx942" for MI300x/MI325x
PREBUILD_KERNELS=1 GPU_ARCHS=$GPU_ARCH_LIST python setup.py develop
aiter still has a dispatch problem with MI355X PREBUILD, so the current rocm.dockerfile doesn't prebuild kernels. Should we use PREBUILD_KERNELS=1 here?
That's right, thanks for pointing it out. I've updated the doc accordingly.
"pytorch-triton-rocm @ https://download.pytorch.org/whl/pytorch_triton_rocm-3.5.0-cp312-cp312-linux_x86_64.whl ; python_version == '3.12'",
"pytorch-triton-rocm @ https://download.pytorch.org/whl/pytorch_triton_rocm-3.5.0-cp313-cp313-linux_x86_64.whl ; python_version == '3.13'",
"pytorch-triton-rocm @ https://download.pytorch.org/whl/pytorch_triton_rocm-3.5.0-cp314-cp314-linux_x86_64.whl ; python_version == '3.14'",
"sgl-kernel @ <URL TODO>",
pyproject_rocm.toml still uses <URL TODO>, so the ROCm pip install flow cannot run end-to-end. How was this tested?
As someone working on ROCm-related features, I have a question:
Do you have any suggestions or guidance on when I should pay attention to the relationship between this file and the main pyproject.toml, or possibly other related files that I may not be aware of?
pyproject_rocm.toml duplicates many sections from the main python/pyproject.toml (dependencies, optional deps, package-data, wheel excludes, etc.).
How do we plan to avoid drift between the two files? Should contributors manually update both whenever dependencies change, or is there another mechanism intended?
This is a great question. python/pyproject.toml defines all the dependencies of SGLang for NVIDIA. Other platforms are encouraged to maintain their own pyproject file or use the common python/pyproject_other.toml (see Ref).
As a result, the duplication is unavoidable. Either the developer updating one pyproject file also updates the others, or the maintainers have to sync them manually from time to time.
Remove python/pyproject_rocm.toml and adjust docs/platforms/amd_gpu.md. These files were accidentally included from draft sgl-project#14802 and cause unnecessary cross-platform CI runs.
This patch is a starting point for a series of refinements to dependency management for sglang/sgl-kernel. It introduces an ROCm-specific pyproject file and marks the new workflow as experimental.
21afdd9 to 6db7371
Motivation
To enable end-to-end pip install sglang support for ROCm, this PR adds the necessary ROCm-specific pyproject file, pyproject_rocm.toml.
Changes
- pyproject_rocm.toml, which contains all the dependencies required by SGLang for AMD hardware (except AITER, which must be installed from source).
Pending Changes
- sgl-kernel package URL from the sgl-kernel wheel release once [Feature] Adding pip install Support for sgl-kernel for ROCm #14684 is merged: the sgl-kernel package for rocm700 will then show up in the sgl-kernel wheel release, and its URLs need to be updated in pyproject_rocm.toml, replacing <URL TODO>.
Next Steps/TODO
- sglang-rocm wheel by adding a GitHub workflow similar to release_pypi.yml.
Hosting the Package
PyPI does not allow packages that contain non-PyPI dependencies (torch, torchvision, pytorch-triton-rocm, and sgl-kernel in this case). To solve this, there are two options:
- Remove torch, torchvision, pytorch-triton-rocm, and sgl-kernel from dependencies, making users install them explicitly with pip install torch --index-url ..., and release the wheel for sglang-rocm on PyPI. This would add one extra step to the user installation process.
Naming the Package
Note that this PR recommends naming the package sglang-rocm for all hosting options. The reason is how pip resolves dependencies. An index passed with --extra-index-url is searched in addition to the default index, PyPI, not instead of it, so pip can silently pick a package from PyPI, which can sometimes lead to issues.
Thus, if we name the package sglang and a user tries to install a specific version that is not available on https://pypi.amd.com/simple but is available on PyPI (which is the NVIDIA version), the dependency resolver will install the NVIDIA version with no warnings or errors.
Further, pip also recommends using unique package names when possible (Ref).
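The naming concern can be illustrated with a deliberately simplified toy model. It is not pip's actual resolver: the function `pick_candidate`, the index contents, and the version numbers are all hypothetical and only show why a shared name can resolve to the wrong build while a unique name fails visibly.

```python
def pick_candidate(package, version, indexes):
    """Return (index name, build label) for the first index offering the pinned version.

    Toy model only: real pip resolution is far more involved. Iteration order
    does not matter in the examples below, because at most one index offers
    the requested version.
    """
    for index_name, packages in indexes.items():
        build = packages.get(package, {}).get(version)
        if build is not None:
            return index_name, build
    return None


# Hypothetical index contents; the version numbers are made up.
shared_name = {
    "pypi.amd.com": {"sglang": {"0.5.5": "rocm"}},
    "pypi.org": {"sglang": {"0.5.5": "nvidia", "0.5.6": "nvidia"}},
}
unique_name = {
    "pypi.amd.com": {"sglang-rocm": {"0.5.5": "rocm"}},
    "pypi.org": {"sglang": {"0.5.5": "nvidia", "0.5.6": "nvidia"}},
}

# Shared name: the pinned version exists only on PyPI, so the NVIDIA build is chosen.
print(pick_candidate("sglang", "0.5.6", shared_name))
# Unique name: no index offers the version, so the install would fail instead.
print(pick_candidate("sglang-rocm", "0.5.6", unique_name))
```

With a unique name, a missing ROCm version surfaces as an installation error rather than as a silently installed NVIDIA build.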
Maintenance
- sgl-kernel's wheel URL must be updated every time a new version of sgl-kernel is released, if the first option for hosting is chosen.
- build_rocm.sh: This file determines the torch version used to build sgl-kernel, and is introduced in [Feature] Adding pip install Support for sgl-kernel for ROCm #14684.
- pyproject_rocm.toml: The torch version specified in pyproject_rocm.toml is the version installed when a user installs SGLang using the wheel. The versions of torchvision and pytorch-triton-rocm also need to be updated. To determine these, you can manually install the desired torch version, which would install compatible versions of torchvision and pytorch-triton-rocm; you can make a note of the compatible torchvision and pytorch-triton-rocm versions from here. Then replace the versions of torch, torchvision, and pytorch-triton-rocm with the new versions.
Usage Instructions/User Experience
The usage instructions change based on where the package is hosted.
[Not Available Yet] If SGLang wheel is hosted on pypi.amd.com
[Not Available Yet] If SGLang wheel is hosted on PyPI
Install from Source
SGLang can be installed from source using the following commands once this PR is merged.
Testing
Dependency Testing
pytorch-triton-rocm and add instructions for the user to manually build Triton from source.
Environments
Full AMD test suite (from pr-test-amd.yml) is run on the following matrix:
Results
Checklist
sgl-kernel URL with upstream URL.