
[Feature] Add pyproject_rocm.toml for end-to-end ROCm pip installation support #14802

Draft
RohitNagraj wants to merge 2 commits into sgl-project:main from RohitNagraj:rocm-pyproject-v3


@RohitNagraj commented Dec 10, 2025

Motivation

To enable end-to-end pip install support for SGLang on ROCm, this PR adds the necessary ROCm-specific pyproject file, pyproject_rocm.toml.

Changes

  1. This PR adds a new pyproject_rocm.toml, which contains all the dependencies SGLang requires on AMD hardware (except AITER, which must be installed from source).
  2. This PR updates the documentation with new steps for installing SGLang and all its required dependencies. It also moves the recommended method (using Docker) above the "Install from Source" section.

Pending Changes

Next Steps/TODO

  • Build and release the sglang-rocm wheel by adding a GitHub workflow similar to release_pypi.yml.

Hosting the Package

PyPI does not accept packages whose dependencies are direct URL references, and the ROCm builds of torch, torchvision, pytorch-triton-rocm, and sgl-kernel are not hosted on PyPI. There are two options to solve this:

  1. Host on a different index (like pypi.amd.com or sgl-whl).
  2. Remove torch, torchvision, pytorch-triton-rocm, and sgl-kernel from the dependencies, require users to install them explicitly (pip install torch --index-url ...), and release the sglang-rocm wheel on PyPI. This adds one extra step to the user installation process.
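Option 2 shifts the torch install onto the user, so the likeliest mistake is ending up with a non-ROCm torch (for example the default build from PyPI). A minimal pre-install check is sketched below. `is_rocm_torch_version` is a hypothetical helper, and it assumes that ROCm builds of torch carry a `+rocm<major>.<minor>` local version suffix in `torch.__version__`.

```bash
#!/usr/bin/env bash
# Hypothetical guard for hosting option 2: check that a torch version
# string comes from a ROCm build before installing sglang-rocm from PyPI.
# Assumption: ROCm wheels use a local version suffix such as "+rocm7.0".

# Returns 0 if the version string contains "+rocm<major>.<minor>", 1 otherwise.
is_rocm_torch_version() {
  local version="$1"
  [[ "$version" =~ \+rocm[0-9]+\.[0-9]+ ]]
}

# Example usage (requires torch to be installed already):
#   v="$(python -c 'import torch; print(torch.__version__)')"
#   is_rocm_torch_version "$v" || echo "torch $v is not a ROCm build" >&2
```

A string check keeps the guard cheap; it does not confirm that the ROCm version matches the one sgl-kernel was built against.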

Naming the Package

Note that this PR recommends naming the package sglang-rocm for all hosting options because of how pip resolves dependencies. pip does not give --extra-index-url priority over the default index: it collects candidates from every configured index (including PyPI) and picks the best match, so a version missing from the extra index is silently fetched from PyPI.

Thus, if the package were named sglang and a user requested a specific version that is not available on https://pypi.amd.com/simple but is available on PyPI (the NVIDIA version), pip would install the NVIDIA version without any warning or error.

pip's documentation also recommends unique package names where possible (Ref).
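To make the naming risk concrete, here is a toy model, not pip's implementation: `candidate_sources` (a hypothetical function) reports which index offers an exact pinned version when the same package name exists on both the AMD index and PyPI. Because no index is preferred, a pin that exists only on PyPI resolves there without complaint.

```bash
#!/usr/bin/env bash
# Toy model (not pip's actual code): every index is searched and the
# candidates are merged, with no index taking priority.

# Prints the indexes that offer an exact version, space-separated, or "none".
# Arguments: wanted version, versions on the AMD index, versions on PyPI.
candidate_sources() {
  local want="$1" amd_versions="$2" pypi_versions="$3" found=()
  local v
  for v in $amd_versions; do
    [[ "$v" == "$want" ]] && { found+=("pypi.amd.com"); break; }
  done
  for v in $pypi_versions; do
    [[ "$v" == "$want" ]] && { found+=("pypi.org"); break; }
  done
  if (( ${#found[@]} == 0 )); then
    echo "none"
  else
    echo "${found[*]}"
  fi
}

# If the ROCm package shared the name "sglang", a pin that exists only on
# PyPI would resolve to the NVIDIA build:
#   candidate_sources 0.5.7 "0.5.6" "0.5.6 0.5.7"   # prints: pypi.org
```

With a distinct name such as sglang-rocm, the PyPI list for that name is empty, so the same pin yields "none" and pip fails loudly instead of installing the wrong build.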

Maintenance

  • sgl-kernel releases: If the first hosting option is chosen, then once this PR is merged, the sgl-kernel wheel URL must be updated every time a new version of sgl-kernel is released.
  • New Torch Version: Updating the torch version requires the following changes:
    1. Update build_rocm.sh: This file determines the torch version used to build sgl-kernel and is introduced in [Feature] Adding pip install Support for sgl-kernel for ROCm #14684.
    2. Update pyproject_rocm.toml: The torch version specified in pyproject_rocm.toml is the version installed when a user installs SGLang from the wheel, so the torchvision and pytorch-triton-rocm versions must be updated along with it. To find compatible versions, manually install the desired torch version together with torchvision from the PyTorch ROCm index and note the torchvision and pytorch-triton-rocm versions pip selects, or look them up from here. Then replace the versions of torch, torchvision, and pytorch-triton-rocm with the new ones.
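Because pyproject_rocm.toml pins some of these packages once per Python version (as separate wheel URLs), a version bump touches several lines. A hypothetical helper, `list_torch_pins`, that prints every such line with its line number so none is missed; it is a grep heuristic that assumes one quoted dependency per line.

```bash
#!/usr/bin/env bash
# Hypothetical maintenance helper: list every dependency line that pins
# torch, torchvision, or pytorch-triton-rocm, with line numbers.
# Assumes one quoted dependency string per line, as in pyproject_rocm.toml.
list_torch_pins() {
  local pyproject="${1:-python/pyproject_rocm.toml}"
  # The name must be followed by a specifier, "@", ";", or the closing quote,
  # so packages like "torchao" are not matched.
  grep -nE '^[[:space:]]*"(torch|torchvision|pytorch-triton-rocm)[[:space:]]*(@|==|>=|<=|~=|<|>|;|")' "$pyproject"
}

# Example usage from the repo root:
#   list_torch_pins python/pyproject_rocm.toml
```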

Usage Instructions/User Experience

The usage instructions change based on where the package is hosted.

[Not Available Yet] If the SGLang wheel is hosted on pypi.amd.com

```bash
# Install AITER from source
git clone https://github.com/ROCm/aiter.git
cd aiter
git checkout v0.1.7.post5
git submodule update --init --recursive
GPU_ARCH_LIST="gfx950" # Or "gfx942" for MI300x/MI325x
GPU_ARCHS=$GPU_ARCH_LIST python setup.py develop # Optionally set PREBUILD_KERNELS=1 for gfx942 (MI300x/MI325x) to precompile kernels for faster server startup

# Install the sglang python package
pip install sglang-rocm --extra-index-url https://pypi.amd.com/simple
```
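GPU_ARCH_LIST has to match the installed hardware. A hypothetical helper, `gpu_arch_for`, encoding the mapping given in the comments of these commands (gfx942 for MI300x/MI325x) plus gfx950 for the MI350 series (MI350x/MI355x):

```bash
#!/usr/bin/env bash
# Hypothetical helper: map an Instinct GPU model to the GPU_ARCHS value
# used when building AITER. Input is case-insensitive.
gpu_arch_for() {
  case "${1^^}" in
    MI300X|MI325X) echo "gfx942" ;;
    MI350X|MI355X) echo "gfx950" ;;
    *) echo "unknown GPU model: $1" >&2; return 1 ;;
  esac
}

# Example usage:
#   GPU_ARCH_LIST="$(gpu_arch_for MI300X)"   # gfx942
```

On a machine with ROCm installed, `rocminfo | grep -m1 -o 'gfx[0-9a-f]*'` typically reports the architecture directly, which avoids depending on a hand-maintained table.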

[Not Available Yet] If the SGLang wheel is hosted on PyPI

```bash
# Install AITER from source
git clone https://github.com/ROCm/aiter.git
cd aiter
git checkout v0.1.7.post5
git submodule update --init --recursive
GPU_ARCH_LIST="gfx950" # Or "gfx942" for MI300x/MI325x
GPU_ARCHS=$GPU_ARCH_LIST python setup.py develop # Optionally set PREBUILD_KERNELS=1 for gfx942 (MI300x/MI325x) to precompile kernels for faster server startup

# Install torch and its ROCm dependencies
pip install torch==2.10.0.dev20251011 torchvision==0.25.0.dev20251012 pytorch-triton-rocm==3.5.0 --index-url https://download.pytorch.org/whl/nightly/rocm7.0

# Install sgl-kernel (available after PR #14684 is merged)
pip install sgl-kernel --index-url https://docs.sglang.io/whl/rocm700

# Install SGLang
pip install sglang-rocm
```
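The PyPI-hosted flow is three ordered pip commands after the AITER build (omitted here). A sketch that keeps the pins and index URLs from these commands in one place and supports a dry run; `run_step`, `install_sglang_rocm`, and `DRY_RUN` are hypothetical names, and the pins must be updated whenever the torch version changes.

```bash
#!/usr/bin/env bash
# Sketch of the PyPI-hosted install flow (AITER build omitted).
# Pins and index URLs are copied from the commands above.
TORCH_INDEX="https://download.pytorch.org/whl/nightly/rocm7.0"
SGL_KERNEL_INDEX="https://docs.sglang.io/whl/rocm700"
TORCH_PINS=(torch==2.10.0.dev20251011 torchvision==0.25.0.dev20251012 pytorch-triton-rocm==3.5.0)

# Run a command, or only print it when DRY_RUN=1.
run_step() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

install_sglang_rocm() {
  # Install torch first: if a later dependency also requires torch, pip keeps
  # the already-installed ROCm build instead of fetching another one
  # (assumed rationale for the ordering).
  run_step pip install "${TORCH_PINS[@]}" --index-url "$TORCH_INDEX" || return 1
  run_step pip install sgl-kernel --index-url "$SGL_KERNEL_INDEX" || return 1
  run_step pip install sglang-rocm
}

# Example usage:
#   DRY_RUN=1 install_sglang_rocm   # print the three commands
#   install_sglang_rocm             # actually install
```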

Install from Source

SGLang can be installed from source using the following commands once this PR is merged.

```bash
# Clone the latest release tag
git clone -b v0.5.6.post1 https://github.com/sgl-project/sglang.git
cd sglang

# Install AITER from source
git clone https://github.com/ROCm/aiter.git
cd aiter
git checkout v0.1.7.post5
git submodule update --init --recursive
GPU_ARCH_LIST="gfx950" # Or "gfx942" for MI300x/MI325x
GPU_ARCHS=$GPU_ARCH_LIST python setup.py develop # Optionally set PREBUILD_KERNELS=1 for gfx942 (MI300x/MI325x) to precompile kernels for faster server startup
cd ..  # Return to the sglang repo root; the next step uses paths relative to it

# Install the sglang python package
mv -f python/pyproject_rocm.toml python/pyproject.toml
pip install -e "python[rocm700]"
```
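The pyproject swap only works from the sglang repo root, not from inside the aiter checkout. A hypothetical guard, `swap_in_rocm_pyproject`, that refuses to run when the ROCm file is not where it expects:

```bash
#!/usr/bin/env bash
# Hypothetical guard for the "swap pyproject" step: only replace
# python/pyproject.toml if python/pyproject_rocm.toml exists under the
# given directory (default: current directory).
swap_in_rocm_pyproject() {
  local repo_root="${1:-.}"
  local src="$repo_root/python/pyproject_rocm.toml"
  local dst="$repo_root/python/pyproject.toml"
  if [[ ! -f "$src" ]]; then
    echo "error: $src not found; run this from the sglang repo root" >&2
    return 1
  fi
  mv -f "$src" "$dst"
}

# Example usage, from the sglang checkout:
#   swap_in_rocm_pyproject . && pip install -e "python[rocm700]"
```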

Testing

Dependency Testing

  1. Mooncake: Mooncake can be installed with pip, and this PR assumes the same pip package works on ROCm. This PR has not been tested with Mooncake.
  2. Specific Triton Version: If a specific Triton version must be pinned (like here), remove pytorch-triton-rocm from the dependencies and add instructions for users to build Triton from source manually.
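Removing pytorch-triton-rocm means deleting every per-Python-version line for it. A hypothetical helper, `drop_dependency`, that prints the pyproject without those lines (writing to stdout, so the original file is untouched); it assumes the package name contains no regex metacharacters, which holds for pytorch-triton-rocm.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a pyproject file with every dependency line
# for the given package removed. Output goes to stdout for review.
# Assumes one quoted dependency per line and a name without regex
# metacharacters (hyphens are fine).
drop_dependency() {
  local name="$1" pyproject="$2"
  grep -vE "^[[:space:]]*\"${name}[[:space:]]*(@|==|>=|<=|~=|<|>|;|\")" "$pyproject"
}

# Example usage:
#   drop_dependency pytorch-triton-rocm python/pyproject_rocm.toml > /tmp/pyproject.toml
#   diff python/pyproject_rocm.toml /tmp/pyproject.toml
```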

Environments

The full AMD test suite (from pr-test-amd.yml) was run on the following matrix:

  • ROCm Versions: [7.0]
  • Python Versions: [3.10, 3.11, 3.12]
  • Hardware: [MI300x, MI350x]
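The matrix above expands to 1 x 3 x 2 = 6 environments. A sketch that enumerates them as labels; wiring each combination to a runner or virtualenv is left out, since that depends on the CI setup, and `list_test_matrix` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Enumerate the test matrix listed above. Only prints one label per
# combination; it does not create environments or run tests.
ROCM_VERSIONS=(7.0)
PYTHON_VERSIONS=(3.10 3.11 3.12)
GPUS=(MI300x MI350x)

list_test_matrix() {
  local rocm py gpu
  for rocm in "${ROCM_VERSIONS[@]}"; do
    for py in "${PYTHON_VERSIONS[@]}"; do
      for gpu in "${GPUS[@]}"; do
        echo "rocm${rocm}-py${py}-${gpu}"
      done
    done
  done
}
```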

Results

  • Most test results match those from the CI Docker image. However, because of small differences in package versions, around 5 tests produce different results than in the CI Docker image.

Checklist

  • Format your code according to Format code with pre-commit.
  • Update documentation according to Write documentations.
  • Replace sgl-kernel URL with upstream URL.
  • Add GitHub workflow to build and upload sglang wheel to index of choice.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@github-actions github-actions Bot added documentation Improvements or additions to documentation amd dependencies Pull requests that update a dependency file labels Dec 10, 2025
Comment thread docs/platforms/amd_gpu.md Outdated
```bash
git checkout v0.1.7.post5 # Or v0.1.4 for ROCm 6.x
git submodule update --init --recursive
GPU_ARCH_LIST="gfx950" # Or "gfx942" for MI300x/MI325x
PREBUILD_KERNELS=1 GPU_ARCHS=$GPU_ARCH_LIST python setup.py develop
```
Collaborator


AITER still has a dispatch problem with PREBUILD on MI355x, so the current rocm.dockerfile doesn't prebuild kernels. Should we use PREBUILD_KERNELS=1 here?

Author


That's right. Thanks for pointing out. I've updated the doc accordingly.
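Following this exchange, one way to keep precompilation where the usage commands suggest it (gfx942, MI300x/MI325x) while avoiding the reported MI355x dispatch issue is to choose the AITER build variables per architecture. A hypothetical helper, `aiter_build_env`:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the environment assignments for the AITER
# build. PREBUILD_KERNELS is enabled only for gfx942; gfx950 builds skip
# it because of the dispatch issue discussed in this review thread.
aiter_build_env() {
  local arch="$1"
  case "$arch" in
    gfx942) echo "PREBUILD_KERNELS=1 GPU_ARCHS=gfx942" ;;
    gfx950) echo "GPU_ARCHS=gfx950" ;;
    *) echo "unsupported arch: $arch" >&2; return 1 ;;
  esac
}

# Example usage, from the aiter checkout (word splitting is intentional):
#   env $(aiter_build_env gfx942) python setup.py develop
```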

Comment thread python/pyproject_rocm.toml Outdated

```toml
"pytorch-triton-rocm @ https://download.pytorch.org/whl/pytorch_triton_rocm-3.5.0-cp312-cp312-linux_x86_64.whl ; python_version == '3.12'",
"pytorch-triton-rocm @ https://download.pytorch.org/whl/pytorch_triton_rocm-3.5.0-cp313-cp313-linux_x86_64.whl ; python_version == '3.13'",
"pytorch-triton-rocm @ https://download.pytorch.org/whl/pytorch_triton_rocm-3.5.0-cp314-cp314-linux_x86_64.whl ; python_version == '3.14'",
"sgl-kernel @ <URL TODO>",
```
Collaborator

@1am9trash, Dec 11, 2025


pyproject_rocm.toml still uses <URL TODO>, so the ROCm pip install flow cannot run end-to-end. I'm wondering how this was tested.
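To keep a placeholder like `<URL TODO>` from reaching a release, a build workflow could fail fast while the file still contains one. A hypothetical pre-build check, `check_no_placeholders`; matching on the literal "TODO" is an assumption about how placeholders are written.

```bash
#!/usr/bin/env bash
# Hypothetical pre-build check: fail if the ROCm pyproject is missing or
# still contains a placeholder such as "<URL TODO>". Matching lines are
# printed with their line numbers.
check_no_placeholders() {
  local pyproject="${1:-python/pyproject_rocm.toml}"
  if [[ ! -f "$pyproject" ]]; then
    echo "error: $pyproject not found" >&2
    return 1
  fi
  if grep -n 'TODO' "$pyproject"; then
    echo "error: placeholder found in $pyproject" >&2
    return 1
  fi
  return 0
}
```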

Collaborator


As someone working on ROCm-related features, I have a question. pyproject_rocm.toml duplicates many sections of the main python/pyproject.toml (dependencies, optional dependencies, package-data, wheel excludes, etc.). How do we plan to avoid drift between the two files? Should contributors manually update both whenever dependencies change, or is another mechanism intended?
More generally, do you have any guidance on when I should pay attention to the relationship between this file and the main pyproject.toml, or to other related files I may not be aware of?

Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great question. python/pyproject.toml defines all of SGLang's dependencies for NVIDIA. Other platforms are expected to maintain their own pyproject file or use the shared python/pyproject_other.toml (see Ref).

As a result, some duplication is unavoidable. Either the developer who updates one pyproject also updates the others, or the maintainers have to sync them manually from time to time.
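Manual syncing is easier with a quick drift report. A rough sketch comparing the package names that appear as quoted dependency strings in two pyproject files; `dep_names` and `dep_drift` are hypothetical names, and this is a grep heuristic rather than a TOML parser, so other quoted list entries may show up too.

```bash
#!/usr/bin/env bash
# Rough drift check between two pyproject files (heuristic, not a TOML parser).

# Print sorted, unique, lowercased package names from lines such as
#   "name>=1.0",   "name[extra]",   "name @ https://..."
dep_names() {
  grep -oE '^[[:space:]]*"[A-Za-z0-9][A-Za-z0-9._-]*' "$1" \
    | sed -E 's/^[[:space:]]*"//' \
    | tr '[:upper:]' '[:lower:]' \
    | LC_ALL=C sort -u
}

# Names only in $1 are printed as-is; names only in $2 are tab-indented.
dep_drift() {
  LC_ALL=C comm -3 <(dep_names "$1") <(dep_names "$2")
}

# Example usage:
#   dep_drift python/pyproject.toml python/pyproject_rocm.toml
```

Some drift is expected (CUDA-only and ROCm-only packages), so the output is a review aid rather than a pass/fail gate.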

akao-amd added a commit to RohitNagraj/sglang that referenced this pull request Dec 19, 2025
Remove python/pyproject_rocm.toml and adjust docs/platforms/amd_gpu.md.
These files were accidentally included from draft sgl-project#14802 and cause
unnecessary cross-platform CI runs.
This patch is a starting point for a series of refinements to dependency
management for sglang/sgl-kernel. It introduces an ROCm-specific
pyproject file and marks the new workflow as experimental.

Labels

amd dependencies Pull requests that update a dependency file documentation Improvements or additions to documentation


5 participants