[ROCm] Make bare-host ROCm install self-sufficient#3070
Conversation
Moves GPU-vendor-specific runtime deps out of common.txt into requirements/cuda_core.txt and requirements/rocm_core.txt. setup.py reads common.txt plus whichever core file matches BUILD_WITH_HIP so `pip install -e .` Just Works on both CUDA and ROCm hosts. - Drop cupy-cuda12x and nixl from common.txt (both are CUDA-only on PyPI; the nixl meta-package unconditionally pulls nixl-cu12, which installs nixl_ep/ and breaks ROCm runtime). - cuda.txt now -r cuda_core.txt so Dockerfile's `pip install -r cuda.txt` still pulls the same set. - Remove the [tool.setuptools.dynamic] dependencies block from pyproject.toml; install_requires is driven by setup.py now. - Add a second "Without vLLM docker base image" subsection to the ROCm install docs, mirroring the CUDA from-source flow line-for-line (uv venv -> -r build.txt -> torch from ROCm wheel index -> build). The existing rocm/vllm-dev flow stays as-is. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Shaoting Feng <shaotingf@uchicago.edu>
There was a problem hiding this comment.
Code Review
This pull request refactors the dependency management system to support both CUDA and ROCm environments by dynamically setting install_requires in setup.py. It introduces vendor-specific requirement files and updates the ROCm installation documentation. Feedback includes correcting non-existent package versions and URLs for ROCm 7.0, and improving the robustness of the requirements parser to handle inline comments.
| # cannot be told about via install_requires; users install them manually | ||
| # per the "LMCache on ROCm / Without vLLM docker base image" docs. | ||
|
|
||
| cupy-rocm-7-0 |
There was a problem hiding this comment.
The package cupy-rocm-7-0 does not appear to exist on PyPI. As of the current ROCm releases, the versions are typically in the 6.x range (e.g., cupy-rocm-6-0 or cupy-rocm-6-2). Using a non-existent version will cause the installation to fail. Please verify the correct ROCm version and update this requirement.
| uv pip install -r requirements/build.txt | ||
|
|
||
| # Install torch from the ROCm wheel index | ||
| uv pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm7.0 |
There was a problem hiding this comment.
The ROCm wheel index URL https://download.pytorch.org/whl/rocm7.0 seems incorrect. ROCm 7.0 has not been released yet; the current latest versions are in the 6.x range (e.g., rocm6.2). This URL will likely return a 404 error, breaking the installation instructions for users.
| def _read_requirements(path: Path) -> list[str]: | ||
| reqs: list[str] = [] | ||
| for raw in path.read_text().splitlines(): | ||
| line = raw.strip() | ||
| if line and not line.startswith("#"): | ||
| reqs.append(line) | ||
| return reqs |
There was a problem hiding this comment.
The _read_requirements function is currently fragile as it does not handle inline comments (e.g., package >= 1.0 # comment). If such a comment is added to the requirements files, the entire string including the comment will be passed to install_requires, which will cause setuptools or pip to fail. It is safer to strip inline comments before processing.
| def _read_requirements(path: Path) -> list[str]: | |
| reqs: list[str] = [] | |
| for raw in path.read_text().splitlines(): | |
| line = raw.strip() | |
| if line and not line.startswith("#"): | |
| reqs.append(line) | |
| return reqs | |
| def _read_requirements(path: Path) -> list[str]: | |
| reqs: list[str] = [] | |
| for raw in path.read_text().splitlines(): | |
| line = raw.split("#")[0].strip() | |
| if line: | |
| reqs.append(line) | |
| return reqs |
Moves GPU-vendor-specific runtime deps out of common.txt into requirements/cuda_core.txt and requirements/rocm_core.txt. setup.py reads common.txt plus whichever core file matches BUILD_WITH_HIP so
pip install -e .Just Works on both CUDA and ROCm hosts.pip install -r cuda.txtstill pulls the same set.What this PR does / why we need it:
Special notes for your reviewers:
If applicable:
Note
Medium Risk
Changes Python packaging metadata and install dependency resolution, which can break installs across CUDA/ROCm or sdist/wheel paths if the new requirement selection logic is wrong.
Overview
Makes LMCache installs choose the correct GPU-vendor runtime dependencies at build/install time instead of hardcoding CUDA-only packages for everyone.
Vendor-specific deps are split into new
requirements/cuda_core.txtandrequirements/rocm_core.txt, andsetup.pynow buildsinstall_requiresby readingrequirements/common.txtplus the appropriate core file based onBUILD_WITH_HIP.cuda.txtis updated to includecuda_core.txt, and ROCm installation docs add a new bare host flow (install ROCm torch from the wheel index, then build withBUILD_WITH_HIP=1).Reviewed by Cursor Bugbot for commit f4448ca. Bugbot is set up for automated code reviews on this repo. Configure here.