[CI] Run the same test set on AMD as on NVIDIA #3071
Merged
sammshen merged 3 commits into LMCache:dev from Apr 20, 2026
Moves GPU-vendor-specific runtime deps out of common.txt into requirements/cuda_core.txt and requirements/rocm_core.txt. setup.py reads common.txt plus whichever core file matches BUILD_WITH_HIP so `pip install -e .` Just Works on both CUDA and ROCm hosts.

- Drop cupy-cuda12x and nixl from common.txt (both are CUDA-only on PyPI; the nixl meta-package unconditionally pulls nixl-cu12, which installs nixl_ep/ and breaks ROCm runtime).
- cuda.txt now -r cuda_core.txt so Dockerfile's `pip install -r cuda.txt` still pulls the same set.
- Remove the [tool.setuptools.dynamic] dependencies block from pyproject.toml; install_requires is driven by setup.py now.
- Add a second "Without vLLM docker base image" subsection to the ROCm install docs, mirroring the CUDA from-source flow line-for-line (uv venv -> -r build.txt -> torch from ROCm wheel index -> build). The existing rocm/vllm-dev flow stays as-is.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Shaoting Feng <shaotingf@uchicago.edu>
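The selection logic described in this commit message can be sketched roughly as below. This is an illustration under stated assumptions, not the actual `setup.py` from the PR: the helper names (`read_requirements`, `collect_install_requires`, `hip_requested`) are hypothetical, the file contents in the demo are placeholders, and treating `BUILD_WITH_HIP=1` as the only "enabled" value is an assumption.

```python
import os
import tempfile
from pathlib import Path


def read_requirements(path: Path) -> list[str]:
    """Return requirement specifiers from a file, skipping blanks, comments, and pip options."""
    requirements = []
    for raw_line in path.read_text().splitlines():
        line = raw_line.strip()
        # Lines such as "-r other.txt" are pip options, not valid install_requires entries.
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        requirements.append(line)
    return requirements


def hip_requested(env=None) -> bool:
    """Assumption: the ROCm path is enabled only when BUILD_WITH_HIP is exactly "1"."""
    env = os.environ if env is None else env
    return env.get("BUILD_WITH_HIP", "0") == "1"


def collect_install_requires(req_dir: Path, build_with_hip: bool) -> list[str]:
    """Combine common.txt with the vendor-specific core file."""
    core_name = "rocm_core.txt" if build_with_hip else "cuda_core.txt"
    return read_requirements(req_dir / "common.txt") + read_requirements(req_dir / core_name)


if __name__ == "__main__":
    # Demo with placeholder file contents (not the real requirement files).
    with tempfile.TemporaryDirectory() as tmp:
        req_dir = Path(tmp)
        (req_dir / "common.txt").write_text("# shared deps\nnumpy\n")
        (req_dir / "cuda_core.txt").write_text("cupy-cuda12x\nnixl\n")
        (req_dir / "rocm_core.txt").write_text("cupy-rocm-7-0\n")
        print(collect_install_requires(req_dir, hip_requested()))
```

In a real `setup.py`, the returned list would be passed as `install_requires=` to `setup()`. Skipping lines that start with `-` matters because `cuda.txt`-style `-r` includes are pip syntax and are not valid entries in `install_requires`.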
The AMD pytest invocation was ignoring tests/v1/multiprocess and tests/v1/mp_observability/test_event_recorder.py on top of the common --ignore set. Those were skipped because cupy-cuda12x ended up installed on ROCm hosts (via common.txt and via nixl->nixl-cu12), which broke the cupy.cuda.ExternalStream / event-recorder paths at import time.

With #<install-pr> merged, BUILD_WITH_HIP=1 pulls cupy-rocm-7-0 and omits nixl, so those suites can run on AMD. Collapse the two pytest branches in pipeline.yml into one identical invocation.

Depends on: #<install-pr> (without it, the unignored suites will fail at import on AMD because cupy-cuda12x is not ROCm-compatible.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Shaoting Feng <shaotingf@uchicago.edu>
Code Review
This pull request refactors the dependency management system to dynamically handle CUDA and ROCm requirements in setup.py using vendor-specific files. It also updates the ROCm installation documentation for bare hosts and simplifies the Buildkite CI pipeline by consolidating test commands. I have no feedback to provide.
What this PR does / why we need it:
Collapses the two pytest branches in `.buildkite/pipeline.yml` so AMD runs the same test set as NVIDIA. Drops the two AMD-only `--ignore` entries:
- `tests/v1/multiprocess`
- `tests/v1/mp_observability/test_event_recorder.py`

Those suites were skipped because `cupy-cuda12x` ended up installed on ROCm hosts (via `common.txt` and transitively via `nixl` → `nixl-cu12`), breaking the `cupy.cuda.ExternalStream` and event-recorder paths at import time.
Depends on #3070. Without that change, the unignored suites will fail at import on AMD.
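To check whether a ROCm host still has the CUDA-only packages named above, a small diagnostic along these lines may help. It is a sketch, not part of the PR: the function names are hypothetical, and the set of package names is taken only from this description (`cupy-cuda12x`, `nixl`, `nixl-cu12`).

```python
import re
from importlib import metadata

# Package names this PR description identifies as CUDA-only.
CUDA_ONLY_PACKAGES = {"cupy-cuda12x", "nixl", "nixl-cu12"}


def normalize(name: str) -> str:
    """Normalize a distribution name the way PEP 503 does (lowercase, runs of -_. become -)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def cuda_only_packages(installed_names) -> list[str]:
    """Return the CUDA-only packages found among the given distribution names, sorted."""
    return sorted(CUDA_ONLY_PACKAGES & {normalize(name) for name in installed_names})


def installed_distribution_names() -> list[str]:
    """List names of all distributions visible to the current interpreter."""
    names = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            names.append(name)
    return names


if __name__ == "__main__":
    leftovers = cuda_only_packages(installed_distribution_names())
    print("CUDA-only packages present:", leftovers or "none")
```

On a ROCm host set up with the change this PR depends on, the expectation would be an empty result; a non-empty one suggests the environment predates that change or pulled the packages in some other way.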
Special notes for your reviewers:
[…] cupy install issue), we can add narrow `@pytest.mark.skipif` markers in follow-ups rather than re-adding blanket `--ignore` entries.
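A narrow skip of the kind mentioned here needs a condition to key on. One possible predicate is sketched below; it assumes the tests run against a PyTorch build, where `torch.version.hip` is a version string on ROCm builds and `None` on CUDA builds. The function name `is_rocm_torch` is hypothetical, and the stand-in objects exist only so the sketch runs without torch installed.

```python
import importlib
from types import SimpleNamespace


def is_rocm_torch(torch_module=None) -> bool:
    """Return True if the given (or installed) torch build targets ROCm/HIP."""
    if torch_module is None:
        try:
            torch_module = importlib.import_module("torch")
        except ImportError:
            return False  # no torch available: treat as "not ROCm"
    version = getattr(torch_module, "version", None)
    return getattr(version, "hip", None) is not None


# Self-check with stand-in objects instead of real torch builds.
fake_rocm_torch = SimpleNamespace(version=SimpleNamespace(hip="7.0", cuda=None))
fake_cuda_torch = SimpleNamespace(version=SimpleNamespace(hip=None, cuda="12.4"))
assert is_rocm_torch(fake_rocm_torch) is True
assert is_rocm_torch(fake_cuda_torch) is False
```

In a test module, a single ROCm-specific failure could then be marked with `@pytest.mark.skipif(is_rocm_torch(), reason="...")` on just that test, leaving the rest of the suite running on AMD.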
Note
Low Risk
Low risk CI-only change, but it increases AMD coverage by running previously-skipped test suites and may surface new ROCm-only failures.
Overview
Unifies the Buildkite unit-test `pytest` invocation across GPU types by removing the AMD-only branch and running the same test set on both AMD and NVIDIA runners.
This drops the AMD-only `--ignore` entries (notably `tests/v1/multiprocess` and `tests/v1/mp_observability/test_event_recorder.py`), increasing coverage for ROCm CI.
Reviewed by Cursor Bugbot for commit ac0c80b. Bugbot is set up for automated code reviews on this repo.