Skip to content

[CI] Run the same test set on AMD as on NVIDIA#3071

Merged
sammshen merged 3 commits intoLMCache:devfrom
Shaoting-Feng:rocm-unignore-amd-tests
Apr 20, 2026
Merged

[CI] Run the same test set on AMD as on NVIDIA#3071
sammshen merged 3 commits intoLMCache:devfrom
Shaoting-Feng:rocm-unignore-amd-tests

Conversation

@Shaoting-Feng
Copy link
Copy Markdown
Contributor

@Shaoting-Feng Shaoting-Feng commented Apr 17, 2026

What this PR does / why we need it:

Collapses the two pytest branches in .buildkite/pipeline.yml so AMD
runs the same test set as NVIDIA. Drops the two AMD-only --ignore
entries:

  • tests/v1/multiprocess
  • tests/v1/mp_observability/test_event_recorder.py

Those suites were skipped because cupy-cuda12x ended up installed on
ROCm hosts (via common.txt and transitively via nixlnixl-cu12),
breaking the cupy.cuda.ExternalStream and event-recorder paths at
import time.

Depends on #3070. Without that change, the unignored suites will fail at import on AMD.

Special notes for your reviewers:

If applicable:

  • this PR contains user facing changes - docs added
  • this PR contains unit tests

Note

Low Risk
Low risk CI-only change, but it increases AMD coverage by running previously-skipped test suites and may surface new ROCm-only failures.

Overview
Unifies the Buildkite unit-test pytest invocation across GPU types by removing the AMD-only branch and running the same test set on both AMD and NVIDIA runners.

This drops the AMD-only --ignore entries (notably tests/v1/multiprocess and tests/v1/mp_observability/test_event_recorder.py), increasing coverage for ROCm CI.

Reviewed by Cursor Bugbot for commit ac0c80b. Bugbot is set up for automated code reviews on this repo. Configure here.

Shaoting-Feng and others added 2 commits April 17, 2026 20:38
Moves GPU-vendor-specific runtime deps out of common.txt into
requirements/cuda_core.txt and requirements/rocm_core.txt. setup.py
reads common.txt plus whichever core file matches BUILD_WITH_HIP so
`pip install -e .` Just Works on both CUDA and ROCm hosts.

- Drop cupy-cuda12x and nixl from common.txt (both are CUDA-only on PyPI;
  the nixl meta-package unconditionally pulls nixl-cu12, which installs
  nixl_ep/ and breaks ROCm runtime).
- cuda.txt now -r cuda_core.txt so Dockerfile's `pip install -r cuda.txt`
  still pulls the same set.
- Remove the [tool.setuptools.dynamic] dependencies block from
  pyproject.toml; install_requires is driven by setup.py now.
- Add a second "Without vLLM docker base image" subsection to the
  ROCm install docs, mirroring the CUDA from-source flow line-for-line
  (uv venv -> -r build.txt -> torch from ROCm wheel index -> build).
  The existing rocm/vllm-dev flow stays as-is.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Shaoting Feng <shaotingf@uchicago.edu>
The AMD pytest invocation was ignoring tests/v1/multiprocess and
tests/v1/mp_observability/test_event_recorder.py on top of the
common --ignore set. Those were skipped because cupy-cuda12x ended
up installed on ROCm hosts (via common.txt and via nixl->nixl-cu12),
which broke the cupy.cuda.ExternalStream / event-recorder paths at
import time.

With #<install-pr> merged, BUILD_WITH_HIP=1 pulls cupy-rocm-7-0 and
omits nixl, so those suites can run on AMD. Collapse the two pytest
branches in pipeline.yml into one identical invocation.

Depends on: #<install-pr>  (without it, the unignored suites will
fail at import on AMD because cupy-cuda12x is not ROCm-compatible.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Shaoting Feng <shaotingf@uchicago.edu>
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the dependency management system to dynamically handle CUDA and ROCm requirements in setup.py using vendor-specific files. It also updates the ROCm installation documentation for bare hosts and simplifies the Buildkite CI pipeline by consolidating test commands. I have no feedback to provide.

@Shaoting-Feng Shaoting-Feng added the full Run comprehensive tests on this PR label Apr 18, 2026
Copy link
Copy Markdown
Contributor

@ApostaC ApostaC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Copy Markdown
Contributor

@sammshen sammshen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@sammshen sammshen merged commit 853faf3 into LMCache:dev Apr 20, 2026
55 of 58 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

full Run comprehensive tests on this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants