
ray-llm container cu124 -> cu128 update #53730

Merged
aslonnie merged 11 commits into ray-project:master from eicherseiji:ray-llm-cu128
Jun 13, 2025


@eicherseiji
Contributor

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
@eicherseiji eicherseiji marked this pull request as ready for review June 12, 2025 23:46
Copilot AI review requested due to automatic review settings June 12, 2025 23:46
@eicherseiji eicherseiji requested a review from a team as a code owner June 12, 2025 23:46
Contributor

Copilot AI left a comment


Pull Request Overview

This PR updates the ray-llm container to support CUDA 12.8 (cu128) instead of CUDA 12.4 by revising the requirements files, Dockerfiles, and CI configuration files.

  • Updated the autogenerated requirements files and associated command comments to reflect the new cu128 dependency indexes.
  • Modified the Dockerfile and CI build configurations to use CUDA 12.8 images and parameters instead of cu124.
  • Adjusted the compile_llm_requirements.sh script loop to iterate over the updated CUDA codes and removed the previous --find-links argument.
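
The loop described in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the contents of ci/compile_llm_requirements.sh: the `CUDA_CODES` list and the `torch_index_url` helper are hypothetical names, and the script only prints the per-code `--extra-index-url` value (using the public PyTorch wheel index layout) instead of running a dependency compiler.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed list of CUDA codes; the real script's list is not shown on this page.
CUDA_CODES=(cpu cu128)

# Hypothetical helper: build the PyTorch wheel index URL for one CUDA code.
torch_index_url() {
  echo "https://download.pytorch.org/whl/$1"
}

# Iterate over every code and show the index argument that would be passed
# to the resolver. No --find-links argument is added, mirroring the PR summary.
for CUDA_CODE in "${CUDA_CODES[@]}"; do
  echo "${CUDA_CODE}: --extra-index-url $(torch_index_url "${CUDA_CODE}")"
done
```

Keeping the codes in a single array means a future CUDA bump only touches one line, which is the kind of change this PR makes when it swaps cu124 for cu128.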

Reviewed Changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| python/requirements_* | Changed comments and command parameters from cu124 to cu128 for CUDA image consistency. |
| docker/ray-llm/Dockerfile | Updated the CUDA version comment and variable to cu128. |
| ci/* and .buildkite/* files | Adjusted CI build args and image references to use CUDA 12.8. |
| ci/compile_llm_requirements.sh | Updated CUDA codes in loop and removed the --find-links parameter for cu124. |
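
Because the change touches many files, one way to check that no cu124 reference was left behind is a plain text search. The sketch below is an assumption about how such a check could look, not part of this PR: `find_stale_cuda_refs` is a hypothetical helper, and the demo runs against a throwaway file rather than the repository.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print every line in the given files that still mentions
# the old CUDA tag. Exits 0 when a match is found, non-zero otherwise (grep semantics).
find_stale_cuda_refs() {
  local old_tag="$1"
  shift
  grep -n -F -- "${old_tag}" "$@"
}

# Self-contained demo on a temporary file that only references cu128.
demo_file="$(mktemp)"
printf '%s\n' '--extra-index-url https://download.pytorch.org/whl/cu128' > "${demo_file}"

if find_stale_cuda_refs "cu124" "${demo_file}"; then
  echo "stale cu124 references found"
else
  echo "no cu124 references"
fi

rm -f "${demo_file}"
```

In a real checkout the helper would be pointed at the files listed in the table above instead of a temporary file.
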
Comments suppressed due to low confidence (1)

ci/compile_llm_requirements.sh:25

  • Please verify that removing the '--find-links' parameter for the updated CUDA configurations is intentional and that dependency resolution for cu128 will work correctly using only the extra-index-url.
        --find-links "https://data.pyg.org/whl/torch-2.5.1+${CUDA_CODE}.html"

@eicherseiji eicherseiji added the go (add ONLY when ready to merge, run all tests) label Jun 13, 2025
@eicherseiji eicherseiji self-assigned this Jun 13, 2025
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Collaborator

@aslonnie aslonnie left a comment


CI is not passing. It seems that some additional changes are required.

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
@aslonnie aslonnie merged commit 569b7ea into ray-project:master Jun 13, 2025
5 checks passed
elliot-barn pushed a commit that referenced this pull request Jun 18, 2025
Upgrade to use cuda128 base images for LLM CI tests, release tests, and building ray-llm images.

---------

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
elliot-barn pushed a commit that referenced this pull request Jul 2, 2025
Upgrade to use cuda128 base images for LLM CI tests, release tests, and building ray-llm images.

---------

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>

Labels

go (add ONLY when ready to merge, run all tests)

3 participants