Skip to content

[train] Fix py312 GPU test filter#57559

Merged
justinvyu merged 1 commit intoray-project:masterfrom
justinvyu:gpu_test_filter
Oct 8, 2025
Merged

[train] Fix py312 GPU test filter#57559
justinvyu merged 1 commit intoray-project:masterfrom
justinvyu:gpu_test_filter

Conversation

@justinvyu
Copy link
Copy Markdown
Contributor

Summary

#56868 replaced gpu tags with train_v2_gpu in order to separate out the tests into 2 CI pipelines. However, there's a py312 test that only runs in postmerge that filters out the gpu tag. This resulted in GPU tests running on the py312 CPU test runner. This PR fixes the issue by filtering out the new train_v2_gpu tag as well.

Signed-off-by: Justin Yu <justinvyu@anyscale.com>
--python-version {{matrix.python}}
--build-name mlgpubuild-py{{matrix.python}}
--only-tags gpu
--only-tags gpu,train_v2_gpu
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Duplicate GPU Tests Across Jobs

The ml: train v1 gpu tests job now includes train_v2_gpu in its --only-tags. This change causes tests tagged with train_v2_gpu to run in both this job and a dedicated train v2 gpu tests job, resulting in duplicate execution and wasted CI resources.

Fix in Cursor Fix in Web

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of the py312 job is to duplicate the other tests that usually run in a lower py version.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly fixes a CI configuration issue where GPU tests were running on a CPU-only test runner. The changes to include train_v2_gpu in the GPU test suite and exclude it from the CPU test suite are logical and directly address the problem described. I've added a couple of suggestions to potentially remove a now-legacy gpu tag for improved clarity, based on the information in the PR description. Overall, this is a good, targeted fix.

--workers 4 --worker-id {{matrix.worker_id}} --parallelism-per-worker 3
--python-version {{matrix.python}}
--except-tags gpu,minimal,doctest,needs_credentials,soft_imports,rllib
--except-tags gpu,train_v2_gpu,minimal,doctest,needs_credentials,soft_imports,rllib
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Based on the PR description, gpu tags were replaced with train_v2_gpu. If the gpu tag is no longer in use, we could remove it from this list to improve clarity and avoid maintaining a legacy tag. This would make it clear that train_v2_gpu is the new standard for these tests.

If the gpu tag is still in use for some tests, then this change is correct as is. In that case, it might be helpful to clarify in the PR summary that both tags are active.

        --except-tags train_v2_gpu,minimal,doctest,needs_credentials,soft_imports,rllib

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"gpu" tag is still in use for v1 gpu tests.

@justinvyu justinvyu enabled auto-merge (squash) October 8, 2025 18:27
@github-actions github-actions bot added the go add ONLY when ready to merge, run all tests label Oct 8, 2025
@ray-gardener ray-gardener bot added train Ray Train Related Issue devprod labels Oct 8, 2025
@justinvyu justinvyu merged commit 281f96b into ray-project:master Oct 8, 2025
7 checks passed
@justinvyu justinvyu deleted the gpu_test_filter branch October 8, 2025 19:44
liulehui pushed a commit to liulehui/ray that referenced this pull request Oct 9, 2025
ray-project#56868 replaced `gpu` tags with
`train_v2_gpu` in order to separate out the tests into 2 CI pipelines.
However, there's a py312 test that only runs in postmerge that filters
out the `gpu` tag. This resulted in GPU tests running on the py312 CPU
test runner. This PR fixes the issue by filtering out the new
`train_v2_gpu` tag as well.

Signed-off-by: Justin Yu <justinvyu@anyscale.com>
joshkodi pushed a commit to joshkodi/ray that referenced this pull request Oct 13, 2025
ray-project#56868 replaced `gpu` tags with
`train_v2_gpu` in order to separate out the tests into 2 CI pipelines.
However, there's a py312 test that only runs in postmerge that filters
out the `gpu` tag. This resulted in GPU tests running on the py312 CPU
test runner. This PR fixes the issue by filtering out the new
`train_v2_gpu` tag as well.

Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Josh Kodi <joshkodi@gmail.com>
ArturNiederfahrenhorst pushed a commit to ArturNiederfahrenhorst/ray that referenced this pull request Oct 13, 2025
ray-project#56868 replaced `gpu` tags with
`train_v2_gpu` in order to separate out the tests into 2 CI pipelines.
However, there's a py312 test that only runs in postmerge that filters
out the `gpu` tag. This resulted in GPU tests running on the py312 CPU
test runner. This PR fixes the issue by filtering out the new
`train_v2_gpu` tag as well.

Signed-off-by: Justin Yu <justinvyu@anyscale.com>
justinyeh1995 pushed a commit to justinyeh1995/ray that referenced this pull request Oct 20, 2025
ray-project#56868 replaced `gpu` tags with
`train_v2_gpu` in order to separate out the tests into 2 CI pipelines.
However, there's a py312 test that only runs in postmerge that filters
out the `gpu` tag. This resulted in GPU tests running on the py312 CPU
test runner. This PR fixes the issue by filtering out the new
`train_v2_gpu` tag as well.

Signed-off-by: Justin Yu <justinvyu@anyscale.com>
xinyuangui2 pushed a commit to xinyuangui2/ray that referenced this pull request Oct 22, 2025
ray-project#56868 replaced `gpu` tags with
`train_v2_gpu` in order to separate out the tests into 2 CI pipelines.
However, there's a py312 test that only runs in postmerge that filters
out the `gpu` tag. This resulted in GPU tests running on the py312 CPU
test runner. This PR fixes the issue by filtering out the new
`train_v2_gpu` tag as well.

Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: xgui <xgui@anyscale.com>
elliot-barn pushed a commit that referenced this pull request Oct 23, 2025
#56868 replaced `gpu` tags with
`train_v2_gpu` in order to separate out the tests into 2 CI pipelines.
However, there's a py312 test that only runs in postmerge that filters
out the `gpu` tag. This resulted in GPU tests running on the py312 CPU
test runner. This PR fixes the issue by filtering out the new
`train_v2_gpu` tag as well.

Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
ray-project#56868 replaced `gpu` tags with
`train_v2_gpu` in order to separate out the tests into 2 CI pipelines.
However, there's a py312 test that only runs in postmerge that filters
out the `gpu` tag. This resulted in GPU tests running on the py312 CPU
test runner. This PR fixes the issue by filtering out the new
`train_v2_gpu` tag as well.

Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
ray-project#56868 replaced `gpu` tags with
`train_v2_gpu` in order to separate out the tests into 2 CI pipelines.
However, there's a py312 test that only runs in postmerge that filters
out the `gpu` tag. This resulted in GPU tests running on the py312 CPU
test runner. This PR fixes the issue by filtering out the new
`train_v2_gpu` tag as well.

Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Aydin Abiar <aydin@anyscale.com>
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025
ray-project#56868 replaced `gpu` tags with
`train_v2_gpu` in order to separate out the tests into 2 CI pipelines.
However, there's a py312 test that only runs in postmerge that filters
out the `gpu` tag. This resulted in GPU tests running on the py312 CPU
test runner. This PR fixes the issue by filtering out the new
`train_v2_gpu` tag as well.

Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Future-Outlier <eric901201@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

devprod go add ONLY when ready to merge, run all tests train Ray Train Related Issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants