Skip to content

[Data] Don't run train and ml tests on Data-only PRs#59690

Closed
bveeramani wants to merge 4 commits intomasterfrom
stop-running-ml-tests
Closed

[Data] Don't run train and ml tests on Data-only PRs#59690
bveeramani wants to merge 4 commits intomasterfrom
stop-running-ml-tests

Conversation

@bveeramani
Copy link
Copy Markdown
Member

Thank you for contributing to Ray! 🚀
Please review the Ray Contribution Guide before opening a pull request.

⚠️ Remove these instructions before submitting your PR.

💡 Tip: Mark as draft if you want early feedback, or ready for review when it's complete.

Description

Briefly describe what this PR accomplishes and why it's needed.

Related issues

Link related issues: "Fixes #1234", "Closes #1234", or "Related to #1234".

Additional information

Optional: Add implementation details, API changes, usage examples, screenshots, etc.

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
@bveeramani bveeramani requested a review from a team as a code owner December 26, 2025 20:23
@bveeramani bveeramani marked this pull request as draft December 26, 2025 20:23
@bveeramani bveeramani added the go add ONLY when ready to merge, run all tests label Dec 26, 2025
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request aims to optimize the CI process by preventing train and ml tests from running on pull requests that only contain changes related to Ray Data. This is achieved by updating the test rules in ci/pipeline/test_rules.txt. To ensure that data-related tests within the train module are still executed, a new Buildkite step is introduced in .buildkite/ml.rayci.yml which specifically runs tests tagged for data. Correspondingly, several tests in python/ray/train/ and python/ray/train/v2/ have been tagged with team:data in their BUILD.bazel files. Additionally, a test has been refactored by moving it to a more appropriate file. The changes are logical and well-executed. I have one suggestion to improve the robustness of the new CI step.

commands:
- bazel run //ci/ray_ci:test_in_docker -- //python/ray/train/... data
--build-name mlgpubuild-py3.10 --python-version 3.10
depends_on: [ "mlgpubuild-multipy", "forge" ]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider adding soft_fail: true to this new test step. This would prevent potential flakiness in this newly carved-out test suite from blocking data-only PRs. Other similar test steps in this file, like :train: ml: train v2 gpu tests, use soft_fail: true.

    depends_on: [ "mlgpubuild-multipy", "forge" ]
    soft_fail: true

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bad advice gemini.. bad advice..

instance_type: gpu-large
commands:
- bazel run //ci/ray_ci:test_in_docker -- //python/ray/train/... data
--build-name mlgpubuild-py3.10 --python-version 3.10
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing tag filter causes all train tests to run

The new buildkite job command is missing --only-tags team:data to filter which tests to run. Without this filter, all tests in //python/ray/train/... will execute when data files change, rather than just the tests tagged with team:data. This defeats the PR's goal of avoiding running all train tests on data-only changes. Other similar commands in this file consistently use --only-tags to filter tests by tag.

Fix in Cursor Fix in Web

bveeramani and others added 3 commits December 26, 2025 15:28
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
…t/ray into stop-running-ml-tests

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
env = {"RAY_TRAIN_V2_ENABLED": "1"},
tags = [
"exclusive",
"team:data",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a test is normally expected to only have one team tag. if it has multiple team tags, I am not sure if something downstream consuming these tests will break.

why we need to add more team tags for these tests?

depends_on: [ "mlgpubuild-multipy", "forge" ]
soft_fail: true

- label: ":train: ml: data tests"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose the purpose of this is to have a subset of train tests to be triggered on data changes?

if that is the purpose, here is what needs to be done:

  • for the tests, instead of adding team:data tag, add datatests (or some other meaningful tag for these tests)
    • you can change the team tag if you think that is the right thing to do, and move them to the data team's test group. but since they live in train directory, probably not the right idea to change the team tag
  • add a test job here with --only-tags datatests, with job tags having both data and train in it, meaning these tests will be triggered by both data code changes and train code changes.
  • add datatests into --except-tags that are used in other jobs, so that they will only run once.

@bveeramani
Copy link
Copy Markdown
Member Author

Closing this prototype PR. I've described the task here: #59780

@bveeramani bveeramani closed this Dec 31, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ray fails to serialize self-reference objects

2 participants