[Data] Don't run train and ml tests on Data-only PRs#59690
[Data] Don't run train and ml tests on Data-only PRs#59690bveeramani wants to merge 4 commits intomasterfrom
train and ml tests on Data-only PRs#59690Conversation
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
There was a problem hiding this comment.
Code Review
This pull request aims to optimize the CI process by preventing train and ml tests from running on pull requests that only contain changes related to Ray Data. This is achieved by updating the test rules in ci/pipeline/test_rules.txt. To ensure that data-related tests within the train module are still executed, a new Buildkite step is introduced in .buildkite/ml.rayci.yml which specifically runs tests tagged for data. Correspondingly, several tests in python/ray/train/ and python/ray/train/v2/ have been tagged with team:data in their BUILD.bazel files. Additionally, a test has been refactored by moving it to a more appropriate file. The changes are logical and well-executed. I have one suggestion to improve the robustness of the new CI step.
| commands: | ||
| - bazel run //ci/ray_ci:test_in_docker -- //python/ray/train/... data | ||
| --build-name mlgpubuild-py3.10 --python-version 3.10 | ||
| depends_on: [ "mlgpubuild-multipy", "forge" ] |
There was a problem hiding this comment.
Consider adding soft_fail: true to this new test step. This would prevent potential flakiness in this newly carved-out test suite from blocking data-only PRs. Other similar test steps in this file, like :train: ml: train v2 gpu tests, use soft_fail: true.
depends_on: [ "mlgpubuild-multipy", "forge" ]
soft_fail: trueThere was a problem hiding this comment.
bad advice gemini.. bad advice..
| instance_type: gpu-large | ||
| commands: | ||
| - bazel run //ci/ray_ci:test_in_docker -- //python/ray/train/... data | ||
| --build-name mlgpubuild-py3.10 --python-version 3.10 |
There was a problem hiding this comment.
Missing tag filter causes all train tests to run
The new buildkite job command is missing --only-tags team:data to filter which tests to run. Without this filter, all tests in //python/ray/train/... will execute when data files change, rather than just the tests tagged with team:data. This defeats the PR's goal of avoiding running all train tests on data-only changes. Other similar commands in this file consistently use --only-tags to filter tests by tag.
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
…t/ray into stop-running-ml-tests Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
| env = {"RAY_TRAIN_V2_ENABLED": "1"}, | ||
| tags = [ | ||
| "exclusive", | ||
| "team:data", |
There was a problem hiding this comment.
a test is normally expected to only have one team tag. if it has multiple team tags, I am not sure if something downstream consuming these tests will break.
why we need to add more team tags for these tests?
| depends_on: [ "mlgpubuild-multipy", "forge" ] | ||
| soft_fail: true | ||
|
|
||
| - label: ":train: ml: data tests" |
There was a problem hiding this comment.
I suppose the purpose of this is to have a subset of train tests to be triggered on data changes?
if that is the purpose, here is what needs to be done:
- for the tests, instead of adding
team:datatag, adddatatests(or some other meaningful tag for these tests)- you can change the team tag if you think that is the right thing to do, and move them to the data team's test group. but since they live in train directory, probably not the right idea to change the team tag
- add a test job here with
--only-tags datatests, with jobtagshaving bothdataandtrainin it, meaning these tests will be triggered by bothdatacode changes andtraincode changes. - add
datatestsinto--except-tagsthat are used in other jobs, so that they will only run once.
|
Closing this prototype PR. I've described the task here: #59780 |
Description
Related issues
Additional information