Description
Reduce CI time for Data-only PRs by not running unrelated ML tests. Currently, when files in python/ray/data/ are modified, the CI triggers all ML/train tests.
Background
Ray's CI uses a tag-based system to determine which tests run for different code changes:
- `test_rules.txt` (`ci/pipeline/test_rules.txt`) maps file paths to CI tags. Currently, changes to `python/ray/data/` trigger the `data`, `ml`, and `train` tags.
- CI YAML files define test steps that run based on these tags. For example, the `ml.rayci.yml` file contains a `:train: ml: train v1 tests` step that runs whenever the `train` tag is triggered.
- Bazel BUILD files define individual test targets with tags like `team:ml`, `train_v2`, etc. The `--only-tags` and `--except-tags` flags filter which of these tests execute.
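To make the path-to-tag mapping concrete, here is a minimal sketch of how a rule block could resolve a changed file to CI tags. It is a simplified, hypothetical model and not Ray's actual implementation: the rule syntax (`paths @ tags ;`) is inferred from the excerpt quoted in this issue, the matching logic (trailing `/` means directory prefix, otherwise exact match) is an assumption, and `parse_rules`, `tags_for_change`, and the changed file name are illustrative.

```python
# Simplified, hypothetical model of tag selection; not Ray's actual implementation.
# Rule syntax assumed from the excerpt in this issue:
#   <path> <path> ... @ <tag> <tag> ... ;


def parse_rules(text):
    """Parse rule blocks of the form 'paths @ tags ;' into (paths, tags) pairs."""
    rules = []
    for block in text.split(";"):
        if "@" not in block:
            continue  # skip the empty chunk after the final ';'
        paths_part, tags_part = block.split("@", 1)
        rules.append((paths_part.split(), set(tags_part.split())))
    return rules


def tags_for_change(changed_file, rules):
    """Return the union of tags from every rule that matches the changed file."""
    tags = set()
    for paths, rule_tags in rules:
        for path in paths:
            # Assumption: a trailing '/' is a directory prefix; anything else
            # must match the changed file exactly.
            is_prefix_match = path.endswith("/") and changed_file.startswith(path)
            if is_prefix_match or changed_file == path:
                tags |= rule_tags
                break
    return tags


# Abbreviated version of the current rule (the real rule lists more paths).
CURRENT_RULES = """
python/ray/data/
.buildkite/data.rayci.yml
@ data ml train
;
"""

# The same rule with only the `data` tag, as this issue proposes.
PROPOSED_RULES = """
python/ray/data/
.buildkite/data.rayci.yml
@ data
;
"""

changed = "python/ray/data/example_module.py"  # hypothetical changed file
current_tags = tags_for_change(changed, parse_rules(CURRENT_RULES))
proposed_tags = tags_for_change(changed, parse_rules(PROPOSED_RULES))

print(sorted(current_tags))   # ['data', 'ml', 'train']
print(sorted(proposed_tags))  # ['data']
```

Under this model, a Data-only change pulls in every step gated on `ml` or `train` today, and only the steps gated on `data` once the extra tags are dropped.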
The problem is that Data PRs currently trigger ALL train tests, when we only actually care about these four tests:
- test_data_integration
- test_data_resource_cleanup
- test_data_config
- test_iter_torch_batches_gpu
Implementation Boundaries & Constraints
Target Files
- `ci/pipeline/test_rules.txt` - Remove the `train` and `ml` tags from Data file changes
  - Current (lines 35-43): `python/ray/data/ .buildkite/data.rayci.yml ci/docker/data.build.Dockerfile ... @ data ml train ;`
  - Change to: `@ data` (remove `train` and `ml`)
- `.buildkite/ml.rayci.yml` - Add a new test step to run only the 4 Data-related ML tests
  - Pattern match against `:train: ml: train v1 gpu tests` and add a new step that runs with `--only-tags data_integration` (new tag to be added)
- In the `BUILD.bazel` files - Add a new tag `data_integration` to the 4 relevant test targets:
  - `test_data_integration`
  - `test_data_resource_cleanup`
  - `test_data_config`
  - `test_iter_torch_batches_gpu`
- `.buildkite/ml.rayci.yml` - Add `--except-tags data_integration` to the other train test steps to avoid duplicate test runs
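The intended effect of pairing `--only-tags data_integration` with `--except-tags data_integration` can be sketched as a partition of the train test targets. The model below is illustrative only: `select_targets` is a hypothetical helper, the "any matching tag" semantics is an assumption about how the filters behave, the tag sets are placeholders rather than the targets' real tags, and `test_other_train_feature` is an invented stand-in for an unrelated train test.

```python
# Hypothetical model of --only-tags / --except-tags filtering; illustrative only.


def select_targets(targets, only_tags=(), except_tags=()):
    """Keep targets with at least one only-tag (if given) and no except-tag."""
    selected = []
    for name, tags in targets.items():
        if only_tags and not (set(only_tags) & tags):
            continue  # none of the required tags is present
        if set(except_tags) & tags:
            continue  # carries an excluded tag
        selected.append(name)
    return sorted(selected)


# Placeholder tag sets; the real targets carry additional tags.
TARGETS = {
    "test_data_integration": {"team:ml", "data_integration"},
    "test_data_resource_cleanup": {"team:ml", "data_integration"},
    "test_data_config": {"team:ml", "data_integration"},
    "test_iter_torch_batches_gpu": {"team:ml", "data_integration"},
    "test_other_train_feature": {"team:ml"},  # invented unrelated train test
}

# New step triggered by Data changes: only the four tagged tests.
data_step = select_targets(TARGETS, only_tags=["data_integration"])

# Existing train steps: everything except the four tagged tests.
train_step = select_targets(TARGETS, except_tags=["data_integration"])

# No test appears in both steps, and no test is dropped.
assert not set(data_step) & set(train_step)
assert set(data_step) | set(train_step) == set(TARGETS)
```

If the existing train steps were left without the `--except-tags` filter, the four tagged tests would run twice whenever both the new step and a train step are triggered, which is the duplication the last bullet guards against.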
Prototype implementation
See PR #59690 for a similar approach.
Do Not Touch
- Individual test file implementations (`test_data_integration.py`, etc.)
- Core CI infrastructure (`ci/ray_ci/tester.py`)
- Docker build configurations
Contributing Expectations
Please review the Ray Data contributing guide before starting: https://docs.ray.io/en/latest/data/contributing/contributing-guide.html