[Data][CI] Stop running all ML tests on Data premerge #59780

@bveeramani

Description

Reduce CI time for Data-only PRs by not running unrelated ML tests. Currently, when files in python/ray/data/ are modified, the CI triggers all ML/train tests.

Background

Ray's CI uses a tag-based system to determine which tests run for different code changes:

  1. test_rules.txt (ci/pipeline/test_rules.txt) maps file paths to CI tags. Currently, changes to python/ray/data/ trigger the data, ml, and train tags.

  2. CI YAML files define test steps that run based on these tags. For example, the ml.rayci.yml file contains a :train: ml: train v1 tests step that's run whenever the train tag is triggered.

  3. Bazel BUILD files define individual test targets with tags like team:ml, train_v2, etc. Bazel uses --only-tags and --except-tags flags to filter which tests execute.
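The first layer (path-to-tag mapping) can be sketched as follows. This is only a simplified model that assumes the block layout quoted in this issue (path lines, then an `@ tag ...` line, then `;`). It is not Ray's actual parser, the helper names `parse_rules` and `tags_for` are made up for illustration, and the sample rule block is abbreviated to a single path.

```python
# Simplified, hypothetical model of path-to-tag rules. NOT Ray's real parser.
# Assumed block layout: path lines, then "@ tag ..." and a ";" terminator.
RULES_TEXT = """\
python/ray/data/
@ data ml train
;
"""


def parse_rules(text):
    """Return a list of (path_prefixes, tags) pairs, one per rule block."""
    rules = []
    paths, tags = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == ";":
            rules.append((paths, set(tags)))
            paths, tags = [], []
        elif line.startswith("@"):
            tags.extend(line[1:].split())
        else:
            paths.append(line)
    return rules


def tags_for(changed_files, rules):
    """Collect the tags of every rule block that matches a changed file."""
    triggered = set()
    for path in changed_files:
        for prefixes, tags in rules:
            for prefix in prefixes:
                # Directory rules end with "/" and match by prefix;
                # file rules must match exactly.
                if path == prefix or (prefix.endswith("/") and path.startswith(prefix)):
                    triggered |= tags
    return triggered


current = parse_rules(RULES_TEXT)
proposed = parse_rules(RULES_TEXT.replace("@ data ml train", "@ data"))

changed = ["python/ray/data/dataset.py"]
print(sorted(tags_for(changed, current)))
print(sorted(tags_for(changed, proposed)))
```

Under these assumptions, a Data-only change triggers data, ml, and train with the current rule and only data once the rule is trimmed, which is the effect this issue is after.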

The problem is that Data PRs currently trigger ALL train tests, even though only these four tests are relevant to Data changes:

  • test_data_integration
  • test_data_resource_cleanup
  • test_data_config
  • test_iter_torch_batches_gpu

Implementation Boundaries & Constraints

Target Files

  1. ci/pipeline/test_rules.txt - Remove the train and ml tags from Data file changes

    • Current (lines 35-43):
      python/ray/data/
      .buildkite/data.rayci.yml
      ci/docker/data.build.Dockerfile
      ...
      @ data ml train
      ;
      
    • Change to: @ data (remove train and ml)
  2. .buildkite/ml.rayci.yml - Add a new test step to run only the 4 Data-related ML tests

    • Use the existing :train: ml: train v1 gpu tests step as a template and add a new step that runs with --only-tags data_integration (a new tag, to be added)
  3. BUILD.bazel files - Add a new tag data_integration to the 4 relevant test targets:

    • test_data_integration
    • test_data_resource_cleanup
    • test_data_config
    • test_iter_torch_batches_gpu
  4. .buildkite/ml.rayci.yml - Add --except-tags data_integration to the other train test steps to avoid duplicate test runs
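To see why the new step and the --except-tags change avoid duplicate runs, here is a minimal sketch of the tag filtering. It assumes that --only-tags keeps a test carrying at least one listed tag and that --except-tags drops a test carrying any listed tag. The `select` helper and the `test_hypothetical_trainer` target are invented for illustration, and the four real targets' existing tags are omitted because this issue does not list them.

```python
# Hypothetical model of tag-based test selection. The semantics of
# --only-tags / --except-tags are assumed here, not taken from the CI source.
TARGETS = {
    # The four Data-related targets, shown only with the proposed new tag.
    "test_data_integration": {"data_integration"},
    "test_data_resource_cleanup": {"data_integration"},
    "test_data_config": {"data_integration"},
    "test_iter_torch_batches_gpu": {"data_integration"},
    # Made-up stand-in for every other train test.
    "test_hypothetical_trainer": {"team:ml"},
}


def select(targets, only_tags=(), except_tags=()):
    """Return the sorted names of targets that pass both tag filters."""
    only, excluded = set(only_tags), set(except_tags)
    return sorted(
        name
        for name, tags in targets.items()
        if (not only or tags & only) and not (tags & excluded)
    )


# New step: runs only the Data-related tests.
data_step = select(TARGETS, only_tags=["data_integration"])
# Existing train steps: run everything except the Data-related tests.
train_step = select(TARGETS, except_tags=["data_integration"])

print(data_step)
print(train_step)
```

With these assumed semantics the two selections are disjoint and together cover every target, so each test runs in exactly one step.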

Prototype implementation

See PR #59690 for a similar approach.

Do Not Touch

  • Individual test file implementations (test_data_integration.py, etc.)
  • Core CI infrastructure (ci/ray_ci/tester.py)
  • Docker build configurations

Contributing Expectations

Please review the Ray Data contributing guide before starting: https://docs.ray.io/en/latest/data/contributing/contributing-guide.html

Metadata

Labels

ci, data (Ray Data-related issues), performance, tech-debt (The issue that's due to tech debt)
