[data] split test_all_to_all.py#53865
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR splits the monolithic test_all_to_all.py into several smaller end-to-end tests to speed up runtime by running tests in parallel. The changes include new test files for unique, repartition, random shuffle, map groups, and aggregation functionalities as well as updates in the BUILD file to reflect these changes.
- Splitting tests for aggregation, unique, repartition, random shuffle, and group-by operations.
- Updating the BUILD configuration to assign separate test targets.
- Refactoring test logic for improved parallelism and resource-specific behavior.
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| python/ray/data/tests/test_unique_e2e.py | New tests for unique operation and null handling. |
| python/ray/data/tests/test_repartition_e2e.py | New tests for repartition logic with and without shuffling. |
| python/ray/data/tests/test_random_e2e.py | New tests for random shuffle operations and order verification. |
| python/ray/data/tests/test_map_groups_e2e.py | New tests for groupby map operations using GPUs and actors. |
| python/ray/data/tests/test_agg_e2e.py | New tests for aggregation logic and error validation. |
| python/ray/data/BUILD | Updated BUILD configuration to reflect split test targets. |
There was a problem hiding this comment.
[nitpick] Consider adding an explicit assert statement here (e.g., assert ds.schema().names == ["a", "b"]) to validate that the schema is as expected.
| ds.schema().names == ["a", "b"] | |
| assert ds.schema().names == ["a", "b"], "Schema does not match the expected structure ['a', 'b']" |
There was a problem hiding this comment.
[nitpick] When using 'len(keys)' inside check_init, consider explicitly checking the type of 'keys' (e.g. whether it is a list or a string) to make the intent clearer and avoid potential confusion.
| def check_init(k): | |
| def check_init(k): | |
| if not isinstance(keys, (list, str)): | |
| raise TypeError(f"'keys' must be a list or a string, but got {type(keys).__name__}") |
Signed-off-by: can <can@anyscale.com>
The test_all_to_all.py is taking a really long time to finish ([35 minutes](https://buildkite.com/ray-project/postmerge/builds/10883/steps/canvas?jid=0197797e-9c25-4030-a1a6-ff89dba44f8e#0197797e-9c25-4030-a1a6-ff89dba44f8e/176-1023)). I'm breaking this into smaller chunks so they run in parallel. Test: CI Signed-off-by: can <can@anyscale.com> Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
The test_all_to_all.py is taking a really long time to finish ([35 minutes](https://buildkite.com/ray-project/postmerge/builds/10883/steps/canvas?jid=0197797e-9c25-4030-a1a6-ff89dba44f8e#0197797e-9c25-4030-a1a6-ff89dba44f8e/176-1023)). I'm breaking this into smaller chunks so they run in parallel. Test: CI Signed-off-by: can <can@anyscale.com>
The test_all_to_all.py is taking a really long time to finish ([35 minutes](https://buildkite.com/ray-project/postmerge/builds/10883/steps/canvas?jid=0197797e-9c25-4030-a1a6-ff89dba44f8e#0197797e-9c25-4030-a1a6-ff89dba44f8e/176-1023)). I'm breaking this into smaller chunks so they run in parallel. Test: CI Signed-off-by: can <can@anyscale.com> Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
The test_all_to_all.py is taking a really long time to finish (35 minutes). I'm breaking this into smaller chunks so they run in parallel.
Test:
CI