[Data] Remove unstable image_classification_chaos_no_scale_back release test #58048
Merged
bveeramani merged 1 commit into master on Oct 23, 2025
…se test

This test became broken after the removal of Parquet metadata fetching tasks in #56105. The test relies on specific autoscaling behavior that no longer works as expected without metadata fetching.

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Code Review
This pull request removes the unstable image_classification_chaos_no_scale_back release test and its associated setup script. The justification for this removal is well-explained in the pull request description, citing that the test is broken due to changes in autoscaling behavior and is no longer providing a useful signal. The changes are straightforward and correctly implement the removal. I've added one comment on the deleted script regarding a hardcoded URL as a note for future reference.
iamjustinhsu approved these changes on Oct 23, 2025
xinyuangui2 pushed a commit to xinyuangui2/ray that referenced this pull request on Oct 27, 2025:

…ease test (ray-project#58048)

## Summary

This PR removes the `image_classification_chaos_no_scale_back` release test and its associated setup script (`setup_cluster_compute_config_updater.py`). This test has become non-functional and is no longer providing useful signal.

## Background

The `image_classification_chaos_no_scale_back` release test was designed to validate Ray Data's fault tolerance when many nodes abruptly get preempted at the same time.

The test worked by:

1. Running on an autoscaling cluster with 1-10 nodes
2. Updating the compute config mid-test to downscale to 5 nodes
3. Asserting that there are dead nodes as a sanity check

## Why This Test Is Broken

After the removal of Parquet metadata fetching in ray-project#56105 (September 2, 2025), the autoscaling behavior changed significantly:

- **Before metadata removal**: The cluster would autoscale more aggressively because metadata fetching created additional tasks that triggered faster scale-up. The cluster would scale past 5 nodes, then downscale, leaving dead nodes that the test could detect.
- **After metadata removal**: Without the metadata fetching tasks, the cluster doesn't scale up fast enough to get past 5 nodes before the downscale happens. This means there are no dead nodes to detect, causing the test to fail.

## Why We're Removing It

1. **Test is fundamentally broken**: The test's assumptions about autoscaling behavior are no longer valid after the metadata fetching removal
2. **Not actively monitored**: The test is marked as unstable and isn't closely watched

## Changes

- Removed `image_classification_chaos_no_scale_back` test from `release/release_data_tests.yaml`
- Deleted `release/nightly_tests/setup_cluster_compute_config_updater.py` (only used by this test)

## Related

See ray-project#56105

Fixes ray-project#56528

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Signed-off-by: xgui <xgui@anyscale.com>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request on Nov 17, 2025
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request on Nov 19, 2025
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request on Dec 7, 2025
Blaze-DSP pushed a commit to Blaze-DSP/ray that referenced this pull request on Dec 18, 2025
peterxcli pushed a commit to peterxcli/ray that referenced this pull request on Feb 25, 2026
## Summary

This PR removes the `image_classification_chaos_no_scale_back` release test and its associated setup script (`setup_cluster_compute_config_updater.py`). This test has become non-functional and is no longer providing useful signal.

## Background

The `image_classification_chaos_no_scale_back` release test was designed to validate Ray Data's fault tolerance when many nodes abruptly get preempted at the same time.

The test worked by:

1. Running on an autoscaling cluster with 1-10 nodes
2. Updating the compute config mid-test to downscale to 5 nodes
3. Asserting that there are dead nodes as a sanity check
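As a rough illustration of the kind of sanity check this test relied on (asserting that some nodes have died), here is a minimal sketch. It is not the deleted test's actual code, which this page does not show. The helper names `count_dead_nodes` and `assert_some_nodes_died` are hypothetical, and the input is assumed to have the shape returned by `ray.nodes()`: a list of dicts, each with a boolean `"Alive"` key.

```python
from typing import Any, Dict, List


def count_dead_nodes(nodes: List[Dict[str, Any]]) -> int:
    """Count entries whose "Alive" flag is False.

    `nodes` is assumed to look like the output of `ray.nodes()`.
    Entries without an "Alive" key are treated as alive.
    """
    return sum(1 for node in nodes if not node.get("Alive", True))


def assert_some_nodes_died(nodes: List[Dict[str, Any]]) -> None:
    """Hypothetical sanity check: fail if no node was ever lost."""
    dead = count_dead_nodes(nodes)
    assert dead > 0, "Expected at least one dead node, found none"


if __name__ == "__main__":
    # Hand-written sample data standing in for `ray.nodes()` output.
    sample = [
        {"NodeID": "a", "Alive": True},
        {"NodeID": "b", "Alive": False},
    ]
    assert_some_nodes_died(sample)
    print(count_dead_nodes(sample))
```

A check of this form only passes if the cluster actually lost nodes during the run, which is why it depends on how far the cluster scaled up before the downscale.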
## Why This Test Is Broken

After the removal of Parquet metadata fetching in #56105 (September 2, 2025), the autoscaling behavior changed significantly:

- **Before metadata removal**: The cluster would autoscale more aggressively because metadata fetching created additional tasks that triggered faster scale-up. The cluster would scale past 5 nodes, then downscale, leaving dead nodes that the test could detect.
- **After metadata removal**: Without the metadata fetching tasks, the cluster doesn't scale up fast enough to get past 5 nodes before the downscale happens. This means there are no dead nodes to detect, causing the test to fail.
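The race described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the 1-10 node range and the downscale target of 5 come from the PR description, while the scale-up rates, the tick at which the downscale happens, and the function name `simulate_dead_nodes` are invented for illustration and do not reflect measured autoscaler behavior.

```python
def simulate_dead_nodes(
    nodes_added_per_tick: int,
    downscale_tick: int,
    min_nodes: int = 1,
    max_nodes: int = 10,
    downscale_target: int = 5,
) -> int:
    """Toy model: return how many nodes the downscale terminates.

    The cluster starts at `min_nodes` and adds `nodes_added_per_tick`
    nodes each tick (capped at `max_nodes`) until `downscale_tick`,
    when the cap drops to `downscale_target`. Nodes above the new cap
    are terminated and would show up as dead.
    """
    nodes = min_nodes
    for _ in range(downscale_tick):
        nodes = min(max_nodes, nodes + nodes_added_per_tick)
    return max(0, nodes - downscale_target)


if __name__ == "__main__":
    # Faster scale-up (assumed rate): the cluster passes 5 nodes in time.
    print(simulate_dead_nodes(nodes_added_per_tick=3, downscale_tick=3))  # 5
    # Slower scale-up (assumed rate): the cluster never passes 5 nodes.
    print(simulate_dead_nodes(nodes_added_per_tick=1, downscale_tick=3))  # 0
```

In this model the slower run reaches only 4 nodes before the downscale, so nothing is terminated and a dead-node assertion would fail.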
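## Why We're Removing It

1. **Test is fundamentally broken**: The test's assumptions about autoscaling behavior are no longer valid after the metadata fetching removal
2. **Not actively monitored**: The test is marked as unstable and isn't closely watched

## Changes

- Removed `image_classification_chaos_no_scale_back` test from `release/release_data_tests.yaml`
- Deleted `release/nightly_tests/setup_cluster_compute_config_updater.py` (only used by this test)

## Related

See #56105

Fixes #56528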
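One way to double-check that a deleted script has no remaining references is a plain text search over the repository. The sketch below is an assumption about how such a check could be done, not part of this PR. The helper name `find_references` is hypothetical, and it only looks for a literal substring.

```python
from pathlib import Path
from typing import List


def find_references(root: Path, needle: str) -> List[Path]:
    """Return files under `root` whose text contains `needle`, sorted."""
    matches = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # Skip binary or unreadable files.
        if needle in text:
            matches.append(path)
    return sorted(matches)


if __name__ == "__main__":
    # Run from a checkout of the repository; "release" is the directory
    # named in the PR description.
    for hit in find_references(Path("release"), "setup_cluster_compute_config_updater"):
        print(hit)
```

If the search prints nothing after the removal, no file under `release` still mentions the script by name.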