[Data] Introduce seams to DefaultAutoscaler2 to make it more testable #59933
bveeramani merged 9 commits into ray-project:master
Conversation
Code Review
This pull request refactors DefaultClusterAutoscalerV2 to support dependency injection for its autoscaling coordinator and node count retrieval logic, making it more testable. It introduces a new FakeAutoscalingCoordinator for testing purposes, which always allocates requested resources. Correspondingly, the tests for DefaultClusterAutoscalerV2 are updated to utilize this new injection mechanism and the fake coordinator, replacing previous mocking. Review comments suggest minor code style improvements, including using the or operator for concise default value assignment in DefaultClusterAutoscalerV2's __init__ method and dict.pop() for more concise item removal in FakeAutoscalingCoordinator.
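The two style suggestions from the review can be sketched as follows. This is an illustrative snippet only; the class, function, and variable names below are hypothetical stand-ins, not the actual Ray Data source.

```python
def make_default_coordinator():
    # Hypothetical placeholder for the real default AutoscalingCoordinator factory.
    return "default-coordinator"


class ClusterAutoscalerSketch:
    def __init__(self, autoscaling_coordinator=None):
        # Suggestion 1: `or` collapses the explicit `if ... is None:` branch
        # into a single concise default assignment.
        self._coordinator = autoscaling_coordinator or make_default_coordinator()


# Suggestion 2: dict.pop(key, default) removes an item without a prior
# membership check and without raising KeyError when the key is absent.
pending = {"req-1": {"CPU": 4}}
pending.pop("req-1", None)    # removes the entry
pending.pop("missing", None)  # no-op, no KeyError
```

Note that `x or default()` treats any falsy value (not only `None`) as missing, so it is appropriate here because a coordinator instance is always truthy.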
@bveeramani PTAL
bveeramani left a comment:
LGTM. Thanks for the contribution.
I'm going to wait for #59896 to land, and then I'll merge this
@400Ping looks like there are a couple of merge conflicts. Would you mind resolving them, and then I'll merge?
bveeramani left a comment:
One last round of feedback. I think we're good to land after.
cc @bveeramani for final look.
Description
The `DefaultAutoscaler2` implementation needs an `AutoscalingCoordinator` and a way to get all of the `_NodeResourceSpec`. Currently, we can't explicitly inject fake implementations of either dependency. This is problematic because the tests need to assume what the implementation of each dependency looks like and use brittle mocks.
To solve this:
- Add the `FakeAutoscalingCoordinator` implementation to a new `fake_autoscaling_coordinator.py` module (you can use the code below)
- `DefaultClusterAutoscalerV2` has two new parameters: `autoscaling_coordinator: Optional[AutoscalingCoordinator] = None` and `get_node_counts: Callable[[], Dict[_NodeResourceSpec, int]] = get_node_resource_spec_and_count`. If `autoscaling_coordinator` is None, the default implementation is used.
- Update `test_try_scale_up_cluster` to use the explicit seams rather than mocks. Where possible, assert against the public interface rather than implementation details.
Related issues
Closes #59683
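The seams described above can be sketched roughly as below. This is a minimal illustration of the dependency-injection pattern, not the actual Ray Data code: the method names, the `request_resources` signature, and the fallback internals are assumptions; only the class and parameter names come from the PR description.

```python
from typing import Callable, Dict


class FakeAutoscalingCoordinator:
    """Test double that always grants whatever resources are requested."""

    def __init__(self):
        self.requests = []

    def request_resources(self, bundles):
        # Record the request and pretend the cluster granted it in full.
        self.requests.append(bundles)
        return bundles


class DefaultClusterAutoscalerV2Sketch:
    def __init__(
        self,
        autoscaling_coordinator=None,
        get_node_counts: Callable[[], Dict[str, int]] = lambda: {},
    ):
        # Seam 1: fall back to a default coordinator only when none is
        # injected (the real code would construct the production coordinator
        # here; the fake is used as a stand-in for this sketch).
        self._coordinator = autoscaling_coordinator or FakeAutoscalingCoordinator()
        # Seam 2: node-count retrieval is an injectable callable instead of a
        # hard-coded lookup, so tests can supply canned cluster shapes.
        self._get_node_counts = get_node_counts

    def try_scale_up(self, bundles):
        return self._coordinator.request_resources(bundles)


# A test can now inject both seams explicitly instead of patching internals:
fake = FakeAutoscalingCoordinator()
autoscaler = DefaultClusterAutoscalerV2Sketch(
    autoscaling_coordinator=fake,
    get_node_counts=lambda: {"worker-node": 2},
)
granted = autoscaler.try_scale_up([{"CPU": 8}])
```

Because the fake records every request, the test can assert against the public interface (what was requested and granted) rather than against mocked implementation details.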